grace-gpu-containers

biondizzle/grace-gpu-containers

Commit Graph

cuda-malloc-managed

custom-weights

main

v0.19.0

0698298d13 Bleeding edge: vLLM main branch, flashinfer v0.6.7, Gitea fork source [main] biondizzle 2026-04-28 10:17:50 +00:00
6e03b5d357 custom weights tweaks biondizzle 2026-04-28 08:59:53 +00:00
10c71a446c Remove flash-attn GIT_TAG override to main; causes FLASHATTENTION_FP8_TWO_LEVEL_INTERVAL undefined error [custom-weights] biondizzle 2026-04-28 03:07:14 +00:00
550a04a0ca custom weights biondizzle 2026-04-28 02:10:48 +00:00
e43c8c97f1 custom weights biondizzle 2026-04-28 02:08:00 +00:00
be4198e754 Add CMM_BUILD_DATE cache-bust arg to Dockerfile [cuda-malloc-managed] biondizzle 2026-04-11 23:19:48 +00:00
bcc872c2c3 Remove global allocator swap, use targeted KV cache managed allocation biondizzle 2026-04-11 02:15:09 +00:00
07468031db Sync managed_alloc.cu: selective prefetch (<2 GiB to GPU) biondizzle 2026-04-10 18:37:11 +00:00
cdfd37c1e6 Fix Dockerfile: separate git clone and build RUN commands biondizzle 2026-04-10 15:32:16 +00:00
c1b013234e Fix cache-bust: embed VLLM_COMMIT in git clone RUN command biondizzle 2026-04-10 15:01:29 +00:00
98b4ae6676 Add VLLM_COMMIT cache-bust arg to Dockerfile biondizzle 2026-04-10 06:01:20 +00:00
c583bcb4fc Fix cudaMemPrefetchAsync for CUDA 13: use cudaMemLocation + flags=0 (no stream param) biondizzle 2026-04-10 02:45:05 +00:00
6053e6d0ea Fix cudaMemPrefetchAsync: use int device instead of cudaMemLocation struct biondizzle 2026-04-10 01:48:01 +00:00
aadde3ddf9 CMM: Fix OOM and subprocess crashes for GH200 EGM biondizzle 2026-04-09 23:25:48 +00:00
079eb88d7d Switch vLLM source to Gitea fork (cmm branch) biondizzle 2026-04-09 22:05:40 +00:00
7c79fb4ee7 fix: Update cudaMemAdvise for CUDA 13 API biondizzle 2026-04-07 21:32:17 +00:00
2757bffcb6 Add cudaMallocManaged allocator for GH200 EGM support biondizzle 2026-04-07 21:19:39 +00:00
edf12f7996 Clean up: remove PLAN-triton-kernels.md (merged into main) biondizzle 2026-04-06 17:25:06 +00:00
e6cc28a942 Add triton_kernels for MoE support (vLLM v0.19.0) biondizzle 2026-04-06 16:39:56 +00:00
643d5589a3 Switch flashinfer to v0.6.6 for vLLM v0.19.0 (v0.6.7 works with v0.18.2rc0) [v0.19.0] biondizzle 2026-04-03 13:15:56 +00:00
3290adb0ac Upgrade vLLM to v0.19.0 for Gemma 4 support (requires transformers>=5.5.0) biondizzle 2026-04-03 11:55:16 +00:00
cd5d58a6f9 Patch vLLM torch_utils.py: remove hoist=True for NGC PyTorch 2.11 compatibility biondizzle 2026-04-03 11:40:51 +00:00
659c79638c ✅ WORKING BUILD #43 - GH200 vLLM container builds successfully biondizzle 2026-04-03 11:08:29 +00:00
2442906d95 Add -y flag to pip uninstall pynvml for non-interactive Docker build biondizzle 2026-04-03 10:57:42 +00:00
5280a28205 Bump flashinfer from v0.6.6 to v0.6.7 (required by vLLM v0.18.2rc0) biondizzle 2026-04-03 10:52:19 +00:00
dbca81bba2 Switch vLLM from main to v0.18.2rc0 for CUDA 13.2 compatibility biondizzle 2026-04-03 09:19:01 +00:00
202b9c4e23 Add -y flag to pip uninstall infinistore for non-interactive Docker build biondizzle 2026-04-03 09:06:42 +00:00
c2cebcf962 Add apache-tvm-ffi dependency for flashinfer build biondizzle 2026-04-03 09:00:18 +00:00
beb26d3573 Fix python -m build flag: use --no-isolation instead of --no-build-isolation biondizzle 2026-04-03 08:54:45 +00:00
4e8a765c72 Fix wheel install conflict, use python -m build instead of pip build biondizzle 2026-04-03 08:52:43 +00:00
ce55e45db2 Fix NGC PyTorch image tag format (26.03-py3) biondizzle 2026-04-03 08:46:43 +00:00
c92c4ec68a Switch to NVIDIA NGC PyTorch 26.03 base image (PyTorch 2.11.0a0, CUDA 13.2.0, ARM SBSA support) biondizzle 2026-04-03 08:44:36 +00:00
54e609b2c5 Update lmcache/Dockerfile to CUDA 13.0.1, PyTorch nightly, LMCache dev branch biondizzle 2026-04-03 08:39:35 +00:00
4980d9e49a Use PyTorch nightly with CUDA 13.0 (torch 2.11.0.dev) biondizzle 2026-04-03 08:36:36 +00:00
6a97539682 Fix duplicate corrupted lines in Dockerfile biondizzle 2026-04-03 08:31:56 +00:00
f55789c53b Bump to CUDA 13.0.1 + PyTorch 2.9.0, add version output on git checkouts biondizzle 2026-04-03 08:26:53 +00:00
e514e0cd1e Revert my patches - try v0.18.2rc0 biondizzle 2026-04-03 08:09:05 +00:00
4860bcee41 Skip LMCache CUDA extensions (NO_CUDA_EXT=1) biondizzle 2026-04-03 08:05:44 +00:00
360b0dea58 Restore CUDA 13.0.1 + patch vLLM for cuMemcpyBatchAsync API change biondizzle 2026-04-03 07:53:12 +00:00
6255c94359 Downgrade to CUDA 12.8.1 for vLLM compatibility biondizzle 2026-04-03 07:43:19 +00:00
ceab7ada22 Update flashinfer to v0.6.6 to match vLLM 0.18.x requirements biondizzle 2026-04-03 07:13:16 +00:00
9d88d4c7d8 Skip xformers - vLLM has built-in FlashAttention kernels biondizzle 2026-04-03 05:50:02 +00:00
45b6109ee1 Fix xformers TORCH_STABLE_ONLY issue + ramp up MAX_JOBS for native GH200 biondizzle 2026-04-03 05:46:11 +00:00
b223c051de move things biondizzle 2026-04-03 04:27:21 +00:00
7f7ca4a742 move things biondizzle 2026-04-03 04:26:52 +00:00
2dc2008475 move things biondizzle 2026-04-03 04:26:26 +00:00
980cd1b749 move things biondizzle 2026-04-03 04:26:08 +00:00
1540b0c54e move things biondizzle 2026-04-03 04:25:23 +00:00
0b4ede8047 Add .gitignore for internal docs biondizzle 2026-04-03 03:49:41 +00:00
5c29d2bea7 Fix: LMCache default branch is 'dev' not 'main' biondizzle 2026-04-03 03:34:58 +00:00
9259555802 Fix: Actually update LMCache to main branch (previous edit failed) biondizzle 2026-04-03 03:16:45 +00:00
750906e649 Bleeding edge build: LMCache main, vLLM main, latest transformers biondizzle 2026-04-03 03:14:01 +00:00
a399fbc8c6 Add MAX_JOBS=2 for LMCache, restore vLLM build from source biondizzle 2026-04-03 02:49:43 +00:00
f8a9d372e5 Use PyPI vLLM wheel instead of building (QEMU cmake try_compile fails) biondizzle 2026-04-03 00:05:56 +00:00
436214bb72 Use PyPI triton wheel instead of building (QEMU segfaults) biondizzle 2026-04-02 23:58:20 +00:00
e5445512aa Reduce MAX_JOBS by half to reduce QEMU memory pressure biondizzle 2026-04-02 23:44:11 +00:00
4f94431af6 Revert CC/CXX to full paths, keep QEMU_CPU=max biondizzle 2026-04-02 22:50:23 +00:00
866c9d9db8 Add QEMU_CPU=max for better emulation compatibility during cross-compilation biondizzle 2026-04-02 22:47:53 +00:00
2ed1b1e2dd Fix: use CC=gcc CXX=g++ instead of full paths for QEMU compatibility biondizzle 2026-04-02 22:47:22 +00:00
14467bef70 Fix: add --no-build-isolation to pip wheel for flash-attention biondizzle 2026-04-02 20:55:32 +00:00
82b2ceacd5 Update build history and fix pip command docs biondizzle 2026-04-02 20:24:26 +00:00
8f870921f8 Fix: use 'pip wheel' instead of 'uv pip wheel' (uv has no wheel subcommand) biondizzle 2026-04-02 20:22:11 +00:00
9da93ec625 Fix setuptools pin and flash-attention build for GH200 biondizzle 2026-04-02 20:19:39 +00:00
5fa395825a Updated to vLLM v0.11.1rc3 Rajesh Shashi Kumar 2025-10-23 18:16:57 +00:00
0814f059f5 Updated to v0.11.1rc3 Rajesh Shashi Kumar 2025-10-23 18:11:41 +00:00
3c4796ed55 Updated for CUDA 13 Rajesh Shashi Kumar 2025-10-21 19:21:13 +00:00
ebcdb4ab50 Updates for PyTorch 2.9, CUDA13 Rajesh Shashi Kumar 2025-10-20 20:16:06 +00:00
02430037ea Updated for v0.11.0 Rajesh Shashi Kumar 2025-10-16 01:08:21 +00:00
31f4489d1f Update README.md Rajesh Shashi Kumar 2025-09-24 01:43:49 -05:00
201bbf5379 v0.10.2 cleanup Rajesh Shashi Kumar 2025-09-24 06:14:16 +00:00
fc321295f1 Updated for vllm v0.10.2 Rajesh Shashi Kumar 2025-09-24 05:52:11 +00:00
daf345024b Updated for v0.10.0 Rajesh Shashi Kumar 2025-08-20 21:02:46 +00:00
23267e4bf5 v0.9.1+ vLLM with FlashInfer Rajesh Shashi Kumar 2025-06-25 20:03:20 +00:00
64ab367973 v0.9.1 Rajesh Shashi Kumar 2025-06-24 23:33:46 +00:00
3d7f1ed454 vllm 0.9.0.1 Rajesh Shashi Kumar 2025-06-18 21:49:59 +00:00
713775c491 Updates for vllm 0.9.0.1 Rajesh Shashi Kumar 2025-06-04 15:28:22 +00:00
c36ff9ee0e Updated Rajesh Shashi Kumar 2025-06-04 04:47:47 +00:00
3d115911aa 0.9.0.1 Rajesh Shashi Kumar 2025-06-04 03:22:03 +00:00
3ea7d34e83 Merge branch 'main' of https://github.com/rajesh-s/containers Rajesh Shashi Kumar 2025-06-03 22:34:34 +00:00
d30802ef41 Updated for vllm 0.9.0.1 Rajesh Shashi Kumar 2025-06-03 22:34:21 +00:00
b4ae9077ae Create native_build.sh Rajesh Shashi Kumar 2025-05-29 14:34:50 -05:00
87c6773c8f v0.8.4 Rajesh Shashi Kumar 2025-05-27 20:34:14 +00:00
e205f17e2e Added nsys Rajesh Shashi Kumar 2025-04-17 18:46:25 +00:00
256272732d Fixed numpy version Rajesh Shashi Kumar 2025-04-07 18:38:30 +00:00
4d0dc5d06f numpy version fix Rajesh Shashi Kumar 2025-04-03 20:35:56 +00:00
75e33490bd Working version with vLLM+LMCache Rajesh Shashi Kumar 2025-04-01 23:34:16 +00:00
c63afb3d35 Working version with vLLM+LMCache Rajesh Shashi Kumar 2025-04-01 23:33:43 +00:00
57ceca8b4f vllm docker 0.8.1 with lmcache Ubuntu 2025-04-01 20:44:21 +00:00
9f2769285a Initial commit Rajesh Shashi Kumar 2025-03-28 14:31:17 -05:00