4e8a765c72
Fix wheel install conflict, use python -m build instead of pip build
2026-04-03 08:52:43 +00:00
ce55e45db2
Fix NGC PyTorch image tag format (26.03-py3)
2026-04-03 08:46:43 +00:00
c92c4ec68a
Switch to NVIDIA NGC PyTorch 26.03 base image (PyTorch 2.11.0a0, CUDA 13.2.0, ARM SBSA support)
2026-04-03 08:44:36 +00:00
4980d9e49a
Use PyTorch nightly with CUDA 13.0 (torch 2.11.0.dev)
2026-04-03 08:36:36 +00:00
6a97539682
Fix duplicate corrupted lines in Dockerfile
2026-04-03 08:31:56 +00:00
f55789c53b
Bump to CUDA 13.0.1 + PyTorch 2.9.0, add version output on git checkouts
2026-04-03 08:26:53 +00:00
e514e0cd1e
Revert my patches - try v0.18.2rc0
2026-04-03 08:09:05 +00:00
4860bcee41
Skip LMCache CUDA extensions (NO_CUDA_EXT=1)
...
PyTorch 2.9.0+cu130 was compiled with CUDA 12.8 but the container has CUDA 13.0.
Skip the CUDA extension build to avoid the version mismatch.
2026-04-03 08:05:44 +00:00
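The NO_CUDA_EXT switch above might look like this inside the image build; the variable name comes from the commit, but the install path and exact command are illustrative:

```shell
# Illustrative sketch, not the actual Dockerfile: LMCache's build reads
# NO_CUDA_EXT and skips compiling its CUDA extensions when set, avoiding the
# CUDA 12.8 (torch wheel) vs 13.0 (container) mismatch described above.
export NO_CUDA_EXT=1
# Hypothetical source checkout location for LMCache inside the image.
pip install --no-build-isolation /opt/LMCache
```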
360b0dea58
Restore CUDA 13.0.1 + patch vLLM for cuMemcpyBatchAsync API change
...
CUDA 13 removed the fail_idx parameter from cuMemcpyBatchAsync.
Patch cache_kernels.cu to match new API signature instead of downgrading.
- Restore CUDA 13.0.1, PyTorch 2.9.0+cu130, flashinfer cu130
- Patch: remove fail_idx variable and parameter from cuMemcpyBatchAsync call
- Simplify error message to not reference fail_idx
2026-04-03 07:53:12 +00:00
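The API change this patch works around can be sketched as below. This is a pseudocode-style fragment, not the actual cache_kernels.cu code; the argument names and ordering are assumptions — the only facts taken from the commit are that CUDA 13 dropped the fail_idx out-parameter and that the patch removes it from the call:

```cpp
// Pseudocode sketch (variable names hypothetical, not vLLM's actual code).
#if CUDA_VERSION >= 13000
  // CUDA 13: the fail_idx out-parameter no longer exists.
  CUresult res = cuMemcpyBatchAsync(dsts, srcs, sizes, count,
                                    attrs, attr_idxs, num_attrs, stream);
#else
  size_t fail_idx = 0;  // CUDA 12.x reported the index of a failed copy here.
  CUresult res = cuMemcpyBatchAsync(dsts, srcs, sizes, count,
                                    attrs, attr_idxs, num_attrs,
                                    &fail_idx, stream);
#endif
```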
6255c94359
Downgrade to CUDA 12.8.1 for vLLM compatibility
...
cuMemcpyBatchAsync API changed in CUDA 13 - removed fail_idx parameter.
vLLM code targets CUDA 12.8 API. Downgrade to CUDA 12.8.1.
2026-04-03 07:43:19 +00:00
ceab7ada22
Update flashinfer to v0.6.6 to match vLLM 0.18.x requirements
...
vLLM 0.18.x depends on flashinfer-python==0.6.6; we were building 0.4.1.
2026-04-03 07:13:16 +00:00
9d88d4c7d8
Skip xformers - vLLM has built-in FlashAttention kernels
...
xformers requires TORCH_STABLE_ONLY, which needs torch/csrc/stable/ headers
not present in PyTorch 2.9.0. vLLM 0.18.1 includes its own FA2/FA3 kernels.
2026-04-03 05:50:02 +00:00
45b6109ee1
Fix xformers TORCH_STABLE_ONLY issue + ramp up MAX_JOBS for native GH200
...
- Switch to official facebook/xformers (johnnynunez fork has TORCH_STABLE_ONLY requiring PyTorch headers not in 2.9.0)
- Increase MAX_JOBS from 2-4 to 8 for all builds (native GH200 has 97GB HBM3)
- Increase NVCC_THREADS from 1 to 4 for flash-attention
2026-04-03 05:46:11 +00:00
5c29d2bea7
Fix: LMCache default branch is 'dev' not 'main'
2026-04-03 03:34:58 +00:00
9259555802
Fix: Actually update LMCache to main branch (previous edit failed)
2026-04-03 03:16:45 +00:00
750906e649
Bleeding edge build: LMCache main, vLLM main, latest transformers
2026-04-03 03:14:01 +00:00
a399fbc8c6
Add MAX_JOBS=2 for LMCache, restore vLLM build from source
...
- LMCache: reduced parallelism to avoid memory pressure
- vLLM: restored build from source (was using PyPI wheel)
- Will test with docker --memory=24g limit
2026-04-03 02:49:43 +00:00
f8a9d372e5
Use PyPI vLLM wheel instead of building (QEMU cmake try_compile fails)
...
- vLLM 0.18.1 aarch64 wheel includes pre-compiled FA2, FA3, MoE kernels
- Original build-from-source code commented out for GH200 restoration
- CMake compiler ABI detection fails under QEMU emulation
2026-04-03 00:05:56 +00:00
436214bb72
Use PyPI triton wheel instead of building (QEMU segfaults)
...
Triton 3.6.0 has official aarch64 wheel on PyPI.
Building triton from source causes segfaults under QEMU emulation.
2026-04-02 23:58:20 +00:00
e5445512aa
Reduce MAX_JOBS by half to reduce QEMU memory pressure
...
- xformers: 6 -> 3
- flash-attention: 8 -> 4
- vllm: 8 -> 4
Testing if lower parallelism helps avoid segfaults under emulation
2026-04-02 23:44:11 +00:00
4f94431af6
Revert CC/CXX to full paths, keep QEMU_CPU=max
2026-04-02 22:50:23 +00:00
866c9d9db8
Add QEMU_CPU=max for better emulation compatibility during cross-compilation
2026-04-02 22:47:53 +00:00
2ed1b1e2dd
Fix: use CC=gcc CXX=g++ instead of full paths for QEMU compatibility
2026-04-02 22:47:22 +00:00
14467bef70
Fix: add --no-build-isolation to pip wheel for flash-attention
...
Without this flag, pip runs the build in an isolated environment
that doesn't have access to torch in the venv.
2026-04-02 20:55:32 +00:00
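The fix above amounts to one flag on the wheel build; a hedged sketch (wheel directory is illustrative, the flag is the one the commit adds):

```shell
# Build a flash-attention wheel against the torch already installed in the
# current environment. Without --no-build-isolation, pip builds inside an
# isolated environment that has no torch, so the build fails at import time.
pip wheel --no-build-isolation --wheel-dir /tmp/wheels flash-attn
```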
8f870921f8
Fix: use 'pip wheel' instead of 'uv pip wheel' (uv has no wheel subcommand)
2026-04-02 20:22:11 +00:00
9da93ec625
Fix setuptools pin and flash-attention build for GH200
...
- Pin setuptools>=77.0.3,<81.0.0 for LMCache compatibility
- Use 'uv pip wheel' instead of 'pip3 wheel' for flash-attention (torch is in venv)
- Add CLAWMINE.md with build pipeline documentation
2026-04-02 20:19:39 +00:00
Rajesh Shashi Kumar
0814f059f5
Updated to v0.11.1rc3
2025-10-23 18:11:41 +00:00
Rajesh Shashi Kumar
3c4796ed55
Updated for CUDA 13
2025-10-21 19:21:13 +00:00
Rajesh Shashi Kumar
ebcdb4ab50
Updates for PyTorch 2.9, CUDA13
2025-10-20 20:16:06 +00:00
Rajesh Shashi Kumar
02430037ea
Updated for v0.11.0
2025-10-16 01:08:21 +00:00
Rajesh Shashi Kumar
201bbf5379
v0.10.2 cleanup
2025-09-24 06:14:16 +00:00
Rajesh Shashi Kumar
fc321295f1
Updated for vllm v0.10.2
2025-09-24 05:52:11 +00:00
Rajesh Shashi Kumar
daf345024b
Updated for v0.10.0
2025-08-20 21:02:46 +00:00
Rajesh Shashi Kumar
23267e4bf5
v0.9.1+ vLLM with FlashInfer
2025-06-25 20:03:20 +00:00
Rajesh Shashi Kumar
64ab367973
v0.9.1
2025-06-24 23:33:46 +00:00
Rajesh Shashi Kumar
3d7f1ed454
vllm 0.9.0.1
2025-06-18 21:49:59 +00:00
Rajesh Shashi Kumar
713775c491
Updates for vllm 0.9.0.1
2025-06-04 15:28:22 +00:00
Rajesh Shashi Kumar
c36ff9ee0e
Updated
2025-06-04 04:47:47 +00:00
Rajesh Shashi Kumar
3d115911aa
0.9.0.1
2025-06-04 03:22:03 +00:00
Rajesh Shashi Kumar
d30802ef41
Updated for vllm 0.9.0.1
2025-06-03 22:34:21 +00:00
Rajesh Shashi Kumar
87c6773c8f
v0.8.4
2025-05-27 20:34:14 +00:00
Rajesh Shashi Kumar
e205f17e2e
Added nsys
2025-04-17 18:46:25 +00:00
Rajesh Shashi Kumar
256272732d
Fixed numpy version
2025-04-07 18:38:30 +00:00
Rajesh Shashi Kumar
4d0dc5d06f
numpy version fix
2025-04-03 20:35:56 +00:00
Rajesh Shashi Kumar
75e33490bd
Working version with vLLM+LMCache
2025-04-01 23:34:16 +00:00
Rajesh Shashi Kumar
c63afb3d35
Working version with vLLM+LMCache
2025-04-01 23:33:43 +00:00
Ubuntu
57ceca8b4f
vllm docker 0.8.1 with lmcache
2025-04-01 20:44:21 +00:00