grace-gpu-containers

Author	SHA1	Message	Date
biondizzle	9d88d4c7d8	Skip xformers - vLLM has built-in FlashAttention kernels. xformers requires TORCH_STABLE_ONLY which needs torch/csrc/stable/ headers not present in PyTorch 2.9.0. vLLM 0.18.1 includes its own FA2/FA3 kernels.	2026-04-03 05:50:02 +00:00
biondizzle	45b6109ee1	Fix xformers TORCH_STABLE_ONLY issue + ramp up MAX_JOBS for native GH200 - Switch to official facebook/xformers (johnnynunez fork has TORCH_STABLE_ONLY requiring PyTorch headers not in 2.9.0) - Increase MAX_JOBS from 2-4 to 8 for all builds (native GH200 has 97GB HBM3) - Increase NVCC_THREADS from 1 to 4 for flash-attention	2026-04-03 05:46:11 +00:00
biondizzle	b223c051de	move things	2026-04-03 04:27:21 +00:00
biondizzle	7f7ca4a742	move things	2026-04-03 04:26:52 +00:00
biondizzle	2dc2008475	move things	2026-04-03 04:26:26 +00:00
biondizzle	980cd1b749	move things	2026-04-03 04:26:08 +00:00
biondizzle	1540b0c54e	move things	2026-04-03 04:25:23 +00:00
biondizzle	0b4ede8047	Add .gitignore for internal docs	2026-04-03 03:49:41 +00:00
biondizzle	5c29d2bea7	Fix: LMCache default branch is 'dev' not 'main'	2026-04-03 03:34:58 +00:00
biondizzle	9259555802	Fix: Actually update LMCache to main branch (previous edit failed)	2026-04-03 03:16:45 +00:00
biondizzle	750906e649	Bleeding edge build: LMCache main, vLLM main, latest transformers	2026-04-03 03:14:01 +00:00
biondizzle	a399fbc8c6	Add MAX_JOBS=2 for LMCache, restore vLLM build from source - LMCache: reduced parallelism to avoid memory pressure - vLLM: restored build from source (was using PyPI wheel) - Will test with docker --memory=24g limit	2026-04-03 02:49:43 +00:00
biondizzle	f8a9d372e5	Use PyPI vLLM wheel instead of building (QEMU cmake try_compile fails) - vLLM 0.18.1 aarch64 wheel includes pre-compiled FA2, FA3, MoE kernels - Original build-from-source code commented out for GH200 restoration - CMake compiler ABI detection fails under QEMU emulation	2026-04-03 00:05:56 +00:00
biondizzle	436214bb72	Use PyPI triton wheel instead of building (QEMU segfaults). Triton 3.6.0 has official aarch64 wheel on PyPI. Building triton from source causes segfaults under QEMU emulation.	2026-04-02 23:58:20 +00:00
biondizzle	e5445512aa	Reduce MAX_JOBS by half to reduce QEMU memory pressure - xformers: 6 -> 3 - flash-attention: 8 -> 4 - vllm: 8 -> 4 Testing if lower parallelism helps avoid segfaults under emulation	2026-04-02 23:44:11 +00:00
biondizzle	4f94431af6	Revert CC/CXX to full paths, keep QEMU_CPU=max	2026-04-02 22:50:23 +00:00
biondizzle	866c9d9db8	Add QEMU_CPU=max for better emulation compatibility during cross-compilation	2026-04-02 22:47:53 +00:00
biondizzle	2ed1b1e2dd	Fix: use CC=gcc CXX=g++ instead of full paths for QEMU compatibility	2026-04-02 22:47:22 +00:00
biondizzle	14467bef70	Fix: add --no-build-isolation to pip wheel for flash-attention. Without this flag, pip runs the build in an isolated environment that doesn't have access to torch in the venv.	2026-04-02 20:55:32 +00:00
biondizzle	82b2ceacd5	Update build history and fix pip command docs	2026-04-02 20:24:26 +00:00
biondizzle	8f870921f8	Fix: use 'pip wheel' instead of 'uv pip wheel' (uv has no wheel subcommand)	2026-04-02 20:22:11 +00:00
biondizzle	9da93ec625	Fix setuptools pin and flash-attention build for GH200 - Pin setuptools>=77.0.3,<81.0.0 for LMCache compatibility - Use 'uv pip wheel' instead of 'pip3 wheel' for flash-attention (torch is in venv) - Add CLAWMINE.md with build pipeline documentation	2026-04-02 20:19:39 +00:00
Rajesh Shashi Kumar	5fa395825a	Updated to vLLM v0.11.1rc3	2025-10-23 18:16:57 +00:00
Rajesh Shashi Kumar	0814f059f5	Updated to v0.11.1rc3	2025-10-23 18:11:41 +00:00
Rajesh Shashi Kumar	3c4796ed55	Updated for CUDA 13	2025-10-21 19:21:13 +00:00
Rajesh Shashi Kumar	ebcdb4ab50	Updates for PyTorch 2.9, CUDA13	2025-10-20 20:16:06 +00:00
Rajesh Shashi Kumar	02430037ea	Updated for v0.11.0	2025-10-16 01:08:21 +00:00
Rajesh Shashi Kumar	31f4489d1f	Update README.md	2025-09-24 01:43:49 -05:00
Rajesh Shashi Kumar	201bbf5379	v0.10.2 cleanup	2025-09-24 06:14:16 +00:00
Rajesh Shashi Kumar	fc321295f1	Updated for vllm v0.10.2	2025-09-24 05:52:11 +00:00
Rajesh Shashi Kumar	daf345024b	Updated for v0.10.0	2025-08-20 21:02:46 +00:00
Rajesh Shashi Kumar	23267e4bf5	v0.9.1+ vLLM with FlashInfer	2025-06-25 20:03:20 +00:00
Rajesh Shashi Kumar	64ab367973	v0.9.1	2025-06-24 23:33:46 +00:00
Rajesh Shashi Kumar	3d7f1ed454	vllm 0.9.0.1	2025-06-18 21:49:59 +00:00
Rajesh Shashi Kumar	713775c491	Updates for vllm 0.9.0.1	2025-06-04 15:28:22 +00:00
Rajesh Shashi Kumar	c36ff9ee0e	Updated	2025-06-04 04:47:47 +00:00
Rajesh Shashi Kumar	3d115911aa	0.9.0.1	2025-06-04 03:22:03 +00:00
Rajesh Shashi Kumar	3ea7d34e83	Merge branch 'main' of https://github.com/rajesh-s/containers	2025-06-03 22:34:34 +00:00
Rajesh Shashi Kumar	d30802ef41	Updated for vllm 0.9.0.1	2025-06-03 22:34:21 +00:00
Rajesh Shashi Kumar	b4ae9077ae	Create native_build.sh	2025-05-29 14:34:50 -05:00
Rajesh Shashi Kumar	87c6773c8f	v0.8.4	2025-05-27 20:34:14 +00:00
Rajesh Shashi Kumar	e205f17e2e	Added nsys	2025-04-17 18:46:25 +00:00
Rajesh Shashi Kumar	256272732d	Fixed numpy version	2025-04-07 18:38:30 +00:00
Rajesh Shashi Kumar	4d0dc5d06f	numpy version fix	2025-04-03 20:35:56 +00:00
Rajesh Shashi Kumar	75e33490bd	Working version with vLLM+LMCache	2025-04-01 23:34:16 +00:00
Rajesh Shashi Kumar	c63afb3d35	Working version with vLLM+LMCache	2025-04-01 23:33:43 +00:00
Ubuntu	57ceca8b4f	vllm docker 0.8.1 with lmcache	2025-04-01 20:44:21 +00:00
Rajesh Shashi Kumar	9f2769285a	Initial commit	2025-03-28 14:31:17 -05:00

48 Commits