Commit Graph

34 Commits

Author SHA1 Message Date
e5445512aa Reduce MAX_JOBS by half to reduce QEMU memory pressure
- xformers: 6 -> 3
- flash-attention: 8 -> 4
- vllm: 8 -> 4

Testing if lower parallelism helps avoid segfaults under emulation
2026-04-02 23:44:11 +00:00
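The commit above halves per-project build parallelism. A sketch of how those caps might be applied (the exact build-script lines are assumptions; MAX_JOBS is the environment variable that the xformers, flash-attention, and vLLM setup scripts pass to ninja to limit concurrent compile jobs, and the package specs here are illustrative):

```shell
# Hypothetical sketch of the halved parallelism from this commit.
# Lower MAX_JOBS -> fewer parallel compiler processes -> less peak RAM
# under QEMU emulation.
MAX_JOBS=3 pip wheel --no-build-isolation --no-deps xformers
MAX_JOBS=4 pip wheel --no-build-isolation --no-deps flash-attn
MAX_JOBS=4 pip wheel --no-build-isolation --no-deps vllm
```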
4f94431af6 Revert CC/CXX to full paths, keep QEMU_CPU=max 2026-04-02 22:50:23 +00:00
866c9d9db8 Add QEMU_CPU=max for better emulation compatibility during cross-compilation 2026-04-02 22:47:53 +00:00
2ed1b1e2dd Fix: use CC=gcc CXX=g++ instead of full paths for QEMU compatibility 2026-04-02 22:47:22 +00:00
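Read newest-first, the three emulation commits above net out to QEMU_CPU=max with full compiler paths restored. A sketch of the resulting environment (only QEMU_CPU=max and "full paths" are stated in the messages; the specific paths are assumptions):

```shell
# Net emulation settings after 4f94431af6 (sketch; paths assumed).
export QEMU_CPU=max          # expose all emulated CPU features under QEMU
export CC=/usr/bin/gcc       # full compiler paths, restored by the revert
export CXX=/usr/bin/g++
```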
14467bef70 Fix: add --no-build-isolation to pip wheel for flash-attention
Without this flag, pip runs the build in an isolated environment
that doesn't have access to torch in the venv.
2026-04-02 20:55:32 +00:00
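A sketch of the corrected command from this commit (the flag is stated in the message; the `flash-attn` package spec and `--no-deps` are assumptions):

```shell
# --no-build-isolation: build inside the active venv instead of a fresh
# isolated environment, so flash-attention's setup.py can import the
# torch already installed in the venv.
pip wheel --no-build-isolation --no-deps flash-attn
```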
82b2ceacd5 Update build history and fix pip command docs 2026-04-02 20:24:26 +00:00
8f870921f8 Fix: use 'pip wheel' instead of 'uv pip wheel' (uv has no wheel subcommand) 2026-04-02 20:22:11 +00:00
9da93ec625 Fix setuptools pin and flash-attention build for GH200
- Pin setuptools>=77.0.3,<81.0.0 for LMCache compatibility
- Use 'uv pip wheel' instead of 'pip3 wheel' for flash-attention (torch is in venv)
- Add CLAWMINE.md with build pipeline documentation
2026-04-02 20:19:39 +00:00
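The setuptools pin from 9da93ec625 could be applied as follows (the requirement string is taken verbatim from the message; the install command itself is an assumption):

```shell
# Pin setuptools into the venv for LMCache compatibility.
uv pip install 'setuptools>=77.0.3,<81.0.0'
```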
Rajesh Shashi Kumar 5fa395825a Updated to vLLM v0.11.1rc3 2025-10-23 18:16:57 +00:00
Rajesh Shashi Kumar 0814f059f5 Updated to v0.11.1rc3 2025-10-23 18:11:41 +00:00
Rajesh Shashi Kumar 3c4796ed55 Updated for CUDA 13 2025-10-21 19:21:13 +00:00
Rajesh Shashi Kumar ebcdb4ab50 Updates for PyTorch 2.9, CUDA13 2025-10-20 20:16:06 +00:00
Rajesh Shashi Kumar 02430037ea Updated for v0.11.0 2025-10-16 01:08:21 +00:00
Rajesh Shashi Kumar 31f4489d1f Update README.md 2025-09-24 01:43:49 -05:00
Rajesh Shashi Kumar 201bbf5379 v0.10.2 cleanup 2025-09-24 06:14:16 +00:00
Rajesh Shashi Kumar fc321295f1 Updated for vllm v0.10.2 2025-09-24 05:52:11 +00:00
Rajesh Shashi Kumar daf345024b Updated for v0.10.0 2025-08-20 21:02:46 +00:00
Rajesh Shashi Kumar 23267e4bf5 v0.9.1+ vLLM with FlashInfer 2025-06-25 20:03:20 +00:00
Rajesh Shashi Kumar 64ab367973 v0.9.1 2025-06-24 23:33:46 +00:00
Rajesh Shashi Kumar 3d7f1ed454 vllm 0.9.0.1 2025-06-18 21:49:59 +00:00
Rajesh Shashi Kumar 713775c491 Updates for vllm 0.9.0.1 2025-06-04 15:28:22 +00:00
Rajesh Shashi Kumar c36ff9ee0e Updated 2025-06-04 04:47:47 +00:00
Rajesh Shashi Kumar 3d115911aa 0.9.0.1 2025-06-04 03:22:03 +00:00
Rajesh Shashi Kumar 3ea7d34e83 Merge branch 'main' of https://github.com/rajesh-s/containers 2025-06-03 22:34:34 +00:00
Rajesh Shashi Kumar d30802ef41 Updated for vllm 0.9.0.1 2025-06-03 22:34:21 +00:00
Rajesh Shashi Kumar b4ae9077ae Create native_build.sh 2025-05-29 14:34:50 -05:00
Rajesh Shashi Kumar 87c6773c8f v0.8.4 2025-05-27 20:34:14 +00:00
Rajesh Shashi Kumar e205f17e2e Added nsys 2025-04-17 18:46:25 +00:00
Rajesh Shashi Kumar 256272732d Fixed numpy version 2025-04-07 18:38:30 +00:00
Rajesh Shashi Kumar 4d0dc5d06f numpy version fix 2025-04-03 20:35:56 +00:00
Rajesh Shashi Kumar 75e33490bd Working version with vLLM+LMCache 2025-04-01 23:34:16 +00:00
Rajesh Shashi Kumar c63afb3d35 Working version with vLLM+LMCache 2025-04-01 23:33:43 +00:00
Ubuntu 57ceca8b4f vllm docker 0.8.1 with lmcache 2025-04-01 20:44:21 +00:00
Rajesh Shashi Kumar 9f2769285a Initial commit 2025-03-28 14:31:17 -05:00