Commit Graph

35 Commits

436214bb72 Use PyPI triton wheel instead of building (QEMU segfaults)
Triton 3.6.0 ships an official aarch64 wheel on PyPI;
building Triton from source segfaults under QEMU emulation.
2026-04-02 23:58:20 +00:00
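The switch above amounts to installing the published wheel rather than compiling it; a minimal sketch, with the version pin taken from the commit message (the command is illustrative, not copied from the repo):

```shell
# Install the prebuilt aarch64 Triton wheel from PyPI instead of
# building from source, which segfaults under QEMU emulation here.
pip install "triton==3.6.0"
```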
e5445512aa Reduce MAX_JOBS by half to reduce QEMU memory pressure
- xformers: 6 -> 3
- flash-attention: 8 -> 4
- vllm: 8 -> 4

Testing if lower parallelism helps avoid segfaults under emulation
2026-04-02 23:44:11 +00:00
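For torch-extension builds such as xformers, flash-attention, and vllm, the knob being halved is the MAX_JOBS environment variable, which caps the number of parallel compile jobs the build spawns. A minimal sketch of the new setting:

```shell
# Halved parallelism per the commit: flash-attention/vllm 8 -> 4
# (xformers would use 3). Fewer concurrent compilers means less
# resident memory under QEMU emulation.
export MAX_JOBS=4
echo "MAX_JOBS=$MAX_JOBS"
```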
4f94431af6 Revert CC/CXX to full paths, keep QEMU_CPU=max 2026-04-02 22:50:23 +00:00
866c9d9db8 Add QEMU_CPU=max for better emulation compatibility during cross-compilation 2026-04-02 22:47:53 +00:00
2ed1b1e2dd Fix: use CC=gcc CXX=g++ instead of full paths for QEMU compatibility 2026-04-02 22:47:22 +00:00
14467bef70 Fix: add --no-build-isolation to pip wheel for flash-attention
Without this flag, pip runs the build in an isolated environment
that doesn't have access to torch in the venv.
2026-04-02 20:55:32 +00:00
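The fix can be sketched as below; the venv path and wheel directory are assumptions for illustration, not taken from the repo:

```shell
# With default build isolation, pip compiles flash-attn in a throwaway
# environment that lacks the venv's torch, so its setup.py fails to
# import torch. --no-build-isolation builds against the active env.
. .venv/bin/activate                                  # hypothetical venv path
pip wheel --no-build-isolation --wheel-dir wheels/ flash-attn
```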
82b2ceacd5 Update build history and fix pip command docs 2026-04-02 20:24:26 +00:00
8f870921f8 Fix: use 'pip wheel' instead of 'uv pip wheel' (uv has no wheel subcommand) 2026-04-02 20:22:11 +00:00
9da93ec625 Fix setuptools pin and flash-attention build for GH200
- Pin setuptools>=77.0.3,<81.0.0 for LMCache compatibility
- Use 'uv pip wheel' instead of 'pip3 wheel' for flash-attention (torch is in venv)
- Add CLAWMINE.md with build pipeline documentation
2026-04-02 20:19:39 +00:00
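The pin in the first bullet would look like this as a requirements fragment (file name assumed for illustration):

```text
# requirements.txt fragment: keep setuptools in the range LMCache accepts
setuptools>=77.0.3,<81.0.0
```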
5fa395825a Updated to vLLM v0.11.1rc3 (Rajesh Shashi Kumar) 2025-10-23 18:16:57 +00:00
0814f059f5 Updated to v0.11.1rc3 (Rajesh Shashi Kumar) 2025-10-23 18:11:41 +00:00
3c4796ed55 Updated for CUDA 13 (Rajesh Shashi Kumar) 2025-10-21 19:21:13 +00:00
ebcdb4ab50 Updates for PyTorch 2.9, CUDA13 (Rajesh Shashi Kumar) 2025-10-20 20:16:06 +00:00
02430037ea Updated for v0.11.0 (Rajesh Shashi Kumar) 2025-10-16 01:08:21 +00:00
31f4489d1f Update README.md (Rajesh Shashi Kumar) 2025-09-24 01:43:49 -05:00
201bbf5379 v0.10.2 cleanup (Rajesh Shashi Kumar) 2025-09-24 06:14:16 +00:00
fc321295f1 Updated for vllm v0.10.2 (Rajesh Shashi Kumar) 2025-09-24 05:52:11 +00:00
daf345024b Updated for v0.10.0 (Rajesh Shashi Kumar) 2025-08-20 21:02:46 +00:00
23267e4bf5 v0.9.1+ vLLM with FlashInfer (Rajesh Shashi Kumar) 2025-06-25 20:03:20 +00:00
64ab367973 v0.9.1 (Rajesh Shashi Kumar) 2025-06-24 23:33:46 +00:00
3d7f1ed454 vllm 0.9.0.1 (Rajesh Shashi Kumar) 2025-06-18 21:49:59 +00:00
713775c491 Updates for vllm 0.9.0.1 (Rajesh Shashi Kumar) 2025-06-04 15:28:22 +00:00
c36ff9ee0e Updated (Rajesh Shashi Kumar) 2025-06-04 04:47:47 +00:00
3d115911aa 0.9.0.1 (Rajesh Shashi Kumar) 2025-06-04 03:22:03 +00:00
3ea7d34e83 Merge branch 'main' of https://github.com/rajesh-s/containers (Rajesh Shashi Kumar) 2025-06-03 22:34:34 +00:00
d30802ef41 Updated for vllm 0.9.0.1 (Rajesh Shashi Kumar) 2025-06-03 22:34:21 +00:00
b4ae9077ae Create native_build.sh (Rajesh Shashi Kumar) 2025-05-29 14:34:50 -05:00
87c6773c8f v0.8.4 (Rajesh Shashi Kumar) 2025-05-27 20:34:14 +00:00
e205f17e2e Added nsys (Rajesh Shashi Kumar) 2025-04-17 18:46:25 +00:00
256272732d Fixed numpy version (Rajesh Shashi Kumar) 2025-04-07 18:38:30 +00:00
4d0dc5d06f numpy version fix (Rajesh Shashi Kumar) 2025-04-03 20:35:56 +00:00
75e33490bd Working version with vLLM+LMCache (Rajesh Shashi Kumar) 2025-04-01 23:34:16 +00:00
c63afb3d35 Working version with vLLM+LMCache (Rajesh Shashi Kumar) 2025-04-01 23:33:43 +00:00
57ceca8b4f vllm docker 0.8.1 with lmcache (Ubuntu) 2025-04-01 20:44:21 +00:00
9f2769285a Initial commit (Rajesh Shashi Kumar) 2025-03-28 14:31:17 -05:00