biondizzle

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-03 07:43:28 +00:00

6255c94359 Downgrade to CUDA 12.8.1 for vLLM compatibility

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-03 07:13:40 +00:00

ceab7ada22 Update flashinfer to v0.6.6 to match vLLM 0.18.x requirements

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-03 05:50:11 +00:00

9d88d4c7d8 Skip xformers - vLLM has built-in FlashAttention kernels

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-03 05:46:20 +00:00

45b6109ee1 Fix xformers TORCH_STABLE_ONLY issue + ramp up MAX_JOBS for native GH200

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-03 04:27:27 +00:00

b223c051de move things

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-03 04:26:56 +00:00

7f7ca4a742 move things

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-03 04:26:31 +00:00

2dc2008475 move things

980cd1b749 move things

1540b0c54e move things

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-03 03:49:43 +00:00

0b4ede8047 Add .gitignore for internal docs

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-03 03:34:59 +00:00

5c29d2bea7 Fix: LMCache default branch is 'dev' not 'main'

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-03 03:16:46 +00:00

9259555802 Fix: Actually update LMCache to main branch (previous edit failed)

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-03 03:14:03 +00:00

750906e649 Bleeding edge build: LMCache main, vLLM main, latest transformers

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-03 02:49:45 +00:00

a399fbc8c6 Add MAX_JOBS=2 for LMCache, restore vLLM build from source

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-03 00:05:58 +00:00

f8a9d372e5 Use PyPI vLLM wheel instead of building (QEMU cmake try_compile fails)

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-02 23:58:21 +00:00

436214bb72 Use PyPI triton wheel instead of building (QEMU segfaults)

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-02 23:44:11 +00:00

e5445512aa Reduce MAX_JOBS by half to reduce QEMU memory pressure

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-02 22:50:25 +00:00

4f94431af6 Revert CC/CXX to full paths, keep QEMU_CPU=max

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-02 22:47:54 +00:00

866c9d9db8 Add QEMU_CPU=max for better emulation compatibility during cross-compilation

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-02 22:47:25 +00:00

2ed1b1e2dd Fix: use CC=gcc CXX=g++ instead of full paths for QEMU compatibility

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-02 20:55:33 +00:00

14467bef70 Fix: add --no-build-isolation to pip wheel for flash-attention

biondizzle pushed to main at biondizzle/grace-gpu-containers

2026-04-02 20:24:27 +00:00

82b2ceacd5 Update build history and fix pip command docs