52c3aefe73
bump cache busters to 33 for debug build
2026-05-12 13:10:37 +00:00
79d866995f
bump cache buster 32 for packed FP4 mxf4nvf4 fix
2026-05-11 21:59:56 +00:00
c85b84b0fe
fix: staging kernel outputs unpacked E2M1 (1 byte/element, not packed 2/byte)
Matches the SMEM layout: float_e2m1_unpacksmem_t is 1 byte/element.
L1→L2 handoff uses unpacked format (same byte count as FP8).
No bandwidth savings at L1→L2 for v1 — can optimize later.
2026-05-11 21:29:33 +00:00
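Note on c85b84b0fe: the difference between packed (2 elements/byte) and unpacked (1 element/byte) E2M1 can be sketched as below. This is an illustration only, not the staging kernel. The helper names are hypothetical, and the nibble order and the position of the 4-bit code inside the unpacked byte are assumptions; the real float_e2m1_unpacksmem_t layout may place the bits differently.

```python
# Magnitudes representable by E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code (bit 3 = sign, bits 0-2 = magnitude index)."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def pack_e2m1(codes) -> bytes:
    """Packed layout: two 4-bit codes per byte (low nibble first, by assumption)."""
    if len(codes) % 2:
        raise ValueError("packed layout needs an even number of elements")
    return bytes(
        (codes[i] & 0xF) | ((codes[i + 1] & 0xF) << 4)
        for i in range(0, len(codes), 2)
    )


def unpack_e2m1(packed: bytes) -> bytes:
    """Unpacked layout: one 4-bit code per byte, so the same byte count as FP8."""
    out = bytearray()
    for byte in packed:
        out.append(byte & 0xF)  # first element of the pair
        out.append(byte >> 4)   # second element of the pair
    return bytes(out)
```

For N elements the packed buffer holds N/2 bytes and the unpacked buffer holds N bytes, which is why the commit reports no bandwidth savings over FP8 at the handoff.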
17ba5a9d7b
bump cache buster 30 for FP4 staging + DeepGEMM FP4 activations
2026-05-11 20:30:14 +00:00
50a945bde4
bump cache buster 29
2026-05-11 19:51:48 +00:00
35f6b66678
fix: UE8M0 reinterpret in DeepGEMM fold_global_scale + bump cache
2026-05-11 19:40:08 +00:00
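Note on 35f6b66678: UE8M0 is an 8-bit, exponent-only scale format whose byte must be reinterpreted as a biased exponent rather than converted numerically. The sketch below shows that decoding, assuming the usual bias of 127 and 0xFF as NaN. The function name is hypothetical, and nothing here reflects what fold_global_scale actually does in DeepGEMM.

```python
import struct


def ue8m0_to_float(byte: int) -> float:
    """Interpret a UE8M0 byte as the power-of-two scale 2**(byte - 127)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("UE8M0 is a single byte")
    if byte == 0xFF:
        return float("nan")  # all-ones exponent is reserved for NaN
    if byte == 0:
        return 2.0 ** -127   # below the float32 normal range; computed directly
    # For 1..254 the byte is exactly the float32 exponent field, so shifting it
    # into bits 23-30 and reinterpreting the bits yields the scale.
    return struct.unpack("<f", struct.pack("<I", byte << 23))[0]
```

A numeric cast would turn the byte 127 into 127.0, whereas the reinterpretation yields 1.0; mixing up the two readings produces scales that are off by many orders of magnitude.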
f32d6b5b48
bump cache buster to 27
2026-05-11 19:26:21 +00:00
8ae2214bad
fix: reorder Dockerfile ARG before COPY for proper cache busting
2026-05-11 18:48:07 +00:00
436109081c
bump cache buster to 24
2026-05-11 16:12:56 +00:00
1da40c53da
fix: add patch cache buster to Dockerfile
2026-05-11 07:19:10 +00:00
c8564caf9d
fix: patch vLLM deepseek_v4.py directly in image
2026-05-11 06:09:40 +00:00
7c8c6cd67f
fix: add PYTHONPATH for deep_gemm import
2026-05-11 06:06:52 +00:00
cffb373759
fix: symlink NVRTC lib into cuda/lib64 for linker
2026-05-11 06:04:24 +00:00
983ba02c5b
fix: add CUDA/NVRTC lib paths to Dockerfile
2026-05-11 06:02:13 +00:00
f0471ed1c2
fix: correct CR URL to atl.vultrcr.com
2026-05-11 05:59:06 +00:00
c234190a80
feat: add Dockerfile + build/push script for NVFP4 container
- Extends dream-build with DeepGEMM nvfp4-mega-moe kernel
- build_push.sh: builds, logs into Vultr CR, pushes, updates docker-compose
- CACHE_BUSTER parameter for forcing fresh clones
2026-05-11 05:57:49 +00:00