Commit Graph

16 Commits

Author SHA1 Message Date
52c3aefe73 bump cache busters to 33 for debug build 2026-05-12 13:10:37 +00:00
79d866995f bump cache buster 32 for packed FP4 mxf4nvf4 fix 2026-05-11 21:59:56 +00:00
c85b84b0fe fix: staging kernel outputs unpacked E2M1 (1 byte/element, not packed 2/byte) 2026-05-11 21:29:33 +00:00
    Matches the SMEM layout: float_e2m1_unpacksmem_t is 1 byte/element.
    L1→L2 handoff uses unpacked format (same byte count as FP8).
    No bandwidth savings at L1→L2 for v1; can optimize later.
17ba5a9d7b bump cache buster 30 for FP4 staging + DeepGEMM FP4 activations 2026-05-11 20:30:14 +00:00
50a945bde4 bump cache buster 29 2026-05-11 19:51:48 +00:00
35f6b66678 fix: UE8M0 reinterpret in DeepGEMM fold_global_scale + bump cache 2026-05-11 19:40:08 +00:00
f32d6b5b48 bump cache buster to 27 2026-05-11 19:26:21 +00:00
8ae2214bad fix: reorder Dockerfile ARG before COPY for proper cache busting 2026-05-11 18:48:07 +00:00
436109081c bump cache buster to 24 2026-05-11 16:12:56 +00:00
1da40c53da fix: add patch cache buster to Dockerfile 2026-05-11 07:19:10 +00:00
c8564caf9d fix: patch vLLM deepseek_v4.py directly in image 2026-05-11 06:09:40 +00:00
7c8c6cd67f fix: add PYTHONPATH for deep_gemm import 2026-05-11 06:06:52 +00:00
cffb373759 fix: symlink NVRTC lib into cuda/lib64 for linker 2026-05-11 06:04:24 +00:00
983ba02c5b fix: add CUDA/NVRTC lib paths to Dockerfile 2026-05-11 06:02:13 +00:00
f0471ed1c2 fix: correct CR URL to atl.vultrcr.com 2026-05-11 05:59:06 +00:00
c234190a80 feat: add Dockerfile + build/push script for NVFP4 container 2026-05-11 05:57:49 +00:00
    - Extends dream-build with DeepGEMM nvfp4-mega-moe kernel
    - build_push.sh: builds, logs into Vultr CR, pushes, updates docker-compose
    - CACHE_BUSTER parameter for forcing fresh clones
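
Note on commit c85b84b0fe: the difference between unpacked (1 byte/element) and packed (2 elements/byte) E2M1 storage can be sketched in a few lines of Python. This is an illustration of the byte counts only, not the staging kernel's code. The function names are hypothetical, and the nibble placement (low nibble for the unpacked form, even index in the low nibble for the packed form) is an assumption chosen for readability; the real SMEM layout may place the bits differently.

```python
# Magnitudes for the 3-bit E2M1 magnitude codes 0..7 (2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code: bit 3 is the sign, bits 0..2 index the magnitude."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def store_unpacked(codes: list[int]) -> bytes:
    """One 4-bit code per byte (assumed here to sit in the low nibble)."""
    return bytes(c & 0xF for c in codes)


def store_packed(codes: list[int]) -> bytes:
    """Two 4-bit codes per byte (assumed order: even index low, odd index high)."""
    if len(codes) % 2 != 0:
        raise ValueError("packed storage needs an even number of elements")
    return bytes(
        (codes[i] & 0xF) | ((codes[i + 1] & 0xF) << 4)
        for i in range(0, len(codes), 2)
    )


def load_packed(packed: bytes) -> list[int]:
    """Inverse of store_packed: split each byte back into two 4-bit codes."""
    codes = []
    for byte in packed:
        codes.append(byte & 0xF)
        codes.append(byte >> 4)
    return codes


codes = [0x0, 0x1, 0x2, 0x7, 0x8, 0x9, 0xA, 0xF]
unpacked = store_unpacked(codes)  # same byte count as one FP8 value per element
packed = store_packed(codes)      # half the byte count
```

For the eight elements above, the unpacked buffer holds 8 bytes and the packed buffer holds 4, which matches the commit's remark that the unpacked handoff costs the same byte count as FP8.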