biondizzle
Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 08:07:44 +00:00
72bf750a0b fix: revert to eager mode — CUDA graphs OOM with 175GB model
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 07:49:41 +00:00
baf44c92f8 fix: memory-efficient E2M1 quantization — no 32x distance tensor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 07:32:19 +00:00
a2cac7a7fe fix: remove CuTeDSL warmup — OOM with 175GB model loaded
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 07:16:00 +00:00
e0814eb54e fix: cast expert_offsets to int32 for CuTeDSL kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 07:03:10 +00:00
4b0a9557f0 fix: rewrite CuTeDSLMoERunner for CUDA graph compatibility
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 06:31:17 +00:00
dab31b0961 fix: missing tqdm import in weight_loader
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 06:28:19 +00:00
8496ac99bc dang clonkurs
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 06:14:31 +00:00
e7c6274107 Revert "feat: auto-warmup in build_and_run.sh"
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 06:11:40 +00:00
f792537719 feat: auto-warmup in build_and_run.sh
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 06:09:23 +00:00
5d975d00d9 feat: tqdm progress bar for expert weight loading
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 06:02:14 +00:00
2e4ff6b8d4 fix: increase vLLM RPC timeout to 10 min for first-request JIT
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 05:51:36 +00:00
a569612df5 feat: add load progress heartbeats to prevent k8s health check kills
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 05:43:35 +00:00
e5370140cb docs: update README with full NVFP4 coverage, dequant anti-pattern, v2 status
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 05:36:36 +00:00
3445bd24c1 feat: keep attention weights native NVFP4 — stop dequantizing to BF16
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 05:21:12 +00:00
4d4cfa6b28 fix: tqdm over MoE layer warmup, compile every layer, no print spam
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 05:18:16 +00:00
3838561c19 fix: only suppress compile message, still warmup all layers
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 05:16:54 +00:00
f19932d8db fix: compile CuTeDSL kernel once per process, not per MoE layer
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 05:01:22 +00:00
936982c5aa fix: add layer-level tqdm for expert finalization, remove inner expert tqdm
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 04:56:47 +00:00
cf0731cf4b fix: warmup with 128 tokens (fills MMA tile), better error handling
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 04:40:32 +00:00
a70d2d3984 fix: clearer warmup message — 'Compiling CuTeDSL NVFP4 MegaMoE kernel'