biondizzle

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 08:07:44 +00:00

72bf750a0b fix: revert to eager mode — CUDA graphs OOM with 175GB model

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 07:49:41 +00:00

baf44c92f8 fix: memory-efficient E2M1 quantization — no 32x distance tensor

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 07:32:19 +00:00

a2cac7a7fe fix: remove CuTeDSL warmup — OOM with 175GB model loaded

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 07:16:00 +00:00

e0814eb54e fix: cast expert_offsets to int32 for CuTeDSL kernel

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 07:03:10 +00:00

4b0a9557f0 fix: rewrite CuTeDSLMoERunner for CUDA graph compatibility

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 06:31:17 +00:00

dab31b0961 fix: missing tqdm import in weight_loader

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 06:28:19 +00:00

8496ac99bc dang clonkurs

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 06:14:31 +00:00

e7c6274107 Revert "feat: auto-warmup in build_and_run.sh"

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 06:11:40 +00:00

f792537719 feat: auto-warmup in build_and_run.sh

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 06:09:23 +00:00

5d975d00d9 feat: tqdm progress bar for expert weight loading

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 06:02:14 +00:00

2e4ff6b8d4 fix: increase vLLM RPC timeout to 10 min for first-request JIT

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 05:51:36 +00:00

a569612df5 feat: add load progress heartbeats to prevent k8s health check kills

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 05:43:35 +00:00

e5370140cb docs: update README with full NVFP4 coverage, dequant anti-pattern, v2 status

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 05:36:36 +00:00

3445bd24c1 feat: keep attention weights native NVFP4 — stop dequantizing to BF16

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 05:21:12 +00:00

4d4cfa6b28 fix: tqdm over MoE layer warmup, compile every layer, no print spam

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 05:18:16 +00:00

3838561c19 fix: only suppress compile message, still warmup all layers

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 05:16:54 +00:00

f19932d8db fix: compile CuTeDSL kernel once per process, not per MoE layer

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 05:01:22 +00:00

936982c5aa fix: add layer-level tqdm for expert finalization, remove inner expert tqdm

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 04:56:47 +00:00

cf0731cf4b fix: warmup with 128 tokens (fills MMA tile), better error handling

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 04:40:32 +00:00

a70d2d3984 fix: clearer warmup message — 'Compiling CuTeDSL NVFP4 MegaMoE kernel'
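
Note on baf44c92f8 ("memory-efficient E2M1 quantization"): the feed does not show how that commit works, so the following is only a minimal sketch of one common way to avoid a broadcast distance tensor, not the repository's code. It assumes the standard FP4 E2M1 magnitude set (0, 0.5, 1, 1.5, 2, 3, 4, 6) and a 4-bit code with the sign in bit 3. The function names are hypothetical, NumPy stands in for the GPU tensor library, and block scaling is left out. Instead of materializing |x - v| for every representable value v, each element is bucketed against the midpoints between neighbouring magnitudes.

```python
import numpy as np

# Representable E2M1 magnitudes (assumed standard FP4 value set).
E2M1_MAGNITUDES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
# Decision thresholds: midpoints between neighbouring magnitudes.
E2M1_MIDPOINTS = (E2M1_MAGNITUDES[:-1] + E2M1_MAGNITUDES[1:]) / 2


def quantize_e2m1_naive(x):
    """Reference version: builds a temporary of shape x.shape + (8,)."""
    dist = np.abs(np.abs(x).astype(np.float64)[..., None] - E2M1_MAGNITUDES)
    idx = dist.argmin(axis=-1).astype(np.uint8)
    return idx | (np.signbit(x).astype(np.uint8) << 3)


def quantize_e2m1_bucketed(x):
    """Threshold version: temporaries have the same shape as x."""
    # searchsorted counts how many midpoints lie strictly below |x|, which is
    # the index of the nearest magnitude. Exact ties go to the lower magnitude
    # (a simplification, not round-to-nearest-even); |x| > 6 saturates to 6.
    idx = np.searchsorted(E2M1_MIDPOINTS, np.abs(x)).astype(np.uint8)
    return idx | (np.signbit(x).astype(np.uint8) << 3)


def dequantize_e2m1(codes):
    """Map 4-bit codes back to float32 values (no scale factor applied)."""
    mag = E2M1_MAGNITUDES[codes & 0x7]
    return np.where((codes & 0x8) != 0, -mag, mag)
```

Both functions return one uint8 code per input element; packing two codes per byte and applying per-block scales would be separate steps. The "32x" figure in the commit subject is not explained in the feed, and this sketch does not try to reproduce it.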