biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 04:39:06 +00:00
f191af7e29 feat: warm up CuTeDSL kernel during model loading
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 04:28:32 +00:00
4d67b570b9 fix: descriptive tqdm labels — uint8→NVFP4 and NVFP4→FP8/BF16
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 04:26:46 +00:00
8efdd165da fix: use tqdm for progress bars — single line, live updating
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 04:18:08 +00:00
830f042443 fix: PYTHONUNBUFFERED=1 so progress bars stream in real-time
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 04:14:10 +00:00
00b766af60 feat: add progress bars for expert quantization and post-load conversion
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 04:10:44 +00:00
b465579a02 cleanup: nuke all debug prints and env var gates from vLLM patch
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 04:04:55 +00:00
174ad70dca fix: same gate/up split fix in moe_pipeline.py
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 04:04:41 +00:00
6d17988b51 fix: L1 gate/up split — intermediate_size is per-projection, not fused
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 04:02:04 +00:00
37aa0cbeab debug: add try/except with shape logging to _run_mega_moe
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:50:09 +00:00
b04bff7e8b feat: clean Dockerfile, docker-compose, import fixes for CuTeDSL build
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:43:32 +00:00
a0ff8a3278 fix: transpose checkpoint block scales (N,K_sf)→(K_sf,N) for bridge
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:41:24 +00:00
389453fbf4 feat: direct NVFP4 path — no BF16 round-trip on weights
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:36:13 +00:00
8fd9579127 feat: vLLM integration — replace C++ kernel with CuTeDSL
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:33:20 +00:00
3ec9c3074b docs: rewrite README, nuke DEBUG_LOG, add vLLM integration stub
biondizzle created branch the-last-of-cutlass in biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:30:53 +00:00
biondizzle pushed to the-last-of-cutlass at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:30:53 +00:00
biondizzle pushed tag the-last-of-cutlass to biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:30:53 +00:00
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:24:15 +00:00
b685112c92 fix: lower cosine threshold to 0.98 for double-quantization loss
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:23:34 +00:00
6139cd6ff5 fix: rewrite layertest cleanly, test full MoE pipeline
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:22:45 +00:00
09ff5c5b98 feat: full NVFP4 MoE pipeline (L1→SiLU→L2→scatter)