biondizzle (joined on 2025-12-10)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:29:39 +00:00
79a41d9197 Save ~5-8 GiB GPU VRAM: move dummy weight to CPU
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:29:00 +00:00
cebc586014 Fix OOM: use 1-token warmup sample + free immediately
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:21:31 +00:00
5122cadc94 Update CURRENT_BUG.md: root cause found + fix committed
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:21:08 +00:00
6e6f95dfa8 FIX: Use warmup-based activation global scale in CuTeDSL linear kernel
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:18:12 +00:00
0a7769972f Fix garbled shared_expert_pipeline.py: imports/class were merged
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:17:39 +00:00
87453a53b0 Fix checkpoint keys: attn_hc.*, compressor.*, q_a_proj/q_b_proj/kv_proj
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:16:41 +00:00
f97762cc9f Fix full layer test: use correct checkpoint key names
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:15:09 +00:00
cc48a5715e Add full layer 0 B200 test: CuTeDSL vs BF16 reference
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:05:46 +00:00
dbaa3d6fe6 Update CURRENT_BUG.md and README with current state
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 06:52:42 +00:00
62abf41b03 Revert deepseek_v4_attention.py to ffc2264 — don't nuke existing patches
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 06:45:01 +00:00
4c2effa2be Fix attention patch: source from v0.21.0 stable, not local clone
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 06:41:00 +00:00
284b6a5d57 Fix attention patch: use original vllm imports, only patch forward method
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 06:37:28 +00:00
199efe0871 Fix dims: o_groups=16, o_lora_rank=1024 from config
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 06:36:25 +00:00
b4fee70151 Fix device mismatch in test
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 06:32:56 +00:00
6b4b9774d1 Add B200 test: prove O-projection root cause + validate fix
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 06:30:21 +00:00
77baca668e Patch attention forward: BF16 inv RoPE + BMM wo_a + NVFP4 wo_b
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 06:03:13 +00:00
ffc2264c41 Fix activation global scale: don't double-invert input_global_scale_inv
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 05:41:57 +00:00
918342feeb MHC: replace monolithic layers/mhc.py with pure PyTorch
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 05:36:36 +00:00
dfd9c10ae9 Fix MHC import: don't import .torch from layers/mhc.py
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 05:31:06 +00:00
e404e18efb Also replace layers/mhc.py CustomOp dispatch