biondizzle

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:29:39 +00:00

79a41d9197 Save ~5-8 GiB GPU VRAM: move dummy weight to CPU

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:29:00 +00:00

cebc586014 Fix OOM: use 1-token warmup sample + free immediately

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:21:31 +00:00

5122cadc94 Update CURRENT_BUG.md: root cause found + fix committed

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:21:08 +00:00

6e6f95dfa8 FIX: Use warmup-based activation global scale in CuTeDSL linear kernel

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:18:12 +00:00

0a7769972f Fix garbled shared_expert_pipeline.py: imports/class were merged

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:17:39 +00:00

87453a53b0 Fix checkpoint keys: attn_hc.*, compressor.*, q_a_proj/q_b_proj/kv_proj

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:16:41 +00:00

f97762cc9f Fix full layer test: use correct checkpoint key names

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:15:09 +00:00

cc48a5715e Add full layer 0 B200 test: CuTeDSL vs BF16 reference

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:05:46 +00:00

dbaa3d6fe6 Update CURRENT_BUG.md and README with current state

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 06:52:42 +00:00

62abf41b03 Revert deepseek_v4_attention.py to ffc2264 — don't nuke existing patches

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 06:45:01 +00:00

4c2effa2be Fix attention patch: source from v0.21.0 stable, not local clone

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 06:41:00 +00:00

284b6a5d57 Fix attention patch: use original vllm imports, only patch forward method

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 06:37:28 +00:00

199efe0871 Fix dims: o_groups=16, o_lora_rank=1024 from config

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 06:36:25 +00:00

b4fee70151 Fix device mismatch in test

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 06:32:56 +00:00

6b4b9774d1 Add B200 test: prove O-projection root cause + validate fix

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 06:30:21 +00:00

77baca668e Patch attention forward: BF16 inv RoPE + BMM wo_a + NVFP4 wo_b

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 06:03:13 +00:00

ffc2264c41 Fix activation global scale: don't double-invert input_global_scale_inv

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 05:41:57 +00:00

918342feeb MHC: replace monolithic layers/mhc.py with pure PyTorch

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 05:36:36 +00:00

dfd9c10ae9 Fix MHC import: don't import .torch from layers/mhc.py

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 05:31:06 +00:00

e404e18efb Also replace layers/mhc.py CustomOp dispatch