nvfp4-megamoe-kernel

Author	SHA1	Message	Date
biondizzle	ff6bb32684	Plumb global scale as GEMM alpha instead of folding into UE4M3. stage_activation now returns (x_fp4, x_sf, input_global_scale). The global scale is applied as the CUTLASS GEMM alpha parameter in the epilogue: D = alpha * A @ B, avoiding the fp32→UE4M3 round-trip that folding would introduce. Changes: stage_activation: returns global scale as 3rd value; cutlass_nvfp4_gemm C++ binding: alpha param (was hardcoded 1.0); cutlass_grouped_nvfp4_gemm: passes alpha to per-expert GEMM; nvfp4_mega_moe_l1/l2: accept alpha, pass to grouped GEMM; nvfp4_moe_full: reads symm_buffer.input_global_scale for L1, uses stage_activation's returned global scale for L2; SymmBuffer: added input_global_scale field; vllm patch: stores global scale from stage_activation.	2026-05-15 03:32:19 +00:00
biondizzle	108ff07569	debug: remove one-shot gate from logit dump, log every forward	2026-05-15 03:01:05 +00:00
biondizzle	3600a4b06a	debug: add logit quality dump in compute_logits (ungated, once)	2026-05-15 02:37:23 +00:00
biondizzle	29f8b8c174	fix: load lm_head.weight in outer model before forwarding to inner. lm_head lives on DeepseekV4ForCausalLM, not DeepseekV4Model. The inner load_weights silently drops it (not in params_dict). Extract it in the outer loader, load it directly, then forward the rest to the inner model.	2026-05-15 02:17:16 +00:00
biondizzle	46536e5ccf	fix: hc param renames missing leading dot. Renaming .attn_hc.base -> hc_attn_base produced layers.0hc_attn_base (no dot). Need .hc_attn_base to preserve the dot separator.	2026-05-15 01:54:53 +00:00
biondizzle	086f3fa5c5	fix: hc params dot→underscore + compressor position_bias→ape combined rule. Two fixes: 1. attn_hc.base → hc_attn_base (underscore, not dot, before base/fn/scale). Same for fn, scale, and ffn_hc variants. 2. compressor.position_bias → compressor.ape was never firing because the .self_attn.compressor. rule matched first (break). Added combined .self_attn.compressor.position_bias → .attn.mla_attn.compressor.ape.	2026-05-15 01:29:00 +00:00
biondizzle	44d4b6c225	fix: add missing renames for Hadamard coding + compressor.ape. Three more dropped checkpoint→model mappings: 1. hc_head: checkpoint has hc_head.hc_base/fn/scale, model has hc_head_base/fn/scale (underscore, not dot, separator). 2. attn_hc/ffn_hc: checkpoint has .attn_hc. and .ffn_hc., model has .hc_attn. and .hc_ffn. (word order reversed). 3. compressor.position_bias → compressor.ape: checkpoint name is position_bias, model attr is ape (absolute position encoding). All 461 remaining zero params should now be just indexer.k_norm.bias (legit zero: no bias in checkpoint, only weight).	2026-05-15 01:16:19 +00:00
biondizzle	af6583eb19	fix: unpack uint8 NVFP4→bf16 for non-stacked params (weights_proj). indexer.weights_proj is uint8 [64,3584] in checkpoint but bf16 [64,7168] in model. The uint8→bf16 unpack logic only ran in the stacked_params loop, so non-stacked NVFP4 params hit a size mismatch assertion.	2026-05-15 00:51:50 +00:00
biondizzle	e6ed9facf3	fix: indexer + shared_experts + compressor checkpoint→model key renames. Three categories of missed renames in CKPT_KEY_SUBST: 1. Shared experts: .shared_experts.gate_proj.→.ffn.shared_experts.w1. fired, but break prevented .mlp.→.ffn. from also applying, producing mlp.ffn.shared_experts.w1. (double prefix). Fixed by including .mlp. in the pattern. Added missing .shared_experts.down_proj. rule. 2. Indexer (layers 2+): .self_attn.compressor.indexer.* was caught by the generic .self_attn.compressor.→.attn.mla_attn.compressor. rule, producing the wrong path attn.mla_attn.compressor.indexer.* instead of attn.indexer.*. Added indexer-specific patterns (q_b_proj→wq_b, kv_norm→k_norm, position_bias→compressor.ape, gate_proj→compressor.wgate, kv_proj→compressor.wkv) before the generic compressor rule. 3. Compressor kv_proj/gate_proj: the old .compressor.kv_proj.→.compressor.wkv. pattern could never fire because .self_attn.compressor. matched first (break). Merged into combined patterns that handle both the self_attn.compressor→attn.mla_attn.compressor path AND the projection rename in one step.	2026-05-15 00:39:37 +00:00
biondizzle	21018fca8a	fix: shared_experts missing ffn. prefix in checkpoint→model rename. Checkpoint keys are model.layers.N.shared_experts.gate_proj.weight but model params are layers.N.ffn.shared_experts.gate_up_proj.weight. The .ffn. was missing from the rename, so stacked gate_up_proj never matched params_dict.	2026-05-15 00:17:59 +00:00
biondizzle	483046b9d6	fix: shared_experts gate_up_proj stacking was skipped by the '.experts.' check. The stacking logic skipped any key containing '.experts.' to avoid MoE routed expert weights. But 'shared_experts' also matches that substring, so gate_proj and up_proj were never stacked into gate_up_proj. Changed to '.ffn.experts.', which only matches the routed experts path. Also includes POST-LOAD all-zero param scan.	2026-05-15 00:08:04 +00:00
biondizzle	8dbd616add	a little more debug1	2026-05-15 00:02:00 +00:00
biondizzle	756ea2192f	clean up and possible big fix	2026-05-14 23:41:10 +00:00
biondizzle	9f01307c5b	debug more7	2026-05-14 23:20:19 +00:00
biondizzle	e4f52c8900	debug more5	2026-05-14 23:01:59 +00:00
biondizzle	e46ff41569	debug more4	2026-05-14 22:50:51 +00:00
biondizzle	fd5f04eb15	debug more3	2026-05-14 22:36:34 +00:00
biondizzle	7573f12659	debug more2	2026-05-14 22:26:22 +00:00
biondizzle	11bbf675af	debug more	2026-05-14 22:21:30 +00:00
biondizzle	5bbe51357c	damn clankers	2026-05-14 20:23:42 +00:00
biondizzle	6aae8f1393	more fixes7	2026-05-14 20:11:37 +00:00
biondizzle	4363eee2ce	more fixes6	2026-05-14 20:08:25 +00:00
biondizzle	40b980b9d6	more fixes5	2026-05-14 19:55:34 +00:00
biondizzle	d56e86b40e	more fixes4	2026-05-14 19:51:56 +00:00
biondizzle	bf17bd3fc4	more fixes3	2026-05-14 19:47:02 +00:00
biondizzle	c68f4e9d6e	more fixes2	2026-05-14 19:43:24 +00:00
biondizzle	4749a92fca	more fixes	2026-05-14 19:39:16 +00:00
biondizzle	1ceff541b0	more fixes	2026-05-14 19:35:39 +00:00
biondizzle	3be051e140	fix	2026-05-14 19:29:47 +00:00
biondizzle	57512d5f0d	clean up	2026-05-14 19:20:08 +00:00
biondizzle	0d8e1bd035	restructure: move Dockerfile and docker-compose to root, docker/ → vllm/	2026-05-14 18:47:30 +00:00
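
Several of the rename fixes above (086f3fa5c5, e6ed9facf3) describe the same failure: a first-match-then-break substitution table where a generic rule shadows a more specific one. The sketch below illustrates that ordering problem in isolation. It is not the repository's CKPT_KEY_SUBST; the rule tables, the rename_key helper, and the sample key are hypothetical, and only the key fragments are taken from the commit messages.

```python
# Hypothetical rule tables: (old substring, new substring), tried in order.
GENERIC_FIRST = [
    (".self_attn.compressor.", ".attn.mla_attn.compressor."),
    (".compressor.position_bias", ".compressor.ape"),  # never reached for compressor keys
]
SPECIFIC_FIRST = [
    # Combined rule handles the path change and the attribute rename in one step.
    (".self_attn.compressor.position_bias", ".attn.mla_attn.compressor.ape"),
    (".self_attn.compressor.", ".attn.mla_attn.compressor."),
]

def rename_key(key, rules):
    """Apply the first rule whose old substring occurs in key, then stop."""
    for old, new in rules:
        if old in key:
            key = key.replace(old, new)
            break  # first match wins; later rules are skipped
    return key

ckpt_key = "layers.0.self_attn.compressor.position_bias"  # illustrative key
broken = rename_key(ckpt_key, GENERIC_FIRST)
fixed = rename_key(ckpt_key, SPECIFIC_FIRST)
other = rename_key("layers.0.self_attn.compressor.kv_proj.weight", SPECIFIC_FIRST)

print(broken)  # path renamed, but position_bias left untouched
print(fixed)   # both path and attribute renamed
```

With first-match semantics, listing the most specific patterns first keeps the generic rule as a fallback for keys that need only the path change.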

31 Commits
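
The newest commit (ff6bb32684) applies the global scale as the GEMM alpha, D = alpha * A @ B, rather than folding it into the UE4M3 block scales. The following is a minimal numeric sketch of why that matters, under loud assumptions: round_to_mantissa_bits is a hypothetical, simplified stand-in for UE4M3 rounding (3 mantissa bits, no exponent range, subnormals, or saturation), the values are made up, and nothing here calls CUTLASS or the repository's code.

```python
import math

def round_to_mantissa_bits(x, bits=3):
    """Round x to `bits` explicit mantissa bits (simplified UE4M3 stand-in)."""
    if x == 0:
        return 0.0
    m, e = math.frexp(x)           # x = m * 2**e with 0.5 <= |m| < 1
    steps = 2 ** (bits + 1)        # implicit leading bit plus explicit bits
    return math.ldexp(round(m * steps) / steps, e)

# One block of illustrative values (all representable in FP4 E2M1).
a = [1.0, -0.5, 1.5]
b = [2.0, 4.0, 1.0]
block_scale = 0.75                 # representable with 3 mantissa bits
global_scale = 0.3                 # kept in full precision

acc = sum(x * y for x, y in zip(a, b))   # raw dot product: 1.5

# Alpha path: block scale stays as stored, global scale multiplies the result.
d_alpha = global_scale * (block_scale * acc)

# Folded path: the product of the scales is rounded back to the coarse format.
folded_scale = round_to_mantissa_bits(block_scale * global_scale)
d_folded = folded_scale * acc

print(d_alpha, d_folded)
```

In this toy case the folded scale rounds from 0.225 to 0.21875, so the folded result drifts from the alpha result; the alpha path leaves the stored block scale untouched.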