biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 00:18:57 +00:00
b1dd59293a Add prefill: process prompt tokens to fill KV cache before decoding
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 00:15:00 +00:00
178fb5483a Fix KV cache: use index 0 (one-layer cache per layer instance)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 00:11:17 +00:00
afcc690ddc Add full MoE routing + KV cache to single_shot
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 00:02:31 +00:00
3ecfbcba57 Fix T scope in post_block
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 23:59:20 +00:00
a493f72681 Add per-residual RMSNorm in mHC post_block (routed MoE missing)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 23:55:36 +00:00
49282fe206 Fix mHC: match vLLM torch reference exactly
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 23:48:34 +00:00
66a66f8244 Add per-layer NaN tracking for mHC debug
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 23:45:20 +00:00
d003c4b7cc Add mHC (Manifold-Constrained Hyper-Connections) to single_shot
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 23:39:47 +00:00
f567c20539 Fix: set active CUDA device per layer for BMM/FMHA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 23:36:16 +00:00
7a95983e0f Rewrite single_shot: 8-GPU pipeline parallel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:59:30 +00:00
aac0fa1f08 Update STATUS.md + MEMORY.md: single-shot inference verified
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:58:51 +00:00
11c010e567 Update output section: kernel verified, architecture gaps noted
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:56:22 +00:00
53178d2536 Add emergency RMSNorm after residuals (missing mHC fallback)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:54:58 +00:00
172ba75e0c Add per-layer NaN check to track where values diverge
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:53:11 +00:00
ec7846e28c Add NaN tracking to single_shot_inference
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:51:12 +00:00
5fa6c88b17 Fix: replace FP4 Inf with 24 (avoid NaN in dequant)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:49:23 +00:00
904753f62a Fix: BMM batch dim alignment for wo_a
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:48:32 +00:00
52df3bc26c Fix: wo_a as batched matmul (grouped linear for output projection)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:46:14 +00:00
19240608d7 Fix: handle o_a_proj grouped linear shape mismatch
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:45:15 +00:00
1d02758416 Fix: kv_proj outputs hd=512 (1 KV head MQA), Z from compressor.gate_proj
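
Several commits above (ec7846e28c, 172ba75e0c, 66a66f8244) add per-layer NaN tracking to find where values diverge. The repository's actual code is not shown in this feed, so the following is only a minimal sketch of the general idea: run the layers in order and report the first one whose output contains a NaN or Inf. The names `first_nonfinite_layer`, `double`, and `overflow` are hypothetical, and plain Python lists stand in for GPU tensors.

```python
import math

def first_nonfinite_layer(layers, x):
    """Apply each layer in order and stop at the first non-finite output.

    Returns (layer_index, output) for the first layer whose output contains
    NaN or Inf, or (None, final_output) if every layer stayed finite.
    """
    for i, layer in enumerate(layers):
        x = layer(x)
        if not all(math.isfinite(v) for v in x):
            return i, x
    return None, x

# Hypothetical toy layers, used only to exercise the check.
def double(v):
    return [2.0 * a for a in v]

def overflow(v):
    # Float multiplication overflows to inf in Python; inf - inf gives NaN.
    big = 1e308 * 10.0
    return [a + (big - big) for a in v]

bad_index, _ = first_nonfinite_layer([double, overflow, double], [1.0, 2.0])
ok_index, ok_out = first_nonfinite_layer([double, double], [1.0, 2.0])
print(bad_index, ok_index, ok_out)
```

Stopping at the first offending layer matters because NaN propagates through every later layer, so checking only the final output cannot show where the divergence began.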