biondizzle

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 00:18:57 +00:00

b1dd59293a Add prefill: process prompt tokens to fill KV cache before decoding
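A minimal sketch of the prefill-then-decode pattern this commit describes, in plain Python. `ToyKVCache`, `toy_step` and `generate` are hypothetical stand-ins, not code from the repository. The toy step writes one token's K/V into the cache and derives the next token from everything cached so far. Prompt tokens are fed one at a time here for clarity. A real prefill would normally push the whole prompt through in one batched forward pass.

```python
from dataclasses import dataclass, field

@dataclass
class ToyKVCache:
    # Hypothetical single-layer cache: one K and one V entry per processed token.
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

def toy_step(token, cache):
    # Stand-in for one forward pass: write this token's K/V (toy: both equal
    # the token id), then "attend" over the whole cache to pick the next token.
    cache.append(token, token)
    return sum(cache.values) % 100

def generate(prompt, max_new_tokens, cache):
    if max_new_tokens <= 0:
        return []
    assert prompt, "prefill needs at least one prompt token"
    # Prefill: every prompt token goes through the model so the cache holds
    # K/V for the full prompt; only the last step's prediction is kept.
    for tok in prompt:
        next_tok = toy_step(tok, cache)
    # Decode: feed each generated token back in, one step at a time.
    out = [next_tok]
    while len(out) < max_new_tokens:
        out.append(toy_step(out[-1], cache))
    return out

cache = ToyKVCache()
tokens = generate([1, 2, 3], 2, cache)
```

Without the prefill loop, the first decode step would run against an empty cache and never see the prompt.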

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 00:15:00 +00:00

178fb5483a Fix KV cache: use index 0 (one-layer cache per layer instance)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 00:11:17 +00:00

afcc690ddc Add full MoE routing + KV cache to single_shot

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 00:02:31 +00:00

3ecfbcba57 Fix T scope in post_block

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 23:59:20 +00:00

a493f72681 Add per-residual RMSNorm in mHC post_block (routed MoE missing)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 23:55:36 +00:00

49282fe206 Fix mHC: match vLLM torch reference exactly

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 23:48:34 +00:00

66a66f8244 Add per-layer NaN tracking for mHC debug

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 23:45:20 +00:00

d003c4b7cc Add mHC (Manifold-Constrained Hyper-Connections) to single_shot

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 23:39:47 +00:00

f567c20539 Fix: set active CUDA device per layer for BMM/FMHA

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 23:36:16 +00:00

7a95983e0f Rewrite single_shot: 8-GPU pipeline parallel
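One simple way to spread a model's layers over 8 GPUs for pipeline parallelism is a contiguous, near-even split. The commit does not say how layers are assigned, so treat this as an assumption. `partition_layers` is a hypothetical helper that returns the GPU index for each layer.

```python
def partition_layers(num_layers, num_gpus=8):
    """Map each layer index to a GPU with a contiguous, near-even split.

    The first (num_layers % num_gpus) GPUs take one extra layer, so stage
    sizes differ by at most one.
    """
    base, extra = divmod(num_layers, num_gpus)
    device_of = []
    for gpu in range(num_gpus):
        count = base + (1 if gpu < extra else 0)
        device_of.extend([gpu] * count)
    return device_of

# Layer i runs on GPU device_of[i].
device_of = partition_layers(10)
```

In a PyTorch runner, each layer would then run with its GPU made current (for example `torch.cuda.set_device(device_of[i])`). The hidden state would be moved with `.to(f"cuda:{device_of[i]}")` whenever it crosses a stage boundary.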

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 22:59:30 +00:00

aac0fa1f08 Update STATUS.md + MEMORY.md: single-shot inference verified

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 22:58:51 +00:00

11c010e567 Update output section: kernel verified, architecture gaps noted

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 22:56:22 +00:00

53178d2536 Add emergency RMSNorm after residuals (missing mHC fallback)
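For reference, RMSNorm divides a vector by its root-mean-square and then applies a learned per-channel weight: y = x / sqrt(mean(x^2) + eps) * w. This is a minimal pure-Python sketch. It assumes eps = 1e-6 because the model's actual value is not stated here. It applies the norm to the sum of the residual stream and a sublayer output, which is how this commit message describes the fallback.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # Mean of squares over the feature dimension (eps = 1e-6 is an assumption).
    ms = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(ms + eps)
    # Normalize, then apply the learned per-channel gain.
    return [v * inv_rms * w for v, w in zip(x, weight)]

# Hypothetical usage: normalize right after a residual add.
residual = [3.0, -1.0]
sublayer_out = [0.0, 5.0]
hidden = rms_norm([r + s for r, s in zip(residual, sublayer_out)], weight=[1.0, 1.0])
```

Because the output's mean square is pinned near 1 (times the weights), repeated residual adds cannot grow the hidden state without bound. That appears to be the point of an "emergency" norm when the proper mHC mixing is missing.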

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 22:54:58 +00:00

172ba75e0c Add per-layer NaN check to track where values diverge
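A per-layer check like this can be as simple as testing each layer's output for non-finite values and stopping at the first offender. This is a hypothetical pure-Python sketch (`first_nonfinite_layer` is not from the repository). On tensors, the equivalent PyTorch test would be `torch.isfinite(h).all()`.

```python
import math

def first_nonfinite_layer(x, layers):
    """Run x through layers in order.

    Returns (index, output) for the first layer whose output contains NaN or
    Inf, or (None, final_output) if every layer stays finite.
    """
    for i, layer in enumerate(layers):
        x = layer(x)
        # math.isfinite is False for both NaN and +/-Inf.
        if not all(math.isfinite(v) for v in x):
            return i, x
    return None, x

def double(x):
    return [2 * v for v in x]

def blow_up(x):
    # Simulated faulty layer that overflows its first element.
    return [float("inf")] + x[1:]

idx, out = first_nonfinite_layer([1.0, 2.0], [double, blow_up, double])
```

Checking for Inf as well as NaN matters: an Inf usually shows up a layer or two before it turns into NaN (for example as inf - inf in a softmax or a norm).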

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 22:53:11 +00:00

ec7846e28c Add NaN tracking to single_shot_inference

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 22:51:12 +00:00

5fa6c88b17 Fix: replace FP4 Inf with 24 (avoid NaN in dequant)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 22:49:23 +00:00

904753f62a Fix: BMM batch dim alignment for wo_a

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 22:48:32 +00:00

52df3bc26c Fix: wo_a as batched matmul (grouped linear for output projection)
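A grouped linear layer splits the input features into G groups and gives each group its own weight matrix. That is a batched matmul with the group index as the batch dimension, or equivalently a dense matmul with a block-diagonal weight. The sketch below assumes the per-group outputs are concatenated. The commit does not say how wo_a recombines its groups, and `grouped_linear` and `matmul` are hypothetical helpers.

```python
def matmul(a, b):
    # a: m x k, b: k x n (lists of rows) -> m x n
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def grouped_linear(x, weights):
    """x: tokens x (G * d_in); weights: G matrices, each d_in x d_out.

    Equivalent to a batched matmul over the group axis; outputs are
    concatenated along features -> tokens x (G * d_out) (assumption).
    """
    G = len(weights)
    d_in = len(weights[0])
    assert all(len(row) == G * d_in for row in x), "feature dim must be G * d_in"
    per_group = []
    for g, w in enumerate(weights):
        x_g = [row[g * d_in:(g + 1) * d_in] for row in x]  # tokens x d_in
        per_group.append(matmul(x_g, w))                    # tokens x d_out
    return [sum((per_group[g][i] for g in range(G)), []) for i in range(len(x))]

x = [[1.0, 2.0, 3.0, 4.0]]                    # 1 token, G = 2 groups of d_in = 2
weights = [[[1.0], [1.0]], [[2.0], [0.0]]]    # each group: 2 x 1
y = grouped_linear(x, weights)
```

In PyTorch the same computation is usually written as x.view(T, G, d_in).transpose(0, 1) followed by torch.bmm with a weight of shape [G, d_in, d_out]. torch.bmm expects the batch (group) axis first on both operands, which is the kind of alignment the next commit touches.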

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 22:46:14 +00:00

19240608d7 Fix: handle o_a_proj grouped linear shape mismatch

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-30 22:45:15 +00:00

1d02758416 Fix: kv_proj outputs hd=512 (1 KV head MQA), Z from compressor.gate_proj
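With multi-query attention (one KV head), kv_proj produces a single head of width hd (512 per the commit), and every query head attends over that same shared K/V. This small pure-Python sketch shows the sharing with a tiny hd for readability. `mqa_attention` is a hypothetical helper. The sketch omits RoPE, causal masking and whatever role the compressor plays.

```python
import math

def mqa_attention(q_heads, k, v):
    """q_heads: n_heads query vectors (each length hd) for one position.
    k, v: cached key/value vectors (each length hd) from the single KV head.
    Returns one output vector per query head.
    """
    hd = len(k[0])
    scale = 1.0 / math.sqrt(hd)
    outs = []
    for q in q_heads:  # every query head reuses the same k and v
        scores = [scale * sum(a * b for a, b in zip(q, kk)) for kk in k]
        m = max(scores)                       # subtract max for a stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        outs.append([sum(wi * vv[d] for wi, vv in zip(w, v)) / z for d in range(hd)])
    return outs

out = mqa_attention([[0.5, 0.5], [1.0, -1.0]], k=[[1.0, 0.0]], v=[[2.0, 3.0]])
```

The payoff of MQA is the cache size: it stores one hd-wide K/V per token instead of n_heads of them. That is why kv_proj's output width is hd rather than n_heads * hd.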