biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 15:04:49 +00:00
9d57b0453b auto: pre-test commit
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 15:04:04 +00:00
1a6d9ee29b Reset to greedy decoding (temperature=0)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 15:03:58 +00:00
038fe81c68 Fix MoE non-fused L2 runtime gsa + update test harness for extra args
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 14:55:43 +00:00
a48d6e14ae Default temperature=0.7 with rep penalty
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 14:54:50 +00:00
1d64b863ca Add temperature sampling + repetition penalty to fix degenerate repetition
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 14:43:51 +00:00
6cca16f97a Set max-tokens=128 default, clean up for final verification
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 14:33:58 +00:00
a0e758ec3b Set default max-tokens=30 for faster iteration
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 14:21:18 +00:00
2b1fca6dae CRITICAL FIX: runtime activation global scale to prevent E4M3 overflow
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 14:15:29 +00:00
3b2714410f Add NVFP4 linear accuracy test: prod vs ref with all-ones input
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 14:11:49 +00:00
3e47d5f20a Add prod vs ref GEMM comparison test + gate logits diagnostic
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 13:55:56 +00:00
ad143afe37 Add L58-60 diagnostic: mHC A/B/C, MoE routed/shared, topk
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 11:25:53 +00:00
7a05d3d3af NVFP4 router gate: use Nvfp4Linear for both checkpoint and quantized paths
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 11:17:55 +00:00
e5dbe1ed22 Switch router to Nvfp4Linear production GEMM (custom CuTeDSL kernel crashes MLIR)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 11:14:06 +00:00
a4324781c3 Fix: properly remove sqrt(softplus) from CuTeDSL kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 11:12:43 +00:00
6efe90cd85 Move sqrt(softplus) out of CuTeDSL kernel into Python
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 11:08:08 +00:00
fbc1e883f2 Add try/except around fused NVFP4 gate loading with error reporting
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 11:05:10 +00:00
5f38430423 Fix: use 1-dim tensors for gate_ws2 and gate_input_scale
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 11:03:11 +00:00
ec8f292112 Fix: use self.mma_tiler_mnk (full K=64) for SMEM layout computation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 10:55:46 +00:00
44fb9b6c00 Fix: pass self.mma_tiler_mnk (full K) to _compute_stages, not self.mma_tiler (K=1 placeholder)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 10:49:07 +00:00
be2bb2fe84 Fix: self.mma_tiler_mnk not mma_tiler_mnk