biondizzle

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 15:04:49 +00:00

9d57b0453b auto: pre-test commit

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 15:04:04 +00:00

1a6d9ee29b Reset to greedy decoding (temperature=0)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 15:03:58 +00:00

038fe81c68 Fix MoE non-fused L2 runtime gsa + update test harness for extra args

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 14:55:43 +00:00

a48d6e14ae Default temperature=0.7 with rep penalty

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 14:54:50 +00:00

1d64b863ca Add temperature sampling + repetition penalty to fix degenerate repetition

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 14:43:51 +00:00

6cca16f97a Set max-tokens=128 default, clean up for final verification

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 14:33:58 +00:00

a0e758ec3b Set default max-tokens=30 for faster iteration

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 14:21:18 +00:00

2b1fca6dae CRITICAL FIX: runtime activation global scale to prevent E4M3 overflow

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 14:15:29 +00:00

3b2714410f Add NVFP4 linear accuracy test: prod vs ref with all-ones input

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 14:11:49 +00:00

3e47d5f20a Add prod vs ref GEMM comparison test + gate logits diagnostic

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 13:55:56 +00:00

ad143afe37 Add L58-60 diagnostic: mHC A/B/C, MoE routed/shared, topk

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 11:25:53 +00:00

7a05d3d3af NVFP4 router gate: use Nvfp4Linear for both checkpoint and quantized paths

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 11:17:55 +00:00

e5dbe1ed22 Switch router to Nvfp4Linear production GEMM (custom CuTeDSL kernel crashes MLIR)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 11:14:06 +00:00

a4324781c3 Fix: properly remove sqrt(softplus) from CuTeDSL kernel

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 11:12:43 +00:00

6efe90cd85 Move sqrt(softplus) out of CuTeDSL kernel into Python

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 11:08:08 +00:00

fbc1e883f2 Add try/except around fused NVFP4 gate loading with error reporting

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 11:05:10 +00:00

5f38430423 Fix: use 1-dim tensors for gate_ws2 and gate_input_scale

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 11:03:11 +00:00

ec8f292112 Fix: use self.mma_tiler_mnk (full K=64) for SMEM layout computation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 10:55:46 +00:00

44fb9b6c00 Fix: pass self.mma_tiler_mnk (full K) to _compute_stages, not self.mma_tiler (K=1 placeholder)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 10:49:07 +00:00

be2bb2fe84 Fix: self.mma_tiler_mnk not mma_tiler_mnk