biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 04:57:40 +00:00
d518fcb82a test: correct sink bias reference — denominator-only, no V contribution
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 04:53:58 +00:00
9574a9dc2e test: add sink bias to reference SDPA in decode FMHA comparison
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 04:50:24 +00:00
9a9b347b2b test: add per-head magnitude ratio diagnostics to decode FMHA test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 04:47:03 +00:00
f5fa20c581 fix: syntax error — missing closing paren in indexer.forward call
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 04:46:26 +00:00
693975ec92 fix: device mismatches in decode FMHA test — dec_pos must be on per-layer GPU
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 04:39:16 +00:00
e1d96c509d test: decode FMHA layer comparison — checks FMHA accuracy during decode step
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 04:34:33 +00:00
1ebe7f0dde Add PART_A_NEXT_SESSION.md: clues for decode degeneration debugging
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 04:20:38 +00:00
d8306be3f2 Fix PART A test: proper FP8 quantization and MQA reference
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 04:18:26 +00:00
4126909dfb Simplify PART A test: compressor + FMHA at production scale
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 04:15:43 +00:00
8c54cfa748 Fix KVCache init in PART A test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 04:13:54 +00:00
04cf8ca848 Add PART A diagnostic tests: compressor + KV cache + FMHA at production scale
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 03:50:00 +00:00
75288bd12f Wire prefill FMHA into production.py and single_shot
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 03:48:19 +00:00
5417f65b08 CRITICAL FIX: Add T-dimension strides to prefill FMHA kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 03:47:02 +00:00
dd1cbe1faa Fix smem size for prefill debug test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 03:46:38 +00:00
09384a637a Fix constexpr issues in prefill debug test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 03:46:19 +00:00
d3dc8cf901 Add prefill T=2 debug CUDA test with intermediate value printing
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 03:22:52 +00:00
223c22488f Simplify prefill PV read: use decode kernel's exact pattern
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 03:05:28 +00:00
2bf5e74e61 Add prefill debug test: compare T=1 decode vs prefill kernel step by step
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 03:01:06 +00:00
eb69c3bfb9 CRITICAL FIX: add missing tb base in QK TMEM read address
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 02:59:21 +00:00
99b6de316b Fix prefill kernel: add missing tb base in PV TMEM read, fix ACCUMULATE for per-row PV