biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 19:00:26 +00:00
f97aee6eed plan update
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:40:47 +00:00
487d960a6a D5c multi-tile: VERIFIED cos 0.999996 with Python KV merge + sink bias
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:39:40 +00:00
c9eab3c7e0 diag: rewrite multi-tile test with explicit per-segment compile and reference
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:37:08 +00:00
7f983fb855 diag: add direct segment 0 test to compare with run_segment
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:34:59 +00:00
2a5f9dc6e3 fix: simplify run_segment to use full hd V tensor (was incorrectly splitting by pv_n_tile)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:33:40 +00:00
aa2df1a202 diag: test n_comp=96 with sink bias directly
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:31:39 +00:00
25b236fe00 diag: test D5c multi-tile with no sink bias to isolate issue
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:29:54 +00:00
a3989929de diag: per-segment reference comparison for D5c multi-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:28:20 +00:00
bbc29945e8 diag: add per-segment debug output for D5c multi-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:26:53 +00:00
e64392f1ac D5c: add apply_sink_bias flag (independent of n_comp)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:23:24 +00:00
60b6f8ee79 fix: use separate kernels for segments with/without compressed KV (n_comp is compile-time)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:22:05 +00:00
2efd15c852 fix: correct swa_len_local calculation per segment for D5c multi-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:20:48 +00:00
3abcc7ff09 D5c: multi-tile test using Python KV merge with sink bias
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:17:18 +00:00
57a8316bc1 update README: D5c sink bias DONE (cos 0.999996, single KV tile)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:14:37 +00:00
8f7df4d8b5 fix: mRowSums dummy tensor must match mLSE layout (3D, not 1D)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:11:52 +00:00
ffc4b542bc D5c: use single KV tile (s_k=128) to avoid broken O rescale
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:09:51 +00:00
fc0f4bcf23 diag: test D5c with single KV tile (s_k=128) to isolate O rescale issue
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:08:41 +00:00
e5381b7312 diag: add baseline test (s_k=256 D3 mask, no sink bias) to isolate D5c issue
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:07:24 +00:00
016edbcc97 D5c: add row_sum output for proper external normalization
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:04:49 +00:00
31e6426049 fix: normalize kernel output using per-row LSE for D5c test