biondizzle
Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 02:55:03 +00:00
9034f67b0f Fix prefill kernel: read ALL n_sub PV results (was only n_sub=0)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 02:50:28 +00:00
a4ef6c3454 Add B1 mixed FP8 prefill FMHA kernel (T>1 support)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 02:47:49 +00:00
1f757151ef Fix router gate BF16 quantize path for production FMHA test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 02:38:17 +00:00
07168357cc Fix o_a_proj weight loading: add BF16 fallback for grouped linear
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 02:31:12 +00:00
27d8d80a40 Fix missing DEVICE constant in production FMHA test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 02:26:40 +00:00
26a817c2f2 Fix production FMHA layer test: compare raw FMHA vs SDPA on production gathered KV
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 02:22:36 +00:00
ba67e055f7 Add production FMHA layer comparison test
biondizzle pushed tag v-b1-b2-done-20260603 to biondizzle/nvfp4-megamoe-kernel 2026-06-03 02:14:54 +00:00
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 01:53:00 +00:00
af58f2c5b2 Add B1 weight/format verification at L0 in single_shot
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 01:51:00 +00:00
8df5de5477 Update B1 docs with test results and bug fix
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 01:50:23 +00:00
3e3b352e7e Update FINAL_STRETCH.md: B1 and B2 marked DONE with test results and bug fixes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 01:49:40 +00:00
84a02f8995 Remove debug test files, keep production B1/B2 unit tests
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 01:42:48 +00:00
6fa9ad7852 B2 indexer: adopt TMEM warp-to-row mapping fix
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 01:12:12 +00:00
6c92ff91f3 B2 indexer: temporary heads 0-31 only while figuring out TMEM row 32-63 layout
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 01:09:03 +00:00
7732c93f62 Fix B2 indexer: use 16x256b.x1 TMEM read with TMEM_COLS=512
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:59:09 +00:00
a75a9843af Fix B2 indexer: add sLogits scratch buffer to SMEM layout
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:55:30 +00:00
cc7b17fdaa Fix B2 indexer: use 2-warps for TMEM read (P7 row-slice model)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:52:42 +00:00
8d0a02ca67 B2 TMEM debug: try stride=SK_TILE/8=16 for row group 32-63
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:50:53 +00:00
fdf702470c Add B2 TMEM read debug kernel and test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:46:51 +00:00
f1cf4c0215 Add B2 QK debug test with w_h=1 for simple comparison