biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 06:59:02 +00:00
9227b0e93f debug: skip hd_chunk>0 to isolate chunk0
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 06:56:57 +00:00
25aeaca9ab fix: PV accumulate flag
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 06:56:10 +00:00
1da785c070 D1.5: HD tiling (HD_CHUNK=256) for HD=512 support
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 04:49:34 +00:00
700524f183 test: HD=128/256 variants for D1.5
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 04:49:01 +00:00
f2544a4600 test: full matrix for D1.5 multirow multitile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 04:48:28 +00:00
5544d3a0a4 fix: TMEM reads must be outside my_row_active (warp-collective)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 04:46:13 +00:00
1dca8d8cfa debug: unbuffered stdout
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 04:44:43 +00:00
8be8813d54 debug: more prints
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 04:42:36 +00:00
570396b4be debug: simplify test, add fflush
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 04:40:07 +00:00
0ad35f8be6 debug: add prints to multirow multitile test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 04:37:36 +00:00
dd3e0fdfc8 D1.5: multi-row + multi-tile FMHA with SMEM accumulator in-kernel rescale
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 03:46:39 +00:00
10ae8f3346 auto: pre-test commit
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 03:45:08 +00:00
8b1ac380ac feat: HD=512 support — TMEM_N=512, test variants for all three TMA kernels
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 03:20:50 +00:00
762f054d6d feat: double-buffer TMA pipeline in multi-row kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 03:14:09 +00:00
4a9c850e9c feat: double-buffer TMA pipeline for K loads in single-tile kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-29 22:59:19 +00:00
afa949071b fix: brace structure in V TMA conversion
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-29 22:58:02 +00:00
ec577f71ee feat: V TMA loads in single-tile kernel too
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-29 22:54:47 +00:00
422e7bb312 cleanup: v_head reference in multi-row (V via TMA now)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-29 22:51:26 +00:00
88c72a887e feat: V TMA loads in multi-row kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-29 22:48:52 +00:00
13403d2808 cleanup: remove unused v_head in multi-tile (V via TMA)