biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 13:02:04 +00:00
834d682443 test: full FMHA HD=16 pipeline (QK→softmax→PV→epilogue)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 13:00:40 +00:00
3b8be4b2db test: FMHA softmax (QK→read S→softmax→write P→read P→verify)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:57:41 +00:00
c936940428 test: separate (128,16) SMEM per K-tile with correct source stride
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:56:41 +00:00
f244c4fdd2 test: single-thread MMA (tid==0) for Layout D
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:55:54 +00:00
ba2e390e1e test: debug single K-tile from full (128,64) SMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:55:03 +00:00
a7e8b483cd test: HD=64 multi-K-tile with correct source stride in SMEM writes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:54:04 +00:00
926ae5d7bf test: fix K source stride mismatch in manual SMEM write
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:53:15 +00:00
7d16a30cb6 test: exact HD=16 pattern with HD=64 data
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:52:22 +00:00
db4f661843 test: debug with (128,16) SMEM matching HD=16 exactly
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:51:34 +00:00
b703dc0a50 test: debug single K-tile with offset descriptor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:50:45 +00:00
435ca037cf test: use accumulate=false for first K-tile, skip TMEM zero
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:50:08 +00:00
e8ac2120ad test: HD=64 QK with contiguous SMEM + offset descriptors
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:48:47 +00:00
1c01e8e412 test: fix inline asm line continuation for nvcc
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:47:58 +00:00
71c774027c test: fix HD=64 QK — zero TMEM, fence after MMA, single-thread MMA call
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:23:50 +00:00
1bf76388c8 test: always accumulate, separate SMEM per K-tile, TMEM starts at 0
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:20:03 +00:00
8707f555c2 test: add extra syncwarp + syncthreads for MMA safety
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:18:07 +00:00
5a65d46c26 test: HD=64 with separate SMEM per K-tile — no offset descriptors needed
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:16:10 +00:00
526fafb808 test: revert volatile, fix wid==0, full 4 K-tiles
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:13:25 +00:00
de879342dd test: 1 K-tile, volatile writes, verify SMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 12:11:48 +00:00
bd6440fd83 test: volatile SMEM writes + 2 K-tiles