biondizzle (joined on 2025-12-10)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 11:32:57 +00:00
013f370046 test: all-ones data, expected S[0,j]=16.0 for every j
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 11:30:21 +00:00
f5a0966afc test: 4 warp leaders (lane==0) call MMA simultaneously
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 11:28:50 +00:00
c01d6fddf4 test: gau-nernst pattern — fence::after_thread_sync, 4 warps, 128 threads, 32x32b.x8 loop
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 10:23:07 +00:00
a048b56886 test: single-thread MMA + 0.25 scaling for 4× factor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 10:21:08 +00:00
57d67e6b51 test: revert to 64-bit descriptors, 4 warp leaders, 32x32b read
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 10:15:17 +00:00
32f7fa7bce Update CURRENT_ISSUE.md and MEMORY.md with UMMA 4× bug details
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 10:11:39 +00:00
3f95f1c5d4 test: try LBO with block_mn=32 (1/4 of M=128)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 10:10:08 +00:00
d03e353972 test: 4 warp leaders call MMA (Layout D requires 4 warps)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 10:08:37 +00:00
8059ed15ad test: explicitly zero padding between Q and K
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 10:07:23 +00:00
9e98c067ab test: Layout D TMEM read using 32x32b.x8 format, 4 warps
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 10:04:19 +00:00
68d1a7920c test: M=64 in both desc and idesc
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 10:02:53 +00:00
0f51fda0da test: try N=8 in idesc
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 10:01:40 +00:00
4f7c9649fd test: clean UMMA QK test, debug 4x factor, 8KB padding, 128 TMEM cols
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:59:45 +00:00
ac65ece33b test: TMEM 2-store with fence outside wid guard, 64 threads
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:58:54 +00:00
2c89eea6be test: fence+sync between 2 tmem_stores
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:58:03 +00:00
24c5afe1dc test: 64 threads, 2 stores to col 0
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:57:08 +00:00
987f2c8917 test: 2 tmem_stores to SAME column 0
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:56:18 +00:00
494149f034 test: 32 threads (1 warp), no guards, all participate
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:54:21 +00:00
f0cb71da5c test: TMEM 2-col with fence+sync between stores, separate wid==0 blocks
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:53:12 +00:00
b69a538ab1 test: add fence+sync between 2 tmem_stores