| Commit | Message | Date |
|---|---|---|
| 2ffbfda47d | test: print SMEM verify data | 2026-05-28 11:51:08 +00:00 |
| 4fd41365de | test: add SMEM verify for HD=64 K-tile offsets | 2026-05-28 11:49:44 +00:00 |
| 4483539f01 | test: HD=64 random data, 4 K-tiles, accumulate | 2026-05-28 11:47:56 +00:00 |
| 73bd21ce01 | test: force 1 K-tile for HD=64 debug | 2026-05-28 11:46:12 +00:00 |
| abe1870429 | test: HD=64 all-ones, expected S[0,j]=64 (unscaled) or 8.0 scaled | 2026-05-28 11:44:31 +00:00 |
| 73f9ff98c9 | test: UMMA QK HD=64 (4 K-tiles, accumulate) — multi-K-tile test | 2026-05-28 11:42:29 +00:00 |
| 1874a70a6d | test: fix var ref | 2026-05-28 11:39:15 +00:00 |
| 8426d13285 | test: fix comparison — row 0 is S[0,c], rows 1-127 should be zero | 2026-05-28 11:38:22 +00:00 |
| 6f40fafa91 | test: verify ALL 128 rows × 8 cols match scalar reference | 2026-05-28 11:36:46 +00:00 |
| 3c7d9d9303 | test: apply 1/sqrt(HD) scale to MMA output — 4x was the scale factor, not a bug! | 2026-05-28 11:34:45 +00:00 |
| 013f370046 | test: all-ones data, expected S[0,j]=16.0 for every j | 2026-05-28 11:32:56 +00:00 |
| f5a0966afc | test: 4 warp leaders (lane==0) call MMA simultaneously | 2026-05-28 11:30:19 +00:00 |
| c01d6fddf4 | test: gau-nernst pattern — fence::after_thread_sync, 4 warps, 128 threads, 32x32b.x8 loop | 2026-05-28 11:28:47 +00:00 |
| a048b56886 | test: single-thread MMA + 0.25 scaling for 4× factor | 2026-05-28 10:23:06 +00:00 |
| 57d67e6b51 | test: revert to 64-bit descriptors, 4 warp leaders, 32x32b read | 2026-05-28 10:21:06 +00:00 |
| 3f95f1c5d4 | test: try LBO with block_mn=32 (1/4 of M=128) | 2026-05-28 10:11:38 +00:00 |
| d03e353972 | test: 4 warp leaders call MMA (Layout D requires 4 warps) | 2026-05-28 10:10:07 +00:00 |
| 8059ed15ad | test: explicitly zero padding between Q and K | 2026-05-28 10:08:35 +00:00 |
| 9e98c067ab | test: Layout D TMEM read using 32x32b.x8 format, 4 warps | 2026-05-28 10:07:15 +00:00 |
| 68d1a7920c | test: M=64 in both desc and idesc | 2026-05-28 10:04:17 +00:00 |
| 0f51fda0da | test: try N=8 in idesc | 2026-05-28 10:02:52 +00:00 |
| 4f7c9649fd | test: clean UMMA QK test, debug 4x factor, 8KB padding, 128 TMEM cols | 2026-05-28 10:01:39 +00:00 |
| ac65ece33b | test: TMEM 2-store with fence outside wid guard, 64 threads | 2026-05-28 09:59:43 +00:00 |
| 2c89eea6be | test: fence+sync between 2 tmem_stores | 2026-05-28 09:58:51 +00:00 |
| 24c5afe1dc | test: 64 threads, 2 stores to col 0 | 2026-05-28 09:57:53 +00:00 |
| 987f2c8917 | test: 2 tmem_stores to SAME column 0 | 2026-05-28 09:57:07 +00:00 |
| 494149f034 | test: 32 threads (1 warp), no guards, all participate | 2026-05-28 09:56:17 +00:00 |
| f0cb71da5c | test: TMEM 2-col with fence+sync between stores, separate wid==0 blocks | 2026-05-28 09:54:19 +00:00 |
| b69a538ab1 | test: add fence+sync between 2 tmem_stores | 2026-05-28 09:53:10 +00:00 |
| 7a21fa4bd8 | test: add 2nd tmem_store to column 1 | 2026-05-28 09:52:05 +00:00 |
| 4b129c146e | test: add 1 tmem_load back | 2026-05-28 09:51:21 +00:00 |
| 61f19ce891 | test: skip tmem_load, only store+dealloc | 2026-05-28 09:50:48 +00:00 |
| 2513e1a692 | test: use 64 threads, fence outside warp guard, 1 store | 2026-05-28 09:50:09 +00:00 |
| abfe9dbaa1 | test: only 1 tmem_store to verify single column works | 2026-05-28 09:49:21 +00:00 |
| 5795589abc | test: TMEM 4 columns, individual store calls + loop load | 2026-05-28 09:48:27 +00:00 |
| 8a428f6127 | test: TMEM column addressing test (128 cols, store+load) | 2026-05-28 09:46:49 +00:00 |
| ee3fe6d6b2 | test: tmem_load column 1 only | 2026-05-28 09:45:34 +00:00 |
| 6c38c6e442 | test: read 8 TMEM columns individually (no loop) | 2026-05-28 09:44:30 +00:00 |
| bcc6ed114d | test: add 8KB padding after sQ to prevent MMA read overrun | 2026-05-28 09:43:17 +00:00 |
| 764ed01d6f | test: try M=64 in descriptor + idesc to debug 4x factor | 2026-05-28 09:41:50 +00:00 |
| 4cb656e583 | test: try idesc=0 (same as gau-nernst) | 2026-05-28 09:40:19 +00:00 |
| cfba8484da | test: try idesc with N=128 (full extent) + 128 TMEM cols | 2026-05-28 09:39:19 +00:00 |
| 30f0056b11 | test: clean rewrite with SMEM Q/K verification and dot product check | 2026-05-28 09:38:26 +00:00 |
| 7eb85a71fc | test: add Q SMEM verification output + bf16_to_f32_host | 2026-05-28 09:37:07 +00:00 |
| 8f23c2aaf6 | test: verify SMEM Q layout by reading back canonical data | 2026-05-28 09:35:58 +00:00 |
| 004046a6a8 | test: read only 1 TMEM column after MMA | 2026-05-28 09:35:02 +00:00 |
| 41128122e3 | test: clean rewrite, 32 TMEM cols, MMA N=32, tmem_load loop | 2026-05-28 09:33:45 +00:00 |
| 58be79957d | test: 32 TMEM cols, add MMA call with N=32, read S from TMEM | 2026-05-28 09:32:33 +00:00 |
| 22fb861447 | test: 2 tmem_stores with syncwarp between | 2026-05-28 09:30:37 +00:00 |
| a87f20a4ae | test: just 1 tmem_store, no fence, no loop | 2026-05-28 09:29:46 +00:00 |