biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:52:15 +00:00
7a21fa4bd8 test: add 2nd tmem_store to column 1
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:51:24 +00:00
4b129c146e test: add 1 tmem_load back
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:50:50 +00:00
61f19ce891 test: skip tmem_load, only store+dealloc
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:50:11 +00:00
2513e1a692 test: use 64 threads, fence outside warp guard, 1 store
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:49:22 +00:00
abfe9dbaa1 test: only 1 tmem_store to verify single column works
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:48:29 +00:00
5795589abc test: TMEM 4 columns, individual store calls + loop load
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:46:50 +00:00
8a428f6127 test: TMEM column addressing test (128 cols, store+load)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:45:37 +00:00
ee3fe6d6b2 test: tmem_load column 1 only
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:44:31 +00:00
6c38c6e442 test: read 8 TMEM columns individually (no loop)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:43:18 +00:00
bcc6ed114d test: add 8KB padding after sQ to prevent MMA read overrun
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:41:53 +00:00
764ed01d6f test: try M=64 in descriptor + idesc to debug 4x factor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:40:20 +00:00
4cb656e583 test: try idesc=0 (same as gau-nernst)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:39:26 +00:00
cfba8484da test: try idesc with N=128 (full extent) + 128 TMEM cols
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:38:28 +00:00
30f0056b11 test: clean rewrite with SMEM Q/K verification and dot product check
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:37:08 +00:00
7eb85a71fc test: add Q SMEM verification output + bf16_to_f32_host
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:35:59 +00:00
8f23c2aaf6 test: verify SMEM Q layout by reading back canonical data
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:35:04 +00:00
004046a6a8 test: read only 1 TMEM column after MMA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:33:47 +00:00
41128122e3 test: clean rewrite, 32 TMEM cols, MMA N=32, tmem_load loop
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:32:36 +00:00
58be79957d test: 32 TMEM cols, add MMA call with N=32, read S from TMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:30:38 +00:00
22fb861447 test: 2 tmem_stores with syncwarp between