biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:29:47 +00:00
a87f20a4ae test: just 1 tmem_store, no fence, no loop
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:29:16 +00:00
2b57f28968 test: zero 128 TMEM columns, skip fence
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:28:33 +00:00
25c9b70591 test: zero 2 TMEM columns
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:28:02 +00:00
01c4097ccc test: zero 32 TMEM columns
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:27:27 +00:00
3694f63ba4 test: re-enable full TMEM zeroing (128 columns)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:26:53 +00:00
c3b6c3a5e6 test: minimal tmem_store debug (1 column + sentinels)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:26:18 +00:00
f1aaa50326 test: re-enable TMEM zeroing with tmem_base debug
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:25:39 +00:00
a7f81331f8 test: skip TMEM zeroing again, alloc+dealloc only
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:25:08 +00:00
3f5dcd481e test: zero only 32 TMEM columns
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:24:31 +00:00
2b1c8ce7df test: re-enable all TMEM ops (alloc, zero, dealloc)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:23:50 +00:00
acc7424a48 test: skip TMEM zeroing, just alloc+dealloc
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:23:12 +00:00
ca419c52f3 test: re-enable TMEM alloc + zero
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:22:19 +00:00
09e8ea5933 test: fix compile error, skip TMEM read
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:21:56 +00:00
69bbc21300 test: skip all TMEM ops, just test SMEM layout + descriptor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:21:01 +00:00
a6c0ce51a2 test: skip MMA, just test descriptor values
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:20:14 +00:00
ea6b42e649 test_umma_qk: add descriptor debug output
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:18:47 +00:00
0f6907b001 UMMA: fix descriptor + idesc — use gau-nernst tutorial values
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:16:37 +00:00
9b458d2a6c test_umma_qk: clean rewrite, hardcoded HD=16, explicit core-matrix layout writes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 09:15:44 +00:00
427410d94a UMMA: Rewrite fmha_umma_desc.cuh with correct K-major core-matrix layout + minimal QK GEMM test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:59:21 +00:00
68b4151d21 dump SMEM layout info