biondizzle
Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:44:11 +00:00
6f5be8a4e4 Debug: print P values
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:43:03 +00:00
3d15f5bb21 Debug: 1 PV K-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:42:22 +00:00
284a06ddf1 FMHA v5: clean rewrite with QK + softmax + PV SS per K-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:40:57 +00:00
342193e0b4 Fix tb scope
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:40:36 +00:00
a6f7ef7c45 Add softmax read from TMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:40:02 +00:00
38b0ff0bf8 Add QK GEMM to minimal PV test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:39:00 +00:00
e9f8f9e6e3 Minimal PV with s_p_vals in SMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:38:04 +00:00
97ebb964a2 Move s_p_vals to dynamic SMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:37:13 +00:00
d2387dd858 Full FMHA v4: per-K-tile P fill into reusable (128,16) buffer
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:35:30 +00:00
78b470317f PV accumulation debug with detailed TMEM read
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:33:33 +00:00
dacbf53081 Test K-tiles 0-1 accumulated
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:32:54 +00:00
bad31d9476 Test K-tile 1
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:32:13 +00:00
9198ed734f Test 1 PV K-tile from (128,128) P at offset 0
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:31:23 +00:00
ce88cd6e9e Zero TMEM manually, all K-tiles accumulate=true
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:30:10 +00:00
727c509454 PV SS MMA with 8 K-tile accumulation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:29:15 +00:00
d5b0941f2e PV SS MMA with (128,128) P layout
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:28:26 +00:00
f94693fdc2 Fix: add back cudaDeviceSynchronize
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:28:03 +00:00
fb8af865f4 Check launch error
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:27:14 +00:00
738e39cb63 Debug: add printf at kernel start
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:26:38 +00:00
9e13096bf8 Debug: skip QK, write P directly to SMEM, 1 PV K-tile