biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:37:48 +00:00
60cabb186d fix: always provide valid gP tensor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:37:21 +00:00
49f54aef2d fix: const_expr for SMEM-P tma_p creation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:36:40 +00:00
6f0475f0db fix: guard tma_p creation with gP check
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:36:13 +00:00
6068077e09 feat: SMEM-P via gP→TMA→sP (gP as caller param)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:35:33 +00:00
3063dbd5da D1: SMEM-P test at hd=128
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:34:24 +00:00
d9eb02fd8b merge
f550c7dd33 D1: SMEM-P coordinate-indexed store with dynamic sP shape extraction
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:33:43 +00:00
9ea7f511b0 fix: dummy tma_p reuse tma_q for non-SMEM-P
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:32:56 +00:00
00c9991396 merge
ae4c501903 merge: keep our fmha.py
5e154770de D1: SMEM-P using make_tiled_copy_C + retile (correct CUTLASS pattern)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:32:17 +00:00
edacdae017 fix: use make_tiled_tma_atom_A for P TMA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:31:28 +00:00
0f943ab48c feat: SMEM-P via gP→TMA→sP path (register→GMEM→SMEM)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:30:30 +00:00
d0aec403e4 D1: Layout diagnostic for SMEM-P
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:29:15 +00:00
dd7356afc6 D1: Simplified debug test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:28:28 +00:00
1f1b16ad07 D1: Add hd=128 debug test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:27:15 +00:00
0f52c3453c D1: Fix SMEM-P (coordinate store), LSE (FP32), add TMEM-P-only test
52570d94cb D1: Fix SMEM-P - coordinate-indexed store (replaces make_tiled_copy_C)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:23:56 +00:00
70c9d93d28 feat: SMEM-P make_tiled_copy_C + zero-fill dest tensor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:23:24 +00:00
86b7ff8b36 Merge branch 'master' of ssh://sweetapi.com:2222/biondizzle/nvfp4-megamoe-kernel
e1a5077fda D1: LSE diagnostic at various hd
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:23:00 +00:00
7cf9704584 feat: SMEM-P using make_tiled_copy_C(qk_mma) approach
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:22:26 +00:00
efe6f4dff8 D1: Add diagnostic test (TMEM-P vs SMEM-P at various hd)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:21:07 +00:00
847372976f Merge branch 'master' of ssh://sweetapi.com:2222/biondizzle/nvfp4-megamoe-kernel
7142e459cc D1 test: compare un-norm O + norm using ref row_sum + LSE verification
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:20:28 +00:00
fe47a5f882 fix: LSE type mismatch Float32→BFloat16