biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:02:03 +00:00
762f45a8a2 D1: Conditional sP allocation (saves 64KB SMEM for TMEM-P at hd=256)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:00:20 +00:00
2184c79732 D1: Fix sP dummy allocation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:59:29 +00:00
44d96f2cf8 D1: Skip sP allocation when use_smem_p=False (saves 64KB at hd=256)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:58:18 +00:00
8f9b6518f1 D1: Fix syntax (separate kv_stage line)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:55:44 +00:00
6597a1cd16 D1: Reduce kv_stage to 1 at hd>128 to avoid SMEM overflow
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:53:21 +00:00
39367265e5 D1: FIX qk_mma_tiler K-dim = head_dim (was hardcoded to 64, broke hd>64)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:51:42 +00:00
13675a12f8 D1: Print qk_ik in _setup
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:49:57 +00:00
58ed4f5e8b D1: Add more debug prints (QK/PV mode2 sizes)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:48:38 +00:00
96a52e63e9 D1: SMEM-P test at hd=128
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:46:38 +00:00
ebde1d67fd D1: Add sP shape debug print
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:45:59 +00:00
65663226b8 restore: D1.5 version (8165262)
7b0b3ad1fc fix: fence_proxy for async global
3b6f19043f fix: use make_tiled_tma_atom instead of _A for P TMA
244aca13ed feat: SMEM-P reuse Q GMEM as gP buffer
248a827d0d feat: SMEM-P via gP→TMA→sP path (clean implementation)
Compare 28 commits »
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:44:07 +00:00
5d0f34f0e7 D1: Fix test - remove gP param
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:43:48 +00:00
3b6f19043f fix: use make_tiled_tma_atom instead of _A for P TMA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:43:19 +00:00
244aca13ed feat: SMEM-P reuse Q GMEM as gP buffer
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:42:12 +00:00
248a827d0d feat: SMEM-P via gP→TMA→sP path (clean implementation)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:40:17 +00:00
18ab507896 revert: fmha.py to D1 (0f52c34, before D1.5 changes)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:40:06 +00:00
84f07aa955 Merge branch 'master' of ssh://sweetapi.com:2222/biondizzle/nvfp4-megamoe-kernel
4a3033e37b D1: Test SMEM-P via gP→TMA at hd=128
18f48036b1 Merge branch 'master' of ssh://sweetapi.com:2222/biondizzle/nvfp4-megamoe-kernel
9d204d953f D1: Add SMEM-P debug prints
Compare 4 commits »
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:39:32 +00:00
5ad8250ab0 revert: fmha.py to last working version (8165262)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:38:13 +00:00
aed520cf6b fix: remove dynamic gP creation