biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 06:43:32 +00:00
8cff68a28f D1: Use cutlass.range for k_sub loops (CuTeDSL immutable handle)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 06:42:27 +00:00
14c9000997 D1: Fix kvh scoping - define before loops, consume V via pipeline
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 06:41:26 +00:00
553ee7be57 D1: Fix kvb→kvh typo in PV GEMM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 06:40:13 +00:00
9c0dbab280 D1: Remove qh.commit() - pipeline handles commit internally
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 06:38:16 +00:00
5c267fd2ad D1: TMA producer uses acquire_and_advance + commit (no wait_and_advance)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 06:36:20 +00:00
3e00c8e1bd D1: Use same pipeline API as working code (acquire_and_advance) for k_sub path
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 05:02:19 +00:00
fcc69a5c56 D1: Add PipelineState for k_sub TMA path
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:59:49 +00:00
22fedc4ed9 D1: Fix pipeline API for K sub-tile path (producer_acquire/commit)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:57:10 +00:00
e93dabe43c D1: K sub-tile MMA path using pipeline barriers
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:53:47 +00:00
fd28718483 D1: Fix TMA copies in k_sub path (no mbarrier, use cp_async wait)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:51:54 +00:00
170b483c2f D1: Add K sub-tile loop for hd=512 (const_expr guarded, hd≤256 path unchanged)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:43:13 +00:00
bc5240c740 D1: Debug TMA partition shapes at hd=512
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:41:13 +00:00
2e732ce3a7 D1: K sub-tiling - qk_mma_tiler K-dim = k_tile=256, SMEM fits at hd=512
biondizzle pushed tag v0.4-d1-hd256 to biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:33:00 +00:00
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:32:45 +00:00
4564a264db Docs: Update STAGE_D.md, README.md status for D1 hd≤256 milestone
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:07:41 +00:00
085d72ea8f D1: Full test with TMEM-P at hd=64,128,256,512
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:06:28 +00:00
2b3435f97c D1: Remove debug prints, clean up
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:05:19 +00:00
38c6486fc7 D1: const_expr for sP layout selection (CuTeDSL)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:04:30 +00:00
a945edea79 D1: Python if for sP layout (trace-time, not MLIR)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 04:03:30 +00:00
955b023164 D1: Tiny 4-mode sP placeholder for TMEM-P path