biondizzle
Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:12:31 +00:00
a0363e8911 Fix CuTeDSL scoping: hoist P store vars out of if block
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:10:48 +00:00
86bf5771c1 Fix O rescale: use Stage C proven correction_rescale pattern
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:09:26 +00:00
e204aa7a4c Fix tOrP0 indexing: 3-dim slice (None,None,kb) not 4-dim
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:08:18 +00:00
dda5afee87 Fix CuTeDSL scoping: unconditionally define tOrP0 and tCrP
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:07:32 +00:00
0f715bfaff Fix CuTeDSL variable scoping: define tOrP0 and tCrP in both branches
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:06:47 +00:00
70db626550 Fix p_tmem_s: use ComposedLayout from make_smem_layout_a, pass as kernel arg
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:04:53 +00:00
09cac38a67 Consolidate FMHA stages A/B/C into unified kernel module with SMEM-P stub
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:56:57 +00:00
6f834ae8b5 WIP: make_tiled_copy_C for P→SMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:54:50 +00:00
8114a225d1 fix: cpasync.CopyOp for reg→SMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:52:49 +00:00
0dbdc4f865 fix: CopyAtomUniversalOp
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:51:59 +00:00
05173c1992 WIP: tiled copy for P→SMEM (zero fill)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:51:02 +00:00
5a9c299f64 fix: cute.copy(dst, src) order
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:50:11 +00:00
398f5cf631 fix: BFloat16 not Float32 for bf16 reg
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:49:06 +00:00
9bc7fc9361 WIP: P→SMEM write stub (zero fill, proper mapping TODO)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:47:55 +00:00
ed35a8a4ba fix: partition_A not partition_S
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:47:10 +00:00
48432522b8 fix: make_smem_layout_epi not make_epilogue_smem_layout
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:46:02 +00:00
07f319d1f3 WIP: SMEM P path for PV (compiles but P write not implemented)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:42:55 +00:00
1be005296c debug: hd=64 with CUDA_LAUNCH_BLOCKING
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:42:10 +00:00
482928f142 D1: P store as BF16 using PV A-fragment layout (tOrP0)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:40:11 +00:00
f266c3dae2 D1: align P store and PV A-fragment layouts via tP