biondizzle
Joined on
2025-12-10
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 21:43:06 +00:00)
560760c824 Add STAGE_D2.md: Multi-query grid + head packing plan

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 21:41:23 +00:00)
a9577eb18c Remove obsolete STAGE_D1.3.md and SMEM_P_GUIDANCE_REQUEST.md

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 21:35:27 +00:00)
ce24086b57 Docs: Update STAGE_D.md, README.md with hd=512 compilation blocker, lessons learned

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 17:55:35 +00:00)
c90b05ee3b D1.4: Use cutlass.range(unroll=1) for k_sub loops in both TMA and MMA warps

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 16:42:02 +00:00)
7e24bb02d3 D1.4: Remove --opt-level 0 from hd512 test (use default opt level)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 16:36:49 +00:00)
bd08bfee8e D1.4: Fix merge test - use use_smem_p=False for hd=256 kernel (SMEM budget)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 16:31:08 +00:00)
d70f083e17 D1.4: Add external k_sub merge test for hd=512 (avoids slow in-kernel k_sub compilation)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 15:43:29 +00:00)
5ab02afbe9 D1.4: Use --opt-level 0 only (ptxas -j not supported, MLIR is the bottleneck)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 15:40:40 +00:00)
4bb34ea3b3 D1.4: Use options string for compile flags (--ptxas-options -j64 --opt-level 0)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 15:36:39 +00:00)
8771f11fa0 D1.4: Add PtxasOptions -j64 + OptLevel(0) for faster hd=512 compilation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 15:13:18 +00:00)
df10378bb5 D1.4: Fix regression test for un-normalized O output (D5a)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 15:11:40 +00:00)
423d97b094 D1.4: Guard LSE computation with const_expr(not normalize) - fixes BF16 type mismatch in regression test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 15:10:29 +00:00)
d8e2a8f33e D1.4: Switch k_sub from cutlass.range to Python range (unrolled at trace time)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 14:23:23 +00:00)
2f623a3f4b D1.4: Fix tTMrO placeholder - define only inside const_expr block

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 14:22:46 +00:00)
a8672c20b3 D1.4: Use cutlass.range loop for k_sub (reduce IR), guard O rescale with const_expr(n_kv_tiles>1)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 14:20:34 +00:00)
402dd4567b Fix: add cutlass import to test_d1_qk512

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 14:20:08 +00:00)
04b8ca43ed Fix: add cpasync import to test_d1_qk512

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 14:19:28 +00:00)
e0cd810d39 D1.4: Add hd=512 QK-only and standalone test for compilation debugging

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 08:07:34 +00:00)
090acfc0ce D1.4: Reduce pv_n_tile to 128 for hd=512 to fit SMEM budget (192KB)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 07:03:16 +00:00)
bc8331c9eb D1: Unrolled k_sub path (hardcoded k_sub=0,1) to avoid cutlass.range IR explosion