biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:49:17 +00:00
210391e571 fix: PersistentTileSchedulerParams constructor takes (problem_shape, cluster_shape) not from_shape
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:47:35 +00:00
824d054ad7 fix: inside cute.compile args are already CuTe tensors, no conversion needed
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:46:37 +00:00
6375e54396 fix: use from_dlpack + mark_layout_dynamic instead of non-existent to_cuTe_tensor in router
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:44:54 +00:00
cb2ca8591f fix: add @cute.jit to router compiled function
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:44:03 +00:00
d5d2b7b4b8 fix: defer router MMA/TMA setup into cute.compile context (matches MoE pattern)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:39:53 +00:00
157f1c5258 fix: use OperandMajorMode from nvgpu (not deprecated tcgen05) and mma_tiler_mn in router kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:36:03 +00:00
1dbc57e2cd fix: use mma_tiler_mn in _create_tiled_mma (attribute exists at init time)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:34:55 +00:00
d05dd50bf5 fix: OperandMajorMode.K not MAJOR_K (correct CuTeDSL API)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:33:34 +00:00
a6a8755439 single_shot: switch to head-packed FMHA dispatch (1 kernel launch vs 128)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:28:19 +00:00
80002f2efc single_shot: production NVFP4 GEMM for ALL attention projections
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:21:12 +00:00
32efd5139d Fix gate weight transpose: checkpoint is (E, H), Router expects (H, E)
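The gate-transpose fix comes down to a layout mismatch: the checkpoint stores the router gate as (E, H), one row per expert, while the router consumes an (H, E) matrix so that a token's logits are x @ W. A minimal pure-Python sketch of that relationship, assuming row-major nested lists and hypothetical helper names `transpose` and `router_logits` (this is not the repository's code):

```python
def transpose(m):
    """Return the transpose of a row-major matrix given as nested lists."""
    return [list(col) for col in zip(*m)]


def router_logits(x, w_he):
    """Compute logits[e] = sum_h x[h] * w_he[h][e] for a single token.

    x:    length-H hidden vector
    w_he: (H, E) gate weight, the layout the router is assumed to expect
    """
    num_experts = len(w_he[0])
    return [sum(x[h] * w_he[h][e] for h in range(len(x))) for e in range(num_experts)]


# Checkpoint layout (E, H): E=3 experts, H=2 hidden units (toy values).
w_eh = [[1.0, 0.0],
        [0.0, 1.0],
        [1.0, 1.0]]
x = [2.0, 3.0]

w_he = transpose(w_eh)           # (H, E) = 2 x 3
logits = router_logits(x, w_he)
print(logits)  # [2.0, 3.0, 5.0]
```

Passing `w_eh` directly would make the inner index run over experts instead of hidden units, which is exactly the kind of silent shape confusion the commit describes.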
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:20:04 +00:00
e45c0ff51b single_shot: use reference dequant for attn projections, focus on MoE+FMHA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:18:37 +00:00
dfbffa1df1 single_shot: CUDA_LAUNCH_BLOCKING for debugging
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:17:48 +00:00
a66fdf6049 single_shot: add sync to catch CUDA errors early
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:16:47 +00:00
0b35c36d23 single_shot: memory-efficient MoE loading, lazy Nvfp4Linear init
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:14:26 +00:00
050b5ee449 Fix n_h reference before assignment in single_shot
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:12:23 +00:00
c5adbbfde6 FMHA sink: don't double-scale sink bias
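An attention sink adds one extra per-head logit to the softmax denominator with no matching value row, so it soaks up probability mass. A minimal sketch of why scaling it twice changes the result, assuming the sink is stored as a logit in the same space as the already-scaled scores `q.k * sm_scale` (an assumption about this kernel; `softmax_with_sink` and its parameters are hypothetical):

```python
import math


def softmax_with_sink(scores, sink, sm_scale):
    """Softmax over raw q.k scores plus one sink logit.

    Scores are multiplied by sm_scale. The sink is assumed to already be a
    logit in the scaled space, so it is used as-is; multiplying it by
    sm_scale again is the "double-scale" mistake.
    """
    logits = [s * sm_scale for s in scores]
    m = max(max(logits), sink)                 # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    denom = sum(exps) + math.exp(sink - m)     # sink contributes to the denominator only
    return [e / denom for e in exps]


scores = [1.0, 2.0, 3.0]
sm_scale = 0.125
sink = 0.5

probs = softmax_with_sink(scores, sink, sm_scale)
buggy = softmax_with_sink(scores, sink * sm_scale, sm_scale)  # sink scaled twice
print(sum(probs), sum(buggy))
```

Because the sink has no value row, the returned probabilities sum to less than 1; shrinking the sink by an extra `sm_scale` makes it absorb less mass than intended.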
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:11:13 +00:00
4adee1207f FMHA: zero-init my_p_vals to fix N<128 padding NaN
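The padding NaN follows from IEEE 754 arithmetic. When N < 128, the tail of a 128-wide tile is padding, and if the array holding the P values is not initialized, those lanes can hold arbitrary bits, including NaN. A NaN multiplied by a zero-padded V entry is still NaN, so a single bad lane poisons the whole P·V accumulation. A minimal pure-Python sketch, assuming a hypothetical `TILE = 128` and an accumulator that loops over the full tile (an illustration, not the kernel's code):

```python
import math

TILE = 128  # tile width along the key dimension (assumed)


def pv_dot(p_vals, v_col):
    """Accumulate sum_j p[j] * v[j] over the whole tile, padding included."""
    acc = 0.0
    for p, v in zip(p_vals, v_col):
        acc += p * v
    return acc


n = 100                                  # actual number of keys, N < TILE
valid_p = [1.0 / n] * n
v_col = [2.0] * n + [0.0] * (TILE - n)   # V is zero-padded past N

# Uninitialized registers modeled as garbage; NaN is one possible bit pattern.
garbage_p = valid_p + [math.nan] * (TILE - n)
zeroed_p = valid_p + [0.0] * (TILE - n)  # zero-init: padded lanes contribute 0

print(pv_dot(garbage_p, v_col))  # nan
print(pv_dot(zeroed_p, v_col))
```

Zero-initializing the P array is cheaper than masking every multiply, since padded lanes then contribute exactly 0 regardless of what V holds.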
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 23:10:22 +00:00
13be3ad443 FMHA sink bias in kernel + single_shot production rewrite
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-31 22:55:13 +00:00
23e88638aa single_shot: memory-efficient MoE loading (CPU stacking, one-shot GPU transfer)