biondizzle
Joined on
2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 08:54:27 +00:00
b94f8d4ed8 Test: fused router kernel vs BF16 reference path

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 08:49:48 +00:00
2433700a69 Fused router kernel: rewrite epilogue with proper CuTeDSL constructs

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 08:37:18 +00:00
d01b4b02de Complete NVFP4 fused router kernel: full MMA + router epilogue

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:55:31 +00:00
25b9a5f32d Fix test: use from_dlpack for c_tensor

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:54:37 +00:00
d2819fc39c Fix test: use as_tensor instead of make_tensor

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:53:44 +00:00
5ea71ebd78 Add NVFP4 CuTeDSL compilation test (verify MmaMXF4NVF4Op compiles)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:53:23 +00:00
fa6dbd4aa2 WIP: Rewrite NVFP4 fused router in CuTeDSL with MmaMXF4NVF4Op (sf_vec_size=16)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:51:34 +00:00
4f706b55d7 Remove raw CUDA C++ fused router and DeepGEMM (MXFP4, wrong instruction)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:44:55 +00:00
424fe6bf2c Fix: use SM100_MMA_MXF8F6F4_SS (not MXF4) to match Nvfp4Linear path

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:41:45 +00:00
2e2caadf7d WIP: NVFP4 fused router kernel in raw CUDA C++ using DeepGEMM primitives
e3ea609ddd Embed DeepGEMM source (not submodule) for SM100 raw CUDA GEMM primitives
dae83723a3 Add DeepGEMM as third-party dependency for SM100 raw CUDA GEMM primitives

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:29:32 +00:00
ef4c0ad489 Fix BF16 router mma_tiler: use cutlass.Int32 for CuTe DSL compatibility

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:20:55 +00:00
79be9cb8da Fix: hardcode mma_inst_shape_k=32 for NVFP4 (avoids MLIR unpack error in JIT)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:19:21 +00:00
c3a64ceed7 Fix: mma_tiler must use CuTe Ints for static layout construction

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:16:49 +00:00
39b481e52b Ensure mma_tiler contains CuTe Ints for cute.slice_ compatibility

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:14:48 +00:00
57cc20d5ad Fix SFA/SFB SMEM: blockscaled layouts are plain Layout (no .outer/.inner swizzle)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:13:35 +00:00
fcd7680583 Fix CuTe tensor creation: use from_dlpack + mark_layout_dynamic

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:11:44 +00:00
3a8c6daeb3 Fix: cutlass_torch.make_tensor -> as_tensor

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:10:59 +00:00
0553117af6 Simplify fused router test: compare fused vs 2-kernel NVFP4 path

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:08:58 +00:00
44a0e59808 Fix fused router test: use quantize_weight_to_nvfp4 (correct function name)

biondizzle pushed tag v-nvfp4-fused-router-rewrite-20260601-0715 to biondizzle/nvfp4-megamoe-kernel, 2026-06-01 07:08:18 +00:00