biondizzle
Joined on 2025-12-10
biondizzle deleted branch main from biondizzle/nvfp4-megamoe-kernel
2026-05-14 12:46:38 +00:00

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
2026-05-14 12:44:49 +00:00
d3f35c9465 cleanup: remove abandoned TileLang and Mojo files
802c4ee12c Revert stage_activation to simple quantize (staging kernel API incompatible with L1 output dims)
69e0174792 Fix stage_activation: use Triton staging kernel instead of broken simple quantize
c016e66e23 Add CUDA sync + NaN/Inf check after each expert GEMM in grouped kernel
1dfe5ffd05 Add comprehensive README documenting quirks, pitfalls, and setup
Compare 24 commits »

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 12:14:03 +00:00
802c4ee12c Revert stage_activation to simple quantize (staging kernel API incompatible with L1 output dims)

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 12:01:35 +00:00
69e0174792 Fix stage_activation: use Triton staging kernel instead of broken simple quantize

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 11:27:59 +00:00
c016e66e23 Add CUDA sync + NaN/Inf check after each expert GEMM in grouped kernel

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 11:23:34 +00:00
1dfe5ffd05 Add comprehensive README documenting quirks, pitfalls, and setup

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 10:50:28 +00:00
904fc37ad8 Fix: use idx2crd instead of get_coord for CuTe layout coordinate lookup

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 10:48:58 +00:00
494d30b6ab Fix: use CuTe get_coord for proper scale factor remap to CUTLASS interleaved layout

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 10:37:02 +00:00
869151d211 Fix kernel.py: remove broken expand on scale factors (was expanding sf to weight size)

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 10:23:04 +00:00
84becfac93 Test: pass scales directly to CUTLASS (no remap) to diagnose layout issue

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 10:21:11 +00:00
a272bc49b0 Fix: torch::kBFloat16

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 10:20:01 +00:00
3f62e49e6e Fix PyTorch API: use c10::cuda and at::kBF16

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 10:18:38 +00:00
2ee4e26772 Fix: remove compile-time SM100 guard from pytorch binding, use runtime check instead

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 10:05:43 +00:00
540e68593f Add scale factor remap kernel: remap simple row-major SFs to CUTLASS interleaved layout

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 09:50:55 +00:00
2998c889e7 Implement simple FP4 quantization for L1→L2 re-quant step (no vLLM fp4_utils dependency)

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 09:40:18 +00:00
98913c9b1a Fix stage_activation: use Triton staging kernel from vLLM patch instead of fp4_utils

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-14 09:26:04 +00:00
25cbc85afe Replace kernel.py with thin wrapper around pre-compiled _C extension

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-13 23:28:04 +00:00
33e5d67326 Add CUTLASS_CHECK macro

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-13 23:27:16 +00:00
b7c5cba407 Fix device_memory include path

biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel
2026-05-13 23:26:22 +00:00
3299d22ad6 Fix type casts and includes for CUTLASS NVFP4 GEMM