biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 17:55:13 +00:00
755f9ad567 debug: fix per_expert_alpha ref + clean up BF16 reference scaling

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 17:02:42 +00:00
de8acc7965 debug: dump raw GEMM inputs + first 8 output values

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 15:49:01 +00:00
9159cb6bb3 docs: add debug log — current state, hypotheses, fixes

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 15:46:18 +00:00
2fd55a94c6 fix: weight reshape bug + igs double-count in BF16 reference

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 14:16:25 +00:00
c421a668f3 debug: BF16 reference GEMM + cosine comparison for L1

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 13:41:10 +00:00
995589ac8a debug: add FP4 quantization round-trip diagnostic

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 13:22:00 +00:00
d0ed3d84a8 debug: add L2, SiLU, and scatter pipeline prints

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 12:50:16 +00:00
da5572f497 clean: remove diagnostic scripts from repo

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 12:42:56 +00:00
fd59222fc0 fix: stop folding global scale into float8 block scales

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 11:44:39 +00:00
56e62e916d revert: idx2crd remap approach — source-first needs hierarchical coords

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 11:39:58 +00:00
d5949a23b4 fix: use cute::crd2idx for SF remap — layout_sf() not directly callable

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 11:38:19 +00:00
9908fd64d9 feat: CUTLASS NVFP4 mega_moe kernel — slot-based L1/L2, source-first SF remap

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 11:32:16 +00:00
a37a155bae WIP: remove prepack cache, remap SFB per-call inside CUTLASS

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 11:27:16 +00:00
19924275bc WIP: remove prepack cache, remap SFB per-call inside CUTLASS

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 10:56:32 +00:00
74a4475e5b WIP: remove prepack cache, remap SFB per-call inside CUTLASS

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 10:48:36 +00:00
4fed910c9c WIP: remove prepack cache, remap SFB per-call inside CUTLASS

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 10:38:47 +00:00
7adfaef113 fix: in-place prepack to avoid 2× peak memory

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 10:34:21 +00:00
90313f3a92 fix: LRU(2) eviction for prepack cache — prevents OOM across 61 layers

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 10:34:21 +00:00
5dc18df494 feat: MEGA_MOE_PREPACK_CACHE_MAX env var (default 2) with CUDA graph warning

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-15 10:14:32 +00:00
1da6726a86 fix: assert float8_e4m3fn dtype in _prepack_weight_sf