biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel:
2026-05-17 13:19:33 +00:00  bde81b95f4  Fix GEMM scale layout: pad to 128 tokens per expert
2026-05-17 12:31:27 +00:00  7e692c3aec  Fix cudaErrorStreamCaptureUnsupported: pre-allocate all tensors used during capture
2026-05-17 11:39:06 +00:00  b0221662e7  Fix warmup: pass local expert IDs (not global), remove incorrect _warmup_done guard
2026-05-17 11:11:01 +00:00  b531a98f8f  Fix scale assembly: per-expert 128-row fixed slots, no dynamic sizing
2026-05-17 10:48:25 +00:00  04245b664b  Add warmup-based activation global scale computation in finalize_weights
2026-05-17 09:59:59 +00:00  4445882ba7  Fix: return 2D scale tensor for GEMM (shape[1] access)
2026-05-17 09:59:14 +00:00  3cd910193c  Rewrite scale assembly: no .item() calls, no Python loops, fully GPU
2026-05-17 09:58:10 +00:00  4f6217acb9  Fix padded_cols calculation in scale assembly
2026-05-17 09:57:27 +00:00  918aa8aede  Fix scale assembly output shape: reshape to 2D for GEMM
2026-05-17 09:56:30 +00:00  d9bae6d770  Fix OOB in scale assembly: size padded_x_sf for max tokens, fix top_k/max_num_tokens passing, support variable-size expert blocks
2026-05-17 09:39:44 +00:00  55ac60eb91  Add detailed debug prints for OOB investigation
2026-05-17 09:19:12 +00:00  fed3c417ba  Add debug OOB check for sorted_token_ids
2026-05-17 09:01:27 +00:00  eb7d4f099b  Update CURRENT_BUG.md with Bug 8 (global→local expert ID) and Bug 8b (.cpu() sync)
2026-05-17 08:58:45 +00:00  ca3cba5bbd  Fix global→local expert ID remapping for EP and remove .cpu() sync
2026-05-17 08:30:44 +00:00  1330e2b2cf  cleanup: remove debug prints, ready for testing
2026-05-17 08:29:20 +00:00  d635dcbbb6  fix: keep token_indices on CPU, index with CPU sort_idx
2026-05-17 08:27:49 +00:00  235d5b314f  fix: fallback token indices allocation with verify+rebuild
2026-05-17 08:25:26 +00:00  dd0b3fd4f9  debug: print sorted_token_ids in warmup
2026-05-17 08:24:59 +00:00  04999d86cf  fix: add quantize_to_nvfp4 import
2026-05-17 08:24:30 +00:00  33e28100ee  test: use runner's built-in warmup method