biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 17:24:27 +00:00)
3f2f4e1882 Fix cudaErrorStreamCaptureUnsupported: no dynamic GPU-tensor slicing

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 16:59:52 +00:00)
11b5aa5e37 Scale assembly: full-buffer swizzle, zero CPU syncs, no Python loops

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 16:56:53 +00:00)
94dec5922d Scale assembly Phase 2: use CPU-computed offsets for Python slicing

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 16:55:48 +00:00)
49c28e6562 Fix: use real padded expert offsets instead of fixed layout

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 16:52:43 +00:00)
87a223f1ac Update CURRENT_BUG.md: current status, outstanding garbage output issue, hypotheses

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 16:25:41 +00:00)
c03438fc4e crap shoot

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 16:06:59 +00:00)
7c16f3cb46 Fix: init shared dict before using it, remove duplicate _output_buf

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 16:05:56 +00:00)
ea8acf9852 Share padded_x_sf and output buffers across layers to save ~300 MB

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 15:52:08 +00:00)
3d0b1408b4 Update CURRENT_BUG.md: Bug 21 (shared buffers), clean up status

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 15:47:39 +00:00)
455ecb5631 Fix: define padded_max_slots before using it in shared buffer allocation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 15:47:12 +00:00)
b1ac74bb4d Fix shape mismatch: shared padded buffers, revert max_num_tokens cap

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 15:46:15 +00:00)
e2f33596a2 Update CURRENT_BUG.md: status through Bug 20, fixed-layout padding architecture

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 15:18:09 +00:00)
faf7c8cc51 Debug: print runner max_num_tokens and max_chunks

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 14:59:46 +00:00)
c5af1aba6b Fix OOB: size padded buffers for num_experts*max_chunks*128

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 14:14:16 +00:00)
8ac8e20fa9 Fix OOM: cap buffer pre-allocation at cudagraph max capture size

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 14:02:12 +00:00)
5bb78564f5 Remove dynamic tensor allocation in scale assembly (cudagraph fix)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 13:59:01 +00:00)
8c31e78359 Fix cudagraph: fully fixed-layout per-expert sections, no GPU scalars in Python control flow

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 13:56:54 +00:00)
ff74b33d2c Fix cudagraph: static loop for per-expert scale swizzle

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 13:55:12 +00:00)
bf22b6f0e4 Fix scale assembly: variable-size per-expert padding matching GEMM offsets

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-17 13:32:43 +00:00)
0d3c928ff2 Update CURRENT_BUG.md: full status through Bug 14, vLLM integration status, architecture docs