biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Public Activity
2026-06-03 17:02:37 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    e07d79868f  CUDA graph: Fix _assemble_scales_single_group swizzle size

2026-06-03 16:52:31 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    0ca7bed0e1  CUDA graph: Fix sync violations found by B200 detector

2026-06-03 16:38:36 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    46a3a51832  CUDA graph: Fix per-step allocations in decode loop

2026-06-03 16:37:23 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    a9ea30353c  CUDA graph: Fix sync violations (Category 1-2)

2026-06-03 16:34:35 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    caac8ae108  Fix syntax error: 'is not not None' -> 'is not None'

2026-06-03 16:34:18 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    ba68212fa7  Add CUDA graph readiness detector (Section A of GETTING_CUDAGRAPH_READY.md)

2026-06-03 15:52:02 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    ca5bc814d5  Fix compressor: do not add positional bias to KV content

2026-06-03 15:51:43 +00:00  biondizzle pushed tag v-precision-floor-fix-20260603 to biondizzle/nvfp4-megamoe-kernel

2026-06-03 15:45:22 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    4fe73fe713  auto: pre-test commit

2026-06-03 14:57:53 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    f577ed97f4  Fix: Use PyTorch dequant_nvfp4 for weight dequantization (compressor/indexer/router gate)

2026-06-03 14:48:54 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    1121cd7b47  Add CUDA_LAUNCH_BLOCKING=1 to catch async errors

2026-06-03 14:38:26 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    f3bb0ca08c  Fix dequant gsa: use ws2 only, NOT input_scale * ws2

2026-06-03 14:27:01 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    470e65fb19  Fix dequant gsb: input_scale * ws2, not 1.0 * ws2

2026-06-03 14:19:44 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    2dd16d5789  Switch compressor + indexer weights_proj to BF16 F.linear

2026-06-03 14:17:09 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    95e45a87e3  Add explicit .to(dev) on W_gate after transpose — belt and suspenders

2026-06-03 14:14:13 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    ef94c48957  Simplify router gate: dequant NVFP4 → BF16, F.linear (no FP8 middleman)

2026-06-03 14:10:30 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    715602c87c  Switch lm_head to BF16 + router gate to FP8_E4M3

2026-06-03 14:00:37 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 13:57:17 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    89510601f5  Revert compressor pos bias fix + SwiGLU clamp ordering from commit 3320abf

2026-06-03 13:48:46 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    f05ee6cd69  Revert SE BF16 fallback — produced garbage output