biondizzle
0 Followers
·
0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-25 16:21:55 +00:00  5290c91c35  fix quantize_nvfp4 kernel: use proven single-thread-per-CTA pattern from deinterleave_quantize.cu
2026-05-25 16:20:31 +00:00  5508f29625  add GPU quantize diagnostic test
2026-05-25 16:19:08 +00:00  c2e3d15633  NVFP4-1.1 integration: GPU-only quantize kernel + MoE pipeline wiring
2026-05-25 09:08:11 +00:00  6504f091ca  NVFP4-1.1 Step 3: post-SWiGLU quantization test suite (all PASS)
2026-05-25 08:58:30 +00:00  5e8347836f  NVFP4-1.1: working BF16→FP4 quantize kernel (cos 0.979)
2026-05-25 03:23:47 +00:00  52d11d7f92  NVFP4-1.1: standalone BF16→FP4 quantize kernel (WIP) + dequantize verification
2026-05-25 03:17:15 +00:00  1f310defa0  fix: quantize_activation_nvfp4 returns 2 values, not 3
2026-05-25 03:15:42 +00:00  6dac3bcaf0  NVFP4-1.1: add FP4 quantize round-trip test (step 1 of kernel fusion)
2026-05-25 03:07:55 +00:00  eb46e4d15e  NVFP4-0.2-0.4: add FP4 primitives diagnostic test
2026-05-25 02:40:15 +00:00  29ad36934d  cleanup: remove D2 diagnostic/experimental files, keep working codebase clean
2026-05-25 02:36:48 +00:00  d5b69ac122  D2: simpler shape diagnostic using CuTe from Python (no kernel needed)
2026-05-25 02:34:32 +00:00  684e9a85fe  fix: use utils.sm100 instead of sm100 in diagnostic
2026-05-25 02:33:23 +00:00  7599801f57  D2: add flat_divide shape diagnostic kernel for multi-CTA grid
2026-05-25 01:18:48 +00:00  32850f6974  Update README, STAGE_D, STAGE_D2 with D1 rescale findings and D2 status
2026-05-25 01:09:31 +00:00  6cc151097e  Revert D2 multi-CTA attempts - keeping per-head launch approach (works correctly)
2026-05-24 23:44:40 +00:00  34f5beb767  D2: fix gC coordinate to match 5-mode flat_divide result
2026-05-24 23:43:25 +00:00  a3559538cf  D2: try 6-mode coordinate for flat_divide result
2026-05-24 23:42:06 +00:00  6f371d6b31  D2: add flat_divide shape print, try different coordinate order
2026-05-24 23:40:39 +00:00  7007a9db79  D2: use flat_divide for runtime coordinate indexing (like CUTLASS)
2026-05-24 23:38:49 +00:00  3e340a0eee  D2: fix local_tile coordinate for 4D Q (2 rest modes, not 3)