biondizzle
Joined on
2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 10:46:31 +00:00)
e231b98387 Fix mHC Sinkhorn test: row sums expected to be off (eps after softmax)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 10:45:04 +00:00)
b5f29be169 Add mHC Sinkhorn CUDA kernel test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 10:44:57 +00:00)
6cb5078821 Fix mHC Sinkhorn kernel: remove VLA, remove Python fallback

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 10:20:41 +00:00)
c89762ecdd Fix set_indexer_keys_fp8 None guard + store comp_pos in mixed storage

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 10:19:55 +00:00)
1f69f61363 Add detailed comment: why compressed KV uses FP8 not NVFP4

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 10:08:46 +00:00)
edc8e7ee8d KV-1/KV-2: Mixed FP8+BF16 compressed KV (DeepSeek V4 paper format)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 10:04:02 +00:00)
12b6365b42 Fix RoPE test: use proper cos/sin cache

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 10:02:10 +00:00)
f566b9b748 Fix FP8 quantize return type (2-tuple not 3)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 10:01:10 +00:00)
bdb25ee5cd Add production-value unit tests for kv_quantize kernels

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 10:00:59 +00:00)
7ef6402936 KV-1/KV-2/KV-3: NVFP4 compressed KV + FP8 indexer keys

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 09:49:13 +00:00)
40dd56eac2 KV-1: Fix shared memory corruption in block_reduce

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 09:46:34 +00:00)
0fefadedd4 KV-1: Fix FP8 round-trip mismatch in fused quantize

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 09:44:03 +00:00)
d74ff5768d KV diag test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 09:41:18 +00:00)
c2664281c3 KV-1/KV-2: Fix quantize kernel — each thread handles 16-elem blocks independently

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 09:37:56 +00:00)
f23320b5b2 KV-1/KV-2: Fused compress+NVFP4 quantize kernels + dequant

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 09:30:07 +00:00)
107d62dd76 docs: update PERFORMANCE_AUDIT.md — Part 1 (P0-P3) landed, Part 2 KV cache next

biondizzle pushed tag v-p0p1p2p3-fused-swiglu-cuda-rope-20260602 to biondizzle/nvfp4-megamoe-kernel (2026-06-02 09:22:22 +00:00)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 09:08:10 +00:00)
3c295f225a P3: integrate CUDA RoPE kernel into single_shot — 732 launches/token eliminated

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 09:06:39 +00:00)
54a9b6961b fix: rope_cuda path — kernels/cuda not ops/cuda

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-02 09:05:24 +00:00)
2bbbead984 P3: CUDA RoPE kernel — single launch per call (vs 5-6 PyTorch ops)