biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 22:43:48 +00:00
5dcfb333ea Fix: move weight tensors to CUDA before dequant

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 22:43:03 +00:00
47c7b3c50b Fix: ensure FP4 LUT on CUDA before index op

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 22:42:20 +00:00
13bae9dd55 Fix single_shot: mHC replaces layernorm, no hidden-level norm in DSV4

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 22:41:09 +00:00
e8334fc4af Rewrite single_shot_inference.py — complete forward pass

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 22:39:03 +00:00
9b0858aa35 Add single_shot_inference.py — baseline kernel verification

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:22:36 +00:00
4472928506 E3: model construction test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:21:49 +00:00
afc07a5d1a Update STATUS.md: E5 done

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:21:04 +00:00
df6220abaf E5: Fold batch loop into native kernel grid (blockIdx.z)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:20:11 +00:00
e162a2d112 Update STATUS.md: E1-E4 done

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:19:30 +00:00
c4b40dd06c E2: CSA/HCA integration test — gather + FMHA end-to-end

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:19:06 +00:00
9d88769f5f Wire indexer compute_index_scores_topk + fix compressor imports

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:16:59 +00:00
daf84524ac E2/E3: compressor bridge, indexer bridge, flush pipeline wiring

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:15:58 +00:00
d3b772196d E3: Implement DSV4Model — full model class

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:14:17 +00:00
b0cdd5af74 fix: extern declarations for gather_swa functions in gather_kv.cu

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:13:28 +00:00
016d722abc fix: single PYBIND11_MODULE for combined gather .so

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:12:17 +00:00
8fb9d89658 fix: correct gather.py kernel_dir path

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:11:07 +00:00
924707a673 fix: add FFNType/RouterMode to LayerSpec in e2e test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:10:46 +00:00
e2e21c6350 fix: remove unused pytest import from e2e test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:10:27 +00:00
300dddedc0 E1-E4: gather kernels, handle wiring, rope, sync removal, e2e test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-30 21:09:24 +00:00
faf92b30ad E1: Wire LayerCacheHandle gather methods + CUDA gather kernels