biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 08:07:44 +00:00
72bf750a0b fix: revert to eager mode — CUDA graphs OOM with 175GB model

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 07:49:41 +00:00
baf44c92f8 fix: memory-efficient E2M1 quantization — no 32x distance tensor

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 07:32:19 +00:00
a2cac7a7fe fix: remove CuTeDSL warmup — OOM with 175GB model loaded

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 07:16:00 +00:00
e0814eb54e fix: cast expert_offsets to int32 for CuTeDSL kernel

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 07:03:10 +00:00
4b0a9557f0 fix: rewrite CuTeDSLMoERunner for CUDA graph compatibility

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 06:31:17 +00:00
dab31b0961 fix: missing tqdm import in weight_loader

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 06:28:19 +00:00
8496ac99bc dang clonkurs

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 06:14:31 +00:00
e7c6274107 Revert "feat: auto-warmup in build_and_run.sh"

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 06:11:40 +00:00
f792537719 feat: auto-warmup in build_and_run.sh

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 06:09:23 +00:00
5d975d00d9 feat: tqdm progress bar for expert weight loading

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 06:02:14 +00:00
2e4ff6b8d4 fix: increase vLLM RPC timeout to 10 min for first-request JIT

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 05:51:36 +00:00
a569612df5 feat: add load progress heartbeats to prevent k8s health check kills

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 05:43:35 +00:00
e5370140cb docs: update README with full NVFP4 coverage, dequant anti-pattern, v2 status

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 05:36:36 +00:00
3445bd24c1 feat: keep attention weights native NVFP4 — stop dequantizing to BF16

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 05:21:12 +00:00
4d4cfa6b28 fix: tqdm over MoE layer warmup, compile every layer, no print spam

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 05:18:16 +00:00
3838561c19 fix: only suppress compile message, still warmup all layers

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 05:16:54 +00:00
f19932d8db fix: compile CuTeDSL kernel once per process, not per MoE layer

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 05:01:22 +00:00
936982c5aa fix: add layer-level tqdm for expert finalization, remove inner expert tqdm

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 04:56:47 +00:00
cf0731cf4b fix: warmup with 128 tokens (fills MMA tile), better error handling

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-16 04:40:32 +00:00
a70d2d3984 fix: clearer warmup message — 'Compiling CuTeDSL NVFP4 MegaMoE kernel'