biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 17:27:37 +00:00)
f278348f44 D3: SWA mask with BF16 min pre-masking approach (K[invalid]=BF16_MIN → scores≈-inf)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 17:23:04 +00:00)
cfbeb9c454 D3: SWA mask test with zero-masking approach (pre-mask K/V in Python)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 17:20:55 +00:00)
68cb0236b5 D3: add SWA sequence length mask test (reference oracle + full-window regression)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 17:18:51 +00:00)
7f69979c5f D1.5: add multi-KV-tile attention test with Python KV merge
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 17:16:06 +00:00)
8f35b75164 D2: comprehensive head-packed test (n_h=1, 64, 128, hd=64, 128)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 17:15:09 +00:00)
dbe2ecbd41 D2: add num_query_heads/batch_size params + batch grid dimension
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 17:13:41 +00:00)
7c6fdd151d fix: use reference attn_sum for normalization (kernel LSE per-row may be wrong)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 17:12:01 +00:00)
673825c242 rewrite D2 regression test: match existing Stage D1 test pattern with cute.compile + PV tiles
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 17:06:23 +00:00)
06cb800242 fix regression test: use normalize=False + external LSE normalization
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 17:04:32 +00:00)
13b5afc471 fully revert FmhaKernel changes to debug regression
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 17:03:21 +00:00)
0b9f9da2f7 revert grid change to debug regression
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 17:00:59 +00:00)
aa66f44ff9 add n_h=1 regression test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 16:54:59 +00:00)
efdedab399 fix tests: use 3D tensors (M, hd, 1) matching kernel local_tile expectations
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 16:53:19 +00:00)
a4499f5aa8 fix tests: pad Q to 128 rows (M tile size) for all configs
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 16:51:54 +00:00)
af136eee27 fix: use CUstream instead of cuStream(0)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 16:50:51 +00:00)
4826fa6afb D2: add num_query_heads/batch_size params + head-packed test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 16:42:04 +00:00)
d53e0a33a9 NVFP4-3: add use_2cta_instrs conditional to gemm_runner
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 16:25:06 +00:00)
22a2fc563e cleanup: remove diagnostic test file
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 16:24:13 +00:00)
a064b99d3d fix test 4: use silu(gate)+swiglu interleaved (matching fused kernel output)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-25 16:23:04 +00:00)
e76ea36337 fix test: use proper global_scale from quantize_to_nvfp4 for larger shape test