biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 03:49:05 +00:00)
0ecb98daee auto: pre-test commit

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 03:48:52 +00:00)
6f94925491 NVFP4-1.1: fix cute.math.fmax -> cute.arch.fmax (correct CuTeDSL API)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 03:46:47 +00:00)
60790564f0 NVFP4-1.1: fix test - two-pass kernel, cute.arch.store confirmed on B200

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 03:42:39 +00:00)
ca9f920414 auto: pre-test commit

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 03:42:26 +00:00)
a41de129cb NVFP4-1.1: fix test kernel - use cute.copy instead of cute.arch.store

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 03:40:55 +00:00)
3a78bdf570 NVFP4-1.1: add CuTeDSL kernel test for FP4 quantization

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 03:40:08 +00:00)
80b6b79f9e NVFP4-1.1: FP4 quantization primitives for CuTeDSL kernels

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 15:15:08 +00:00)
b9f15c250f Stage E: head-packed MQA/GQA, batch dim, custom_op, integration API

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 08:31:25 +00:00)
2412a5431b MQA/GQA: batch Q heads into kernel batch dim, shared K/V per KV group

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 07:12:03 +00:00)
06a895ff99 Clean test suite for production attention (1/2/4 segments, multi-head)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 07:10:02 +00:00)
778d9d4f4f Compile with row_sums tensor so kernel writes per-row row_sums

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 07:07:53 +00:00)
0736a04d9b Fix KV merge: use NORMALIZED O (O_unnorm/row_sum) with LSE

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 07:04:41 +00:00)
06e7f7ab48 Debug: print LSE values for 2-segment merge

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:58:30 +00:00)
8f8d14c300 Match tensor slicing exactly to test_d1_kv_merge (2D slices, 3D unsqueeze)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:56:05 +00:00)
6ee61717c0 Match tensor shapes from working test_d1_kv_merge

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:54:18 +00:00)
3a25c7feff Test multi-KV merge (2 segments) separately from multi-head

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:52:45 +00:00)
36a6f07a7e Fix: unsqueeze k/v when dim==2

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:51:21 +00:00)
fc4172937c Clean production wrapper: always normalize=False + KV merge

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:49:03 +00:00)
8f87109f86 Single-segment: use normalize=False + per-row normalization from row_sums

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:46:31 +00:00)
fe55bf23a0 Split single-segment (normalized) and multi-segment (KV merge) paths