biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 15:45:54 +00:00)
8ee3f90e44 debug: handle flat_rank=8 for SF remap, add coordinate dump

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 15:34:48 +00:00)
d2c1c76f5b debug: idx2crd+flatten approach with printf to determine flat_rank

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 15:32:13 +00:00)
2ac3a7d631 fix: construct nested coordinate for CuTe layout shape ((32,4), K)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 15:31:06 +00:00)
593ae998f8 fix: clean rewrite of cutlass_nvfp4_gemm.cu — no more file splicing

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 15:28:47 +00:00)
196ee37fdb fix: rewrite SF remap kernel — source-iterating with layout_sf(m, k_elem)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 15:24:19 +00:00)
fb390b24e2 debug: add printf to SF remap kernel to check flat_rank and layout shape

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 15:04:44 +00:00)
8f5322ca31 fix: add missing extern "C" opening brace lost during file reconstruction

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 15:01:49 +00:00)
a8bd962452 fix: SF remap — iterate dest indices, extract logical (m, k_sf) from nested coord

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 14:57:15 +00:00)
395cc31883 fix: use layout_sf(m, k_elem) instead of make_coord for nested shapes

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 14:54:20 +00:00)
d90967d6e9 fix: SF remap — element-space K coords + zero-init dest buffer

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 14:51:03 +00:00)
5968ebad9f fix: SF remap was using idx2crd+flatten which gives atom sub-indices, not logical (m,k)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 14:19:41 +00:00)
cf796e37cf debug: add weight_scale_2 shape/value logging in weight transform

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 14:17:44 +00:00)
879adc324d fix: _fold_global_scale — remove broken logical_widths branch

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant (2026-05-14 14:13:21 +00:00)
f2656dcf6d sync B200 deployment files: Dockerfile, docker-compose, patches

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 13:44:43 +00:00)
ef9cd023a9 fix: unpack_ue4m3_u32 — uint32 lacks CUDA bitwise ops, use int32

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 13:05:49 +00:00)
1c39e21d87 fix: remove broken L1 weight interleave

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 13:04:33 +00:00)
80495c0cd6 docs: clarify SF layout remap is in CUDA, not sf_layout.py

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 13:02:12 +00:00)
16f91ff0e1 fix: rewrite stage_activation with proper E2M1 quantization

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 12:59:23 +00:00)
3bcc0ac057 fix: unpack_ue4m3_u32 was value-casting instead of bit-reinterpreting

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-14 12:48:09 +00:00)
8b7fa0c91e add README: pipeline diagram, file map, data formats, known issues