biondizzle · 0 Followers · 0 Following · Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 04:13:43 +00:00 · fae61d3ef7
Add c10/cuda/CUDAStream.h include for getCurrentCUDAStream
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 03:58:01 +00:00 · ee86969f6c
Fix CUDA stream: use c10::cuda::getCurrentCUDAStream() directly in kernel launch
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 03:43:08 +00:00 · e26c28a1ce
Fix CUDA stream API: getCurrentCUDAStream().stream()
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 03:29:13 +00:00 · 9b3917e248
Fix blackwell_swizzle.cu: add pybind11 bindings for torch extension loader
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 03:14:29 +00:00 · 5487a58df4
Fix NameError: add rows/cols variables to MoE swizzle
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 03:03:04 +00:00 · a434545d12
Blackwell swizzle CUDA kernel for CUDA graph capture
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 02:42:02 +00:00 · e7766254b7
Pre-allocate ALL GEMM output buffers for CUDA graph capture
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 02:06:20 +00:00 · 676a0448c0
CRITICAL FIX: _l1_out_buf was 2x too narrow — caused GPU memory corruption
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 01:49:14 +00:00 · 0890e578f4
DEBUG: print l1_out shape before gate/up split
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 01:38:26 +00:00 · 8546ed725f
DEBUG: check SE input magnitude
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 01:30:31 +00:00 · 26ecf96328
DEBUG: check intermediate magnitude before SE L2
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 01:27:28 +00:00 · 5303d6a82f
DEBUG: test copy_ with contiguous slice vs scalar assign for gsa
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 01:16:40 +00:00 · ccbc713658
DEBUG: check gsa values and pinpoint exact failing operation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 01:05:49 +00:00 · e77455c3ba
DEBUG: add sync inside quantize_nvfp4_gpu_fused to catch async errors
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 01:03:38 +00:00 · 55def5eef9
Restore A/B split + gsa scalar fix (error is pre-existing, not regression)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 00:53:12 +00:00 · 59eccd04ab
REVERT: test if cudaErrorInvalidValue is pre-existing or regression
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 00:41:36 +00:00 · 5e3ced0b60
DEBUG: isolate which kernel causes cudaErrorInvalidValue in SE L2 path
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 00:30:33 +00:00 · b314fde9b7
Fix gsa copy_ cudaErrorInvalidValue: replace view-based copy_ with scalar assignment
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 00:15:40 +00:00 · 993bb345d1
DEBUG: fix VERBOSE reference in shared_expert, always print L2 gsa debug
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-06-04 00:05:10 +00:00 · f0f87df906
DEBUG: add sync + shape prints to shared_expert L2 gsa copy