biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
Public Activity
2026-05-23 06:37:36 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    07bf2adf51  D1.2: TMEM budget probe with real tensor major modes

2026-05-23 06:36:41 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    6e351c276d  fix: OperandMajorMode.MN not .M

2026-05-23 06:35:56 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    cabe8489aa  fix: typo + OperandMajorMode for TMEM budget probe

2026-05-23 06:35:04 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    61b9dbb2d6  fix: LayoutEnum import from cutlass.utils

2026-05-23 06:34:31 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    4c35fa49a9  fix import path for tcgen05

2026-05-23 06:33:28 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    a2d0dec7bb  D1.2: TMEM budget probe script for hd=64,128,256,512

2026-05-23 06:32:55 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    578d186c20  fix: add SwiGLU clamping to fused kernel (paper §4.2.3, CG-1)

2026-05-23 06:31:40 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    11c7e2c663  STAGE_D.md: restructure with correctness gaps, TMEM budget, execution order

2026-05-23 06:04:46 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    3d69215c4e  D1.1: Fix make_fragment_A — use sP for SMEM source pv_mma

2026-05-23 06:03:29 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    d0567524e1  D1.1: Fix PV A-operand construction — compile-time branch for TMEM vs SMEM

2026-05-23 06:01:03 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    a3344ddd50  D1.1: Add SMEM-P path behind use_smem_p flag (stub: zero sP)

2026-05-23 05:55:04 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    27041964e3  D1.0: Replace HEAD_DIM=64 with self.head_dim constructor parameter

2026-05-23 05:52:05 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    e98f5e4f9e  Add STAGE_D.md: step-by-step runbook and todo list for D1-D5

2026-05-23 05:46:00 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    0520d55ca6  Rename FmhaV3StageC → FmhaKernel — no dev stage artifacts in production API

2026-05-23 05:42:44 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    af925abe3b  Update README: reflect Stage C migration, built indexer/router/compressor, SMEM-P path, CuTeDSL scoping lesson

2026-05-23 05:36:23 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    c92976b3cd  Migrate Stage C kernel (proven cos 0.97) into module - exact copy, no modifications

2026-05-23 05:18:40 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    e397386ba2  Fix TMEM-P offset calc: match Stage C with p_cols_fp32 from pv_mma_tiler[2]

2026-05-23 05:17:46 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    a284580422  Add missing TMEM fence after P store in TMEM-P path

2026-05-23 05:16:21 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    0cd0e8b35f  Fix p_cols_fp32: use pv_mma_tiler[2] (K-dim) not [1] (N-dim)

2026-05-23 05:14:10 +00:00  biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel
    721bac4958  Fix PV A-operand major mode: K for TMEM-P, a_major for SMEM-P