biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:37:36 +00:00
07bf2adf51 D1.2: TMEM budget probe with real tensor major modes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:36:41 +00:00
6e351c276d fix: OperandMajorMode.MN not .M
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:35:56 +00:00
cabe8489aa fix: typo + OperandMajorMode for TMEM budget probe
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:35:04 +00:00
61b9dbb2d6 fix: LayoutEnum import from cutlass.utils
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:34:31 +00:00
4c35fa49a9 fix import path for tcgen05
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:33:28 +00:00
a2d0dec7bb D1.2: TMEM budget probe script for hd=64,128,256,512
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:32:55 +00:00
578d186c20 fix: add SwiGLU clamping to fused kernel (paper §4.2.3, CG-1)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:31:40 +00:00
11c7e2c663 STAGE_D.md: restructure with correctness gaps, TMEM budget, execution order
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:04:46 +00:00
3d69215c4e D1.1: Fix make_fragment_A — use sP for SMEM source pv_mma
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:03:29 +00:00
d0567524e1 D1.1: Fix PV A-operand construction — compile-time branch for TMEM vs SMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:01:03 +00:00
a3344ddd50 D1.1: Add SMEM-P path behind use_smem_p flag (stub: zero sP)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:55:04 +00:00
27041964e3 D1.0: Replace HEAD_DIM=64 with self.head_dim constructor parameter
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:52:05 +00:00
e98f5e4f9e Add STAGE_D.md: step-by-step runbook and todo list for D1-D5
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:46:00 +00:00
0520d55ca6 Rename FmhaV3StageC → FmhaKernel — no dev stage artifacts in production API
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:42:44 +00:00
af925abe3b Update README: reflect Stage C migration, built indexer/router/compressor, SMEM-P path, CuTeDSL scoping lesson
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:36:23 +00:00
c92976b3cd Migrate Stage C kernel (proven cos 0.97) into module - exact copy, no modifications
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:18:40 +00:00
e397386ba2 Fix TMEM-P offset calc: match Stage C with p_cols_fp32 from pv_mma_tiler[2]
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:17:46 +00:00
a284580422 Add missing TMEM fence after P store in TMEM-P path
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:16:21 +00:00
0cd0e8b35f Fix p_cols_fp32: use pv_mma_tiler[2] (K-dim) not [1] (N-dim)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 05:14:10 +00:00
721bac4958 Fix PV A-operand major mode: K for TMEM-P, a_major for SMEM-P