6 Commits

Author SHA1 Message Date
e608a20dec docs: major README update — packed FP4 SMEM layout, L1 epilogue, TMA descriptors
Added detailed documentation of the packed FP4 architecture:
- mxf4nvf4 reads packed (2 per byte), NOT unpacked like mxf8f6f4
- SMEM layout: float_e2m1_t, BLOCK_K/2 swizzle, UMMA desc byte math
- L1 epilogue: st.shared.u16, no swizzle, kWarpBytesPerRow
- Host TMA: hidden/2 K-dim, block_k/2 inner, fp4_unpacked_smem=false
- Build history through Build 35
2026-05-11 22:40:09 +00:00
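
The "packed (2 per byte)" layout described in the commit above can be illustrated with a small host-side sketch. It is not taken from the repository: the helper names (`encode_e2m1`, `pack_fp4`, `packed_k_bytes`) are hypothetical, and the nibble order (even element in the low nibble) is an assumption that should be checked against the actual kernel. Only the E2M1 value table is standard. The sketch shows why a K extent of `hidden` elements would occupy `hidden/2` bytes once packed.

```python
# Magnitudes of the eight non-negative E2M1 codes (1 sign, 2 exponent, 1 mantissa bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def encode_e2m1(x: float) -> int:
    """Return the 4-bit E2M1 code nearest to x (saturating at +/-6.0).

    Ties resolve to the smaller magnitude here; real hardware rounding may differ.
    """
    sign = 0x8 if x < 0 else 0x0
    mag = min(abs(x), 6.0)
    code = min(range(8), key=lambda i: abs(E2M1_VALUES[i] - mag))
    return sign | code


def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code back to a Python float."""
    mag = E2M1_VALUES[code & 0x7]
    return -mag if code & 0x8 else mag


def pack_fp4(values) -> bytes:
    """Pack pairs of values into bytes, two FP4 codes per byte.

    Assumption: the even-indexed element goes in the low nibble.
    """
    if len(values) % 2 != 0:
        raise ValueError("packed FP4 needs an even number of elements")
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        out.append(encode_e2m1(lo) | (encode_e2m1(hi) << 4))
    return bytes(out)


def unpack_fp4(data: bytes):
    """Inverse of pack_fp4 under the same nibble-order assumption."""
    out = []
    for b in data:
        out.append(decode_e2m1(b & 0xF))
        out.append(decode_e2m1(b >> 4))
    return out


def packed_k_bytes(k_elems: int) -> int:
    """Bytes needed along K when two FP4 elements share one byte."""
    return k_elems // 2


if __name__ == "__main__":
    packed = pack_fp4([0.5, -6.0, 1.5, 3.0])
    print(packed.hex(), unpack_fp4(packed))
    print(packed_k_bytes(128))
```

Under these assumptions, a tile of `BLOCK_K` elements spans `BLOCK_K/2` bytes, which matches the `hidden/2` and `block_k/2` figures listed in the commit message.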
e80fe9af60 docs: CORRECTED — mxf4nvf4 IS supported on sm_100a (B200)
The build 17-18 'scale_vec not supported on sm_100f' error was because
we targeted sm_100 instead of sm_100a. The 'a' suffix is required for
FP4 block-scaled MMA instructions. Reverting to mxf4nvf4 with correct
arch target is the path forward.
2026-05-11 14:24:55 +00:00
c2f4a30780 docs: comprehensive README update through build 22 2026-05-11 13:55:17 +00:00
f44ff7f6ca docs: document SM100 hardware constraint and full debugging log 2026-05-11 09:30:44 +00:00
acbe006498 docs: update debugging log in README 2026-05-11 07:33:02 +00:00
42c215d49b docs: add NVFP4 mega MoE kernel README 2026-05-11 05:41:25 +00:00