Files
DeepGEMM/csrc
biondizzle dcebe033e2 fix: use scale_vec::2X (block32) for SM100 B200 compatibility
scale_vec::4X (block16) requires SM103/SM120 (B300/GB300), not SM100 (B200).
Revert to block32 with UE4M3 scales. Same TMEM layout as MXFP4 but with
UE4M3 scale format instead of UE8M0.

Changes:
- kGranK: 16 → 32
- PTX: scale_vec::4X → scale_vec::2X
- SF layout: same as MXFP4 (K/32, K/128 for int32 packed)
- UTCCP: i*8 → i*4 (2X layout, same as MXFP4)
- TMEM columns: same as MXFP4 (SF_BLOCK_M/32, SF_BLOCK_N/32)
- Python: merge NVFP4 block16→block32 scales (max of adjacent pairs)
- recipe: (1,1,16) → (1,1,32)
2026-05-11 08:36:59 +00:00
..