DeepGEMM/deep_gemm/include
biondizzle af092fa7ba fix: double SMEM SF allocation for NVFP4 group=16 + clean stale comments
- SMEM_SFA/SFB_SIZE_PER_STAGE doubled: group=16 needs 8 SFs per token
  per BLOCK_K=128 (vs 4 for group=32)
- arrive_and_expect_tx updated to use SMEM_SFA/SFB constants
- Removed stale comments about 8/16 TMEM columns
2026-05-11 23:58:07 +00:00
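
The doubling described in the commit message follows from the group-size arithmetic. Below is a minimal sketch of that arithmetic, not the kernel's actual code: the helper names, the tile size `kBlockM = 128`, and the 1-byte scale-factor size are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tile sizes for illustration; only BLOCK_K=128 comes from the commit message.
constexpr uint32_t kBlockM = 128;
constexpr uint32_t kBlockK = 128;

// Number of scale factors (SFs) one token needs to cover one BLOCK_K slice:
// one SF per quantization group along K.
constexpr uint32_t sfs_per_token(uint32_t block_k, uint32_t group_size) {
    return block_k / group_size;
}

// Bytes of SF shared memory per pipeline stage.
// Assumption: each SF occupies 1 byte.
constexpr uint32_t smem_sf_bytes_per_stage(uint32_t block_rows, uint32_t block_k,
                                           uint32_t group_size) {
    return block_rows * sfs_per_token(block_k, group_size) *
           static_cast<uint32_t>(sizeof(uint8_t));
}

// group=32 -> 4 SFs per token; group=16 -> 8 SFs per token.
static_assert(sfs_per_token(kBlockK, 32) == 4, "group=32 needs 4 SFs per token");
static_assert(sfs_per_token(kBlockK, 16) == 8, "group=16 needs 8 SFs per token");

// Halving the group size therefore doubles the per-stage SF allocation.
static_assert(smem_sf_bytes_per_stage(kBlockM, kBlockK, 16) ==
              2 * smem_sf_bytes_per_stage(kBlockM, kBlockK, 32),
              "group=16 needs twice the SF shared memory of group=32");

// Runtime variant of the same check, usable with tile sizes known only at run time.
inline void check_sf_doubling(uint32_t block_rows, uint32_t block_k) {
    assert(smem_sf_bytes_per_stage(block_rows, block_k, 16) ==
           2 * smem_sf_bytes_per_stage(block_rows, block_k, 32));
}
```

Deriving the byte count from one shared constant, as the commit does for `arrive_and_expect_tx`, keeps the expected transaction size in step with the allocation when the group size changes.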