docs: document M_for_layout=128 assumption in _prepack_weight_sf

SFB layout size may depend on M. This is currently unverified: it has
only been tested with M=128. Added TODO to test with M=1 and M=256.
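
A minimal sketch of the check the TODO asks for, under stated assumptions.
`assert_m_independent` and `toy_layout_size` are hypothetical names, and
`toy_layout_size` is a made-up stand-in, not the CUTLASS SFB layout. A real
test would pass in a callable that reports the packed SFB size for a given
(M, N, K); how to obtain that size from `prepack_sfb` is not shown in this
commit.

```python
from typing import Callable, Iterable


def assert_m_independent(
    layout_size: Callable[[int, int, int], int],
    N: int,
    K: int,
    ms: Iterable[int] = (1, 128, 256),
) -> int:
    """Return the common layout size, or raise if it varies with M."""
    sizes = {M: layout_size(M, N, K) for M in ms}
    if len(set(sizes.values())) != 1:
        raise AssertionError(f"SFB layout size depends on M: {sizes}")
    return next(iter(sizes.values()))


def toy_layout_size(M: int, N: int, K: int) -> int:
    # Hypothetical stand-in, NOT the CUTLASS layout: one scale factor per
    # 16-element block along K, with N padded up to a multiple of 128.
    n_padded = -(-N // 128) * 128  # ceiling division, then scale back up
    k_blocks = -(-K // 16)
    return n_padded * k_blocks  # M is deliberately unused here


if __name__ == "__main__":
    print(assert_m_independent(toy_layout_size, N=256, K=512))
```

If the check raises for the real layout, the fallback described in the code
comment applies: prepack per expert with the actual M.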
2026-05-15 10:13:19 +00:00
parent b7c7e9fb50
commit 489c620159

@@ -112,8 +112,11 @@ def _prepack_weight_sf(weight_sf, N, K, tag):
from nvfp4_megamoe_kernel.cutlass_nvfp4_gemm.kernel import prepack_sfb
E = weight_sf.shape[0]
-# M for layout sizing. Test with different M to confirm SFB is M-independent.
-# If SFB size changes with M, bucket by M and cache per-bucket.
+# M_for_layout controls CUTLASS SFB layout sizing.
+# ASSUMPTION: SFB layout size is M-independent (CUTLASS tiling is over M
+# but the scale factor block structure depends on N,K only). If this is
+# wrong, we need to prepack per-expert with actual M. Verified only for
+# M=128 — TODO: test with M=1, M=256 to confirm.
M_for_layout = 128
packed = []