docs: document M_for_layout=128 assumption in _prepack_weight_sf
The SFB layout size may depend on M. This is currently unverified: only M=128 has been tested. Adds a TODO to test with M=1 and M=256.
@@ -112,8 +112,11 @@ def _prepack_weight_sf(weight_sf, N, K, tag):
     from nvfp4_megamoe_kernel.cutlass_nvfp4_gemm.kernel import prepack_sfb

     E = weight_sf.shape[0]
-    # M for layout sizing. Test with different M to confirm SFB is M-independent.
-    # If SFB size changes with M, bucket by M and cache per-bucket.
+    # M_for_layout controls CUTLASS SFB layout sizing.
+    # ASSUMPTION: SFB layout size is M-independent (CUTLASS tiling is over M
+    # but the scale factor block structure depends on N,K only). If this is
+    # wrong, we need to prepack per-expert with actual M. Verified only for
+    # M=128. TODO: test with M=1, M=256 to confirm.
     M_for_layout = 128

     packed = []
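One way to close the TODO is to compare the packed SFB size across several values of M. The sketch below is a minimal, hypothetical harness: `sfb_sizes_by_m` and `is_m_independent` are illustrative names that do not exist in the repository, and the two `toy_size_*` functions are stand-ins for a real measurement. It does not call `prepack_sfb`, whose signature is not shown in this diff, and the toy functions make no claim about how CUTLASS actually sizes the layout.

```python
from typing import Callable, Dict, Iterable


def sfb_sizes_by_m(size_for_m: Callable[[int], int], ms: Iterable[int]) -> Dict[int, int]:
    """Map each candidate M to the packed size reported by size_for_m."""
    return {m: size_for_m(m) for m in ms}


def is_m_independent(sizes: Dict[int, int]) -> bool:
    """Return True when every tested M produced the same packed size."""
    return len(set(sizes.values())) <= 1


# Toy stand-ins (assumptions for illustration only, not CUTLASS behaviour).
def toy_size_nk_only(m: int, n: int = 256, k: int = 512) -> int:
    """Size that ignores M entirely."""
    return n * k


def toy_size_m_padded(m: int, n: int = 256, k: int = 512) -> int:
    """Size that grows with M rounded up to a multiple of 128."""
    padded_m = -(-m // 128) * 128  # ceiling division, then scale back up
    return padded_m * k


if __name__ == "__main__":
    candidate_ms = [1, 128, 256]  # the values named in the TODO
    for fn in (toy_size_nk_only, toy_size_m_padded):
        sizes = sfb_sizes_by_m(fn, candidate_ms)
        print(fn.__name__, sizes, "M-independent:", is_m_independent(sizes))
```

In a real test, `size_for_m` would wrap a call that prepacks one expert's scale factors for the given M and returns the element count of the result. If the check fails, the removed comment's fallback (bucket by M and cache per bucket) would apply.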