DeepGEMM

Files

biondizzle 86a1263f44 fix: gran_k=16 in transform_sf + sm_100a arch for NVFP4 mega_moe

- transform_sf_into_required_layout: add gran_k=16 branch for NVFP4 UE4M3
  scales (4 per int32, group_size=16). Previously only gran_k=32/128 were handled.
- get_arch: always return '100a' for SM100, never '100f'. The family
  variant lacks mxf4nvf4 (NVFP4 block-scaled MMA) support, causing
  'scale_vec::4X not supported on sm_100f' errors.
- transform_nvfp4_weights_for_mega_moe: fold weight_scale_2 into block
  scales, pack UE4M3→int32, transpose MN-major, call
  transform_sf_into_required_layout with gran_k=16.

2026-05-11 16:11:11 +00:00
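The `gran_k=16` packing from the first bullet (one UE4M3 scale per 16-element K group, four scales per `int32`) can be illustrated with a minimal C++ sketch. It assumes a row-major `[mn, ceil(k/16)]` array of raw UE4M3 bytes and that the first K group lands in the least significant byte of each word. The helper `pack_ue4m3_scales` is hypothetical, not DeepGEMM's `transform_sf_into_required_layout`, and it leaves out the `weight_scale_2` folding and the MN-major transpose.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the DeepGEMM API): pack UE4M3 block scales,
// one byte per gran_k-element group along K, into int32 words, 4 per word.
// Assumption: group g of a row goes into byte (g % 4) of word (g / 4),
// with byte 0 being the least significant byte.
std::vector<int32_t> pack_ue4m3_scales(const std::vector<uint8_t>& sf,
                                       std::size_t mn, std::size_t k,
                                       std::size_t gran_k = 16) {
    const std::size_t num_groups = (k + gran_k - 1) / gran_k;  // scales per row
    const std::size_t num_words = (num_groups + 3) / 4;         // int32 words per row
    assert(sf.size() == mn * num_groups);

    std::vector<int32_t> packed(mn * num_words, 0);  // unused bytes stay zero
    for (std::size_t row = 0; row < mn; ++row) {
        for (std::size_t g = 0; g < num_groups; ++g) {
            const uint32_t byte = sf[row * num_groups + g];
            const std::size_t word = row * num_words + g / 4;
            const uint32_t bits = static_cast<uint32_t>(packed[word]) | (byte << (8 * (g % 4)));
            packed[word] = static_cast<int32_t>(bits);
        }
    }
    return packed;
}
```

For `k = 80` there are 5 groups per row, so each row occupies two words and the second word carries a single scale in its low byte.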

cache.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

compiler.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

device_runtime.hpp

fix: gran_k=16 in transform_sf + sm_100a arch for NVFP4 mega_moe

2026-05-11 16:11:11 +00:00

handle.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)