DeepGEMM

Files

biondizzle 49e5646b42 fix: remove duplicate kInt8 case — kPackedFP4 is already kInt8

kPackedFP4 = torch::kInt8, so the kInt8 case was a duplicate.
The real fix was in mega_nvfp4.hpp: changing kUInt8→kInt8 so
tensors match the existing kPackedFP4 path in the TMA switch.

2026-05-11 22:55:28 +00:00
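
The duplicate-case problem described in the commit message above can be sketched in isolation. This is a minimal illustration and not the code in runtime_utils.hpp: `ScalarType`, `TmaFormat`, and `tma_format` are hypothetical stand-ins (the real code uses `torch::ScalarType`), and the only thing taken from the commit message is the assumption that `kPackedFP4` is an alias for `kInt8`. When two names share one value, a `switch` can list only one of them, and a tensor tagged `kUInt8` never reaches that branch.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for torch::ScalarType; not the real PyTorch enum.
enum class ScalarType : std::uint8_t { kUInt8, kInt8, kFloat8 };

// Assumption taken from the commit message: packed FP4 is an alias for kInt8.
constexpr ScalarType kPackedFP4 = ScalarType::kInt8;
static_assert(kPackedFP4 == ScalarType::kInt8, "alias, not a distinct value");

// Hypothetical result type standing in for whatever the TMA switch selects.
enum class TmaFormat { PackedFP4, FP8 };

// Hypothetical dispatch function illustrating the shape of the switch.
inline TmaFormat tma_format(ScalarType dtype) {
    switch (dtype) {
        case kPackedFP4:  // same value as ScalarType::kInt8
            return TmaFormat::PackedFP4;
        // Adding `case ScalarType::kInt8:` here would not compile:
        // the compiler rejects it as a duplicate case value.
        case ScalarType::kFloat8:
            return TmaFormat::FP8;
        default:
            // A kUInt8 tensor lands here, which is why the commit changes
            // the tensors to kInt8 rather than adding another case.
            throw std::invalid_argument("unsupported dtype");
    }
}
```

Under these assumptions, `tma_format(ScalarType::kInt8)` and `tma_format(kPackedFP4)` take the same branch, while `tma_format(ScalarType::kUInt8)` throws.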

epilogue.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

runtime_utils.hpp

fix: remove duplicate kInt8 case — kPackedFP4 is already kInt8

2026-05-11 22:55:28 +00:00

sm90_bf16_gemm.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

sm90_bmk_bnk_mn.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

sm90_fp8_gemm_1d1d.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

sm90_fp8_gemm_1d2d.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

sm90_tf32_hc_prenorm_gemm.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

sm100_bf16_gemm.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

sm100_bmk_bnk_mn.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

sm100_fp8_fp4_gemm_1d1d.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

sm100_fp8_fp4_mega_moe.hpp

Add various optimizations and Mega MoE benchmarks (#316)

2026-04-24 18:41:37 +08:00

sm100_fp8_gemm_1d1d.hpp

Multiple updates and refactorings (#280)

2026-01-16 17:06:52 +08:00

sm100_fp8_nvfp4_mega_moe.hpp

fix: packed FP4 for mxf4nvf4 — correct SMEM layout, UMMA descriptors, L1 epilogue

2026-05-11 21:59:21 +00:00

sm100_tf32_hc_prenorm_gemm.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

smxx_clean_logits.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

smxx_cublaslt.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

smxx_fp8_fp4_mqa_logits.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

smxx_fp8_fp4_paged_mqa_logits.hpp

Add various optimizations and Mega MoE benchmarks (#316)

2026-04-24 18:41:37 +08:00

smxx_fp8_mqa_logits.hpp

Multiple updates and refactorings (#231)

2025-11-21 17:49:47 +08:00

smxx_fp8_paged_mqa_logits.hpp

Multiple updates and refactorings (#280)

2026-01-16 17:06:52 +08:00

smxx_layout.hpp

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00