Files: vllm/benchmarks/kernels (at commit e30cedd44be332e1ddc7ec43b8a33bce532e7614)
Latest commit d4f123cc48: [Kernel] FlashInfer: switch allreduce fusion to unified API (#33985)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
2026-02-09 15:43:24 +00:00
cpu/                                    …
deepgemm/                               …
bench_block_fp8_gemm.py                 …
bench_fp8_gemm.py                       …
bench_int8_gemm.py                      …
bench_mxfp4_qutlass.py                  …
bench_nvfp4_gemm.py                     …
bench_nvfp4_quant.py                    …
bench_nvfp4_qutlass.py                  …
bench_per_token_quant_fp8.py            …
benchmark_2d_silu_mul_fp8_quant.py      …
benchmark_activation.py                 …
benchmark_cutlass_moe_fp8.py            …
benchmark_cutlass_moe_nvfp4.py          …
benchmark_device_communicators.py       …
benchmark_fused_collective.py           [Kernel] FlashInfer: switch allreduce fusion to unified API (#33985)    2026-02-09 15:43:24 +00:00
benchmark_fused_topk.py                 …
benchmark_grouped_gemm_cutlass.py       …
benchmark_layernorm.py                  …
benchmark_lora.py                       …
benchmark_machete.py                    …
benchmark_marlin.py                     …
benchmark_mla_k_concat.py               …
benchmark_moe_align_block_size.py       …
benchmark_moe_permute_unpermute.py      [Refactor] Remove align block size logic in moe_permute (#33449)        2026-02-06 10:57:06 -08:00
benchmark_moe.py                        [Model] GLM adaptation (#34124)                                         2026-02-09 17:32:52 +08:00
benchmark_mrope.py                      …
benchmark_paged_attention.py            …
benchmark_per_token_group_quant.py      …
benchmark_quant.py                      …
benchmark_reshape_and_cache_flash.py    …
benchmark_reshape_and_cache.py          …
benchmark_rmsnorm.py                    …
benchmark_rope.py                       …
benchmark_shapes.py                     …
benchmark_silu_mul_fp8_quant.py         …
benchmark_trtllm_decode_attention.py    …
benchmark_trtllm_prefill_attention.py   …
benchmark_w8a8_block_fp8.py             …
graph_machete_bench.py                  …
requirements.txt                        …
utils.py                                …
weight_shapes.py                        …