vllm/csrc/moe at b197a5ccfdaa46b6750feb5efa4c5d8bf4030d44 - vllm

Files

Jinzhen Lin 750f4cabfa [Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

2025-01-20 16:42:16 -08:00

marlin_kernels

[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973)

2024-10-04 12:34:44 -06:00

marlin_moe_ops.cu

[Bugfix] Fix support for dimension like integers and ScalarType (#9299)

2024-10-17 19:08:34 +00:00

moe_align_sum_kernels.cu

[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222)

2025-01-20 16:42:16 -08:00

moe_ops.h

[Performance][Kernel] Fused_moe Performance Improvement (#9384)

2024-10-24 15:37:52 -07:00

topk_softmax_kernels.cu

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

torch_bindings.cpp

[Performance][Kernel] Fused_moe Performance Improvement (#9384)

2024-10-24 15:37:52 -07:00