vllm/vllm/model_executor at 8d9babd4dea934fdd47b5a20a73ef0e04ff0e22e


Wenlong Wang 847a57cd12 [Bugfix][MoE Kernel] Fix incorrect routing selection for models without expert groups (e.g., MiniMax-M2.1) (#34673)

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

2026-02-18 13:03:24 -08:00

layers

[Bugfix][MoE Kernel] Fix incorrect routing selection for models without expert groups (e.g., MiniMax-M2.1) (#34673)

2026-02-18 13:03:24 -08:00

model_loader

[Quantization] - Added uses_meta_device_weights to quant config (#34645)

2026-02-17 23:43:44 -08:00

models

[Bugfix] Redo Qwen3.5/Qwen3-Next GDN projector fusion (#34697)

2026-02-18 09:46:53 -08:00

warmup

[Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune (#34006)

2026-02-07 05:24:44 -08:00

__init__.py

[Platform] Deprecate seed_everything (#31659)

2026-01-04 18:34:04 -08:00

custom_op.py

[torch.compile] Compile CustomOp.forward_native for SiluAndMul and QuantFP8 to avoid raw torch ops inside opaque custom ops (#32806)

2026-01-22 19:52:26 -08:00

parameter.py

[QeRL] Layerwise Reloading (#32133)

2026-01-30 08:50:05 -07:00

utils.py

[BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend (#33262)

2026-01-29 16:52:11 +08:00