vllm/vllm/model_executor at 236de72e49d94451e1b7821736a11a80f7efda5d - vllm


Dimitrios Bariamis 367cf5cd3e [Feat][Bugfix] Enable additional dimension for Flashinfer MLA and fix routing dtype (#36931)

Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>

2026-03-13 16:41:16 -07:00

kernels

[Misc] Use envs module to get VLLM_DISABLED_KERNELS (#35776)

2026-03-11 13:37:46 +00:00

layers

[Bugfix] Fix MLA attention crash with AWQ/GPTQ quantized models (#34695)

2026-03-13 23:25:41 +00:00

model_loader

[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)

2026-03-12 07:57:47 -07:00

models

[Feat][Bugfix] Enable additional dimension for Flashinfer MLA and fix routing dtype (#36931)

2026-03-13 16:41:16 -07:00

offloader

[UX] Remove NoOpOffloader log (#35678)

2026-03-04 12:13:40 -08:00

warmup

[Feature]: Remove Chunking From FusedMoE (#34086)

2026-03-12 14:24:38 -04:00

__init__.py

[Platform] Deprecate seed_everything (#31659)

2026-01-04 18:34:04 -08:00

custom_op.py

[MM][OOT] Support CPU seq_lens for OOT MMEncoderAttention kernels (#36605)

2026-03-12 03:28:23 -07:00

parameter.py

[QeRL] Layerwise Reloading (#32133)

2026-01-30 08:50:05 -07:00

utils.py

[BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend (#33262)

2026-01-29 16:52:11 +08:00