vllm/vllm/model_executor at 000214c4bb3f4fb61989eea19c625aedd0559ace


Roberto L. Castro afdce12c89 [Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention (#33680)

Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-10 10:29:52 -05:00

layers

[Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention (#33680)

2026-02-10 10:29:52 -05:00

model_loader

[Bugfix] Sort hf_weights_files in fastsafetensors_weights_iterator to match #33491 (#34190)

2026-02-09 23:06:30 -08:00

models

[ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations (#29008)

2026-02-10 10:08:05 -05:00

warmup

[Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune (#34006)

2026-02-07 05:24:44 -08:00

__init__.py

[Platform] Deprecate seed_everything (#31659)

2026-01-04 18:34:04 -08:00

custom_op.py

[torch.compile] Compile CustomOp.forward_native for SiluAndMul and QuantFP8 to avoid raw torch ops inside opaque custom ops (#32806)

2026-01-22 19:52:26 -08:00

parameter.py

[QeRL] Layerwise Reloading (#32133)

2026-01-30 08:50:05 -07:00

utils.py

[BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend (#33262)

2026-01-29 16:52:11 +08:00