biondizzle/vllm
Files: vllm/vllm/model_executor at commit 48ecb4438b2845f757edf228c5455ca6095938af
Latest commit: 48ecb4438b [Perf] Use FlashInfer RoPE for RotaryEmbedding.forward_cuda when available (#21126)
(a sketch of the dispatch pattern this commit describes follows the listing below)
Author: Michael Goin
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Date: 2025-09-19 14:06:49 -06:00
Name                   Last commit                                                                             Last updated
layers                 [Perf] Use FlashInfer RoPE for RotaryEmbedding.forward_cuda when available (#21126)  2025-09-19 14:06:49 -06:00
model_loader           Move ModelConfig from config/__init__.py to config/model.py (#25252)  2025-09-19 16:22:33 +00:00
models                 [Docs] add __init__.py to vllm/model_executor/layers/quantization/compressed_tensors/transform (#24974)  2025-09-19 18:32:27 +00:00
warmup                 [BugFix] Fix DeepGEMM warmup, no m.weight_scale_inv (#25206)  2025-09-18 14:26:28 -07:00
__init__.py            [Misc] Add SPDX-FileCopyrightText (#19100)  2025-06-03 11:20:17 -07:00
custom_op.py           [V0 deprecation] Deprecate V0 Neuron backend (#21159)  2025-09-06 16:15:18 -07:00
parameter.py           [OOT] Support sync_model_loading for OOT (#25126)  2025-09-19 05:41:53 +00:00
sampling_metadata.py   [Doc]: fix typos in Python comments (#24042)  2025-09-01 19:07:45 -07:00
utils.py               [OOT] Support sync_model_loading for OOT (#25126)  2025-09-19 05:41:53 +00:00
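The latest commit's message describes an availability-based dispatch: RotaryEmbedding.forward_cuda should use a fused FlashInfer RoPE kernel when the library is installed, and otherwise fall back to the native path. The sketch below only illustrates that pattern; the class RotaryEmbeddingSketch, the NeoX-style cos/sin math, and the import probe are illustrative assumptions, not vLLM's actual implementation, which lives under the layers directory listed above.

```python
import torch


def _rotate_half(x: torch.Tensor) -> torch.Tensor:
    """Swap the two halves of the last dimension and negate the second half."""
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


class RotaryEmbeddingSketch(torch.nn.Module):
    """Toy rotary embedding that prefers a fused kernel when one is installed.

    Hypothetical sketch only; not vLLM's RotaryEmbedding.
    """

    def __init__(self, head_dim: int, max_positions: int = 8192,
                 base: float = 10000.0) -> None:
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        freqs = torch.outer(torch.arange(max_positions).float(), inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)  # [max_positions, head_dim]
        # Cache cos/sin tables once; both code paths read from this cache.
        self.register_buffer("cos_cached", emb.cos(), persistent=False)
        self.register_buffer("sin_cached", emb.sin(), persistent=False)

    def forward_native(self, positions: torch.Tensor, q: torch.Tensor,
                       k: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # Plain-PyTorch reference path: apply the cos/sin rotation per token.
        # Expected shapes: positions [num_tokens], q/k [num_tokens, num_heads, head_dim].
        cos = self.cos_cached[positions].unsqueeze(-2)  # [num_tokens, 1, head_dim]
        sin = self.sin_cached[positions].unsqueeze(-2)
        q_out = q * cos + _rotate_half(q) * sin
        k_out = k * cos + _rotate_half(k) * sin
        return q_out, k_out

    def forward_cuda(self, positions: torch.Tensor, q: torch.Tensor,
                     k: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # Dispatch: use a fused FlashInfer kernel only when the package is
        # importable; otherwise fall back to the native implementation.
        try:
            import flashinfer  # noqa: F401  (availability probe only)
        except ImportError:
            return self.forward_native(positions, q, k)
        # The actual fused call and its exact signature are in vLLM's rotary
        # embedding layer; omitted here to avoid guessing the FlashInfer API.
        return self.forward_native(positions, q, k)
```

Keeping the fused call behind an availability check lets the same module run on installs without FlashInfer, at the cost of taking the slower reference path there.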