Commit Graph

14128 Commits

Author | SHA1 | Message | Date
Andreas Karatzas
5f68464f92 [ROCm][CI] Fix spec decode profile assertion and logprob test determinism (#35043)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-23 05:05:54 -08:00
Vincent Gimenes
aa08a30fc9 [CLEANING] Remove unused disable_by_batch_size from SpeculativeConfig (#35060)
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
2026-02-23 05:05:36 -08:00
Wentao Ye
7f40e9e516 [Refactor] Remove dead private func _fp8_perm and _extract_mask_for_item (#35068)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-02-23 05:05:20 -08:00
Harry Mellor
103e614b14 Fix pipeline parallel with embed scaling in the Transformers modelling backend (#35094)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-23 05:04:47 -08:00
Neil Schemenauer
54e2f83d0a [Feature] Lazy import for the "mistral" tokenizer module. (#34651)
Signed-off-by: Neil Schemenauer <nas@arctrix.com>
2026-02-23 00:43:01 -08:00
Gabe Goodhart
e631f8e78e fix: Apply embedding_multiplier to inputs_embeds (#34813)
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-23 00:42:46 -08:00
Martin Hickey
e97c46a92d [BugFix]: Fix local mypy issues (#34739)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-23 00:40:29 -08:00
Jee Jee Li
7291d1b288 [Bugfix] Fix kernel benchmark (#33752)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-02-22 21:18:08 -08:00
Cyrus Leung
987506bca6 [Refactor] Simplify dummy data generation (#35025)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-22 20:55:27 -08:00
Woosuk Kwon
c645e9a214 [Model Runner V2] Remove propose_draft method (#35070)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-22 18:27:12 -08:00
Nick Hill
944ffb5968 [Model Runner V2][Minor] Remove redundant do_spec_decode field (#35039)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-22 16:18:04 -08:00
qizixi
2bcf71b9c0 [Spec Decode] Reduce TP communication for speculative decoding draft token generation (#34049)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2026-02-22 14:59:16 -08:00
tacos8me
b7892a3bef [Model] Add NVFP4 quantization support for Step3.5-Flash (#34478)
Signed-off-by: tacos8me <ian@cloudhabit.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-22 12:30:46 -07:00
Benjamin Chislett
682566b18e [Bug] Refactor max_num_batched_tokens to account for drafting (#34898)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-02-22 11:18:46 -05:00
qizixi
b9c2a565cc [Spec Decode] Defer clearing KV connector metadata for EAGLE3 speculative decode + prefill / decode disagg setup (#34529)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2026-02-22 08:08:32 -08:00
Andreas Karatzas
dd8c3a7fb2 [ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation delays (#35052)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-22 10:07:18 +00:00
Andreas Karatzas
a8a47c17b6 [ROCm][CI] Fix flaky embedding chat test by using tolerance-based comparison (#35050)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-22 09:03:44 +00:00
Roger Wang
40f88d8318 [Bugfix] Fix Qwen3/Qwen3.5 Reasoning Parser (#34779)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-21 23:15:35 -08:00
Woosuk Kwon
2cbf9656ce [Model Runner V2] Enable CUDA graph for Eagle3 (#35040)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-21 21:42:50 -08:00
Xiao Li
30132cd144 Fix apply_top_k_top_p_triton called by non-cuda logits Tensor (#35030)
Signed-off-by: Xiao Li <ilx@meta.com>
2026-02-21 21:11:54 -08:00
Cyrus Leung
cbd95a2dd1 [Benchmark] Use sns.relplot for plotting (#35027)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-21 20:26:48 -08:00
Athrael Soju
970861ac0c [New Model] Add ColModernVBERT (#34558)
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Signed-off-by: athrael-soju <athrael-soju@users.noreply.github.com>
2026-02-22 12:23:41 +08:00
Wentao Ye
d24bdd7c4b [CI] Bump mteb version to mteb[bm25s]>=2, <3 for pooling model unit tests (#34961)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-02-21 20:23:24 -08:00
Andreas Karatzas
d403c1da1c [CI] Stabilizing ROCm amd-ci signal and minor name fix in upstream (#35008)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-22 04:01:10 +00:00
Woosuk Kwon
b71fbd06e2 [Model Runner V2] Support attention group (#35036)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-21 16:42:53 -08:00
Vadim Gimpelson
74d90b1ce4 [Model Bash][DSR1] Add selective dynamic shape marking for CustomOp (#34900)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-02-21 19:28:01 -05:00
Woosuk Kwon
a4047d4ea9 [Model Runner V2] Support Eagle3 (no CUDA graph) (#35029)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-21 12:55:24 -08:00
Cyrus Leung
965fe45935 [CI/Build] Fix gRPC version mismatch (#35013)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-21 12:14:41 -07:00
Roman
98b0205c3c [Frontend] Add automatic language detection for Whisper transcription (#34342)
Signed-off-by: space_check <roman.vuskov@rwth-aachen.de>
Signed-off-by: Roman <45857014+spacecheck@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-02-21 04:49:41 -08:00
Huy Do
272b535ab3 [Bugfix] Gate 256-bit instructions to CUDA 12.9+ (#34791)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-21 04:48:14 -08:00
Cyrus Leung
f74f1572ca [Benchmark] Improve benchmarks (#35012)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-21 10:31:58 +00:00
petrpechman
bebfe55b1c [Doc] Fix example of eagle3 (#34960)
Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz>
Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz>
2026-02-21 09:57:53 +00:00
Nick Hill
820d7815eb [Core] Minor structured-output related scheduler optimization (#34765)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-21 01:38:28 -08:00
Nicolò Lucchesi
ab6f3487a6 [PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-21 01:34:57 -08:00
BADAOUI Abdennacer
8dc8a99b56 [ROCm] Enable bitsandbytes quantization support on ROCm (#34688)
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
2026-02-21 00:34:55 -08:00
jennyyyyzhen
2aab2bb543 [ROCM] Optimize ROCM_AITER_FA spec decode eagle performance (#34541)
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu>
2026-02-20 20:32:05 -08:00
Andreas Karatzas
54254f7a61 [ROCm][CI] Fix spec decode logprobs flakiness and parametrize tree attention backends (#34599)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-20 20:25:23 -08:00
Andreas Karatzas
cf93c1a128 [ROCm][AITER] Fix aiter paged_attention_v1 decode for sliding window and head_size < 64 (#34570)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-20 20:25:07 -08:00
Andreas Karatzas
89358f0d35 [CI] Fix ColBERT HF comparison tests on AMD CI + refactor (#34567)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-20 20:12:05 -08:00
zhongdaor-nv
a0fe7ea2f0 [feat] Add per-block extra_keys to KV events (#33304)
Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-20 20:11:40 -08:00
Andreas Karatzas
991d6bff38 [CI][MCP][Harmony] Heavy refactoring Harmony & MCP response tests and stabilizing with deterministic test infrastructure (#33949)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-20 20:03:32 -08:00
Kata Coder
5719a4e4e6 [Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed (#34574)
Signed-off-by: craftsangjae <craftsangjae@gmail.com>
2026-02-20 20:01:40 -08:00
pougetat
11be2c74dc [Realtime] Add Qwen3-ASR realtime streaming support (#34613)
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-02-20 19:59:42 -08:00
Xin Yang
7a5adad480 [Kernel] Optimize sample_recovered_tokens_kernel (#34974)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-02-20 19:59:06 -08:00
Li
59c6233297 Support prompt_embeds for pooling requests in output processor (#34904)
Signed-off-by: Li Zhang <lzhanga@amazon.com>
Co-authored-by: Li Zhang <lzhanga@amazon.com>
2026-02-20 19:57:38 -08:00
Taneem Ibrahim
d38cd3dde5 [Misc] Fix mypy errors in vllm/profiler and remove from exclude list (#34959)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
2026-02-20 19:56:33 -08:00
Rohan Potdar
ded333fb9b [ROCm][Bugfix]: Only save unpadded sizes for shared_experts in MoERunner to fix rmsnorm pad fusion (#34636)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2026-02-20 19:56:16 -08:00
Yanan Cao
9d7577b2bd [Kernel] [Helion] [9/N] Canonicalize GPU variant names to base model names (#34928)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 19:55:51 -08:00
Vlad Tiberiu Mihailescu
e739c29ea4 [CI/Build] Add opentelemetry libs in default vllm build (requirements/common.txt) (#34466)
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
2026-02-20 19:54:55 -08:00
yugong333
a55caf6ae9 [LoRA] Support Quantized Adapters (#30286)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Signed-off-by: wz1qqx <ziqi.wang@novita.ai>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: wz1qqx <55830058+wz1qqx@users.noreply.github.com>
Co-authored-by: wz1qqx <ziqi.wang@novita.ai>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-20 19:54:35 -08:00