Andreas Karatzas
5f68464f92
[ROCm][CI] Fix spec decode profile assertion and logprob test determinism (#35043)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-23 05:05:54 -08:00

Vincent Gimenes
aa08a30fc9
[CLEANING] Remove unused disable_by_batch_size from SpeculativeConfig (#35060)
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
2026-02-23 05:05:36 -08:00

Wentao Ye
7f40e9e516
[Refactor] Remove dead private func _fp8_perm and _extract_mask_for_item (#35068)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-02-23 05:05:20 -08:00

Harry Mellor
103e614b14
Fix pipeline parallel with embed scaling in the Transformers modelling backend (#35094)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-23 05:04:47 -08:00

Neil Schemenauer
54e2f83d0a
[Feature] Lazy import for the "mistral" tokenizer module. (#34651)
Signed-off-by: Neil Schemenauer <nas@arctrix.com>
2026-02-23 00:43:01 -08:00

Gabe Goodhart
e631f8e78e
fix: Apply embedding_multiplier to inputs_embeds (#34813)
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-23 00:42:46 -08:00

Martin Hickey
e97c46a92d
[BugFix]: Fix local mypy issues (#34739)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-23 00:40:29 -08:00

Jee Jee Li
7291d1b288
[Bugfix] Fix kernel benchmark (#33752)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-02-22 21:18:08 -08:00

Cyrus Leung
987506bca6
[Refactor] Simplify dummy data generation (#35025)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-22 20:55:27 -08:00

Woosuk Kwon
c645e9a214
[Model Runner V2] Remove propose_draft method (#35070)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-22 18:27:12 -08:00

Nick Hill
944ffb5968
[Model Runner V2][Minor] Remove redundant do_spec_decode field (#35039)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-22 16:18:04 -08:00

qizixi
2bcf71b9c0
[Spec Decode] Reduce TP communication for speculative decoding draft token generation (#34049)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2026-02-22 14:59:16 -08:00

tacos8me
b7892a3bef
[Model] Add NVFP4 quantization support for Step3.5-Flash (#34478)
Signed-off-by: tacos8me <ian@cloudhabit.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-22 12:30:46 -07:00

Benjamin Chislett
682566b18e
[Bug] Refactor max_num_batched_tokens to account for drafting (#34898)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-02-22 11:18:46 -05:00

qizixi
b9c2a565cc
[Spec Decode] Defer clearing KV connector metadata for EAGLE3 speculative decode + prefill / decode disagg setup (#34529)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2026-02-22 08:08:32 -08:00

Andreas Karatzas
dd8c3a7fb2
[ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation delays (#35052)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-22 10:07:18 +00:00

Andreas Karatzas
a8a47c17b6
[ROCm][CI] Fix flaky embedding chat test by using tolerance-based comparison (#35050)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-22 09:03:44 +00:00

Roger Wang
40f88d8318
[Bugfix] Fix Qwen3/Qwen3.5 Reasoning Parser (#34779)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-21 23:15:35 -08:00

Woosuk Kwon
2cbf9656ce
[Model Runner V2] Enable CUDA graph for Eagle3 (#35040)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-21 21:42:50 -08:00

Xiao Li
30132cd144
Fix apply_top_k_top_p_triton called by non-cuda logits Tensor (#35030)
Signed-off-by: Xiao Li <ilx@meta.com>
2026-02-21 21:11:54 -08:00

Cyrus Leung
cbd95a2dd1
[Benchmark] Use sns.relplot for plotting (#35027)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-21 20:26:48 -08:00

Athrael Soju
970861ac0c
[New Model] Add ColModernVBERT (#34558)
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Signed-off-by: athrael-soju <athrael-soju@users.noreply.github.com>
2026-02-22 12:23:41 +08:00

Wentao Ye
d24bdd7c4b
[CI] Bump mteb version to mteb[bm25s]>=2, <3 for pooling model unit tests (#34961)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-02-21 20:23:24 -08:00

Andreas Karatzas
d403c1da1c
[CI] Stabilizing ROCm amd-ci signal and minor name fix in upstream (#35008)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-22 04:01:10 +00:00

Woosuk Kwon
b71fbd06e2
[Model Runner V2] Support attention group (#35036)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-21 16:42:53 -08:00

Vadim Gimpelson
74d90b1ce4
[Model Bash][DSR1] Add selective dynamic shape marking for CustomOp (#34900)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-02-21 19:28:01 -05:00

Woosuk Kwon
a4047d4ea9
[Model Runner V2] Support Eagle3 (no CUDA graph) (#35029)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-21 12:55:24 -08:00

Cyrus Leung
965fe45935
[CI/Build] Fix gRPC version mismatch (#35013)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-21 12:14:41 -07:00

Roman
98b0205c3c
[Frontend] Add automatic language detection for Whisper transcription (#34342)
Signed-off-by: space_check <roman.vuskov@rwth-aachen.de>
Signed-off-by: Roman <45857014+spacecheck@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-02-21 04:49:41 -08:00

Huy Do
272b535ab3
[Bugfix] Gate 256-bit instructions to CUDA 12.9+ (#34791)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-21 04:48:14 -08:00

Cyrus Leung
f74f1572ca
[Benchmark] Improve benchmarks (#35012)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-21 10:31:58 +00:00

petrpechman
bebfe55b1c
[Doc] Fix example of eagle3 (#34960)
Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz>
Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz>
2026-02-21 09:57:53 +00:00

Nick Hill
820d7815eb
[Core] Minor structured-output related scheduler optimization (#34765)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-21 01:38:28 -08:00

Nicolò Lucchesi
ab6f3487a6
[PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-21 01:34:57 -08:00

BADAOUI Abdennacer
8dc8a99b56
[ROCm] Enable bitsandbytes quantization support on ROCm (#34688)
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
2026-02-21 00:34:55 -08:00

jennyyyyzhen
2aab2bb543
[ROCM] Optimize ROCM_AITER_FA spec decode eagle performance (#34541)
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu>
2026-02-20 20:32:05 -08:00

Andreas Karatzas
54254f7a61
[ROCm][CI] Fix spec decode logprobs flakiness and parametrize tree attention backends (#34599)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-20 20:25:23 -08:00

Andreas Karatzas
cf93c1a128
[ROCm][AITER] Fix aiter paged_attention_v1 decode for sliding window and head_size < 64 (#34570)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-20 20:25:07 -08:00

Andreas Karatzas
89358f0d35
[CI] Fix ColBERT HF comparison tests on AMD CI + refactor (#34567)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-20 20:12:05 -08:00

zhongdaor-nv
a0fe7ea2f0
[feat] Add per-block extra_keys to KV events (#33304)
Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-20 20:11:40 -08:00

Andreas Karatzas
991d6bff38
[CI][MCP][Harmony] Heavy refactoring Harmony & MCP response tests and stabilizing with deterministic test infrastructure (#33949)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-20 20:03:32 -08:00

Kata Coder
5719a4e4e6
[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed (#34574)
Signed-off-by: craftsangjae <craftsangjae@gmail.com>
2026-02-20 20:01:40 -08:00

pougetat
11be2c74dc
[Realtime] Add Qwen3-ASR realtime streaming support (#34613)
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-02-20 19:59:42 -08:00

Xin Yang
7a5adad480
[Kernel] Optimize sample_recovered_tokens_kernel (#34974)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-02-20 19:59:06 -08:00

Li
59c6233297
Support prompt_embeds for pooling requests in output processor (#34904)
Signed-off-by: Li Zhang <lzhanga@amazon.com>
Co-authored-by: Li Zhang <lzhanga@amazon.com>
2026-02-20 19:57:38 -08:00

Taneem Ibrahim
d38cd3dde5
[Misc] Fix mypy errors in vllm/profiler and remove from exclude list (#34959)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
2026-02-20 19:56:33 -08:00

Rohan Potdar
ded333fb9b
[ROCm][Bugfix]: Only save unpadded sizes for shared_experts in MoERunner to fix rmsnorm pad fusion (#34636)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2026-02-20 19:56:16 -08:00

Yanan Cao
9d7577b2bd
[Kernel] [Helion] [9/N] Canonicalize GPU variant names to base model names (#34928)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 19:55:51 -08:00

Vlad Tiberiu Mihailescu
e739c29ea4
[CI/Build] Add opentelemetry libs in default vllm build (requirements/common.txt) (#34466)
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
2026-02-20 19:54:55 -08:00

yugong333
a55caf6ae9
[LoRA] Support Quantized Adapters (#30286)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Signed-off-by: wz1qqx <ziqi.wang@novita.ai>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: wz1qqx <55830058+wz1qqx@users.noreply.github.com>
Co-authored-by: wz1qqx <ziqi.wang@novita.ai>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-20 19:54:35 -08:00