Andreas Karatzas
|
a8a47c17b6
|
[ROCm][CI] Fix flaky embedding chat test by using tolerance-based comparison (#35050)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-22 09:03:44 +00:00 |
|
Roger Wang
|
40f88d8318
|
[Bugfix] Fix Qwen3/Qwen3.5 Reasoning Parser (#34779)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-21 23:15:35 -08:00 |
|
Woosuk Kwon
|
2cbf9656ce
|
[Model Runner V2] Enable CUDA graph for Eagle3 (#35040)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-21 21:42:50 -08:00 |
|
Xiao Li
|
30132cd144
|
Fix apply_top_k_top_p_triton called by non-cuda logits Tensor (#35030)
Signed-off-by: Xiao Li <ilx@meta.com>
|
2026-02-21 21:11:54 -08:00 |
|
Cyrus Leung
|
cbd95a2dd1
|
[Benchmark] Use sns.relplot for plotting (#35027)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-21 20:26:48 -08:00 |
|
Athrael Soju
|
970861ac0c
|
[New Model] Add ColModernVBERT (#34558)
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Signed-off-by: athrael-soju <athrael-soju@users.noreply.github.com>
|
2026-02-22 12:23:41 +08:00 |
|
Wentao Ye
|
d24bdd7c4b
|
[CI] Bump mteb version to mteb[bm25s]>=2, <3 for pooling model unit tests (#34961)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-21 20:23:24 -08:00 |
|
Andreas Karatzas
|
d403c1da1c
|
[CI] Stabilizing ROCm amd-ci signal and minor name fix in upstream (#35008)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-22 04:01:10 +00:00 |
|
Woosuk Kwon
|
b71fbd06e2
|
[Model Runner V2] Support attention group (#35036)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-21 16:42:53 -08:00 |
|
Vadim Gimpelson
|
74d90b1ce4
|
[Model Bash][DSR1] Add selective dynamic shape marking for CustomOp (#34900)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-02-21 19:28:01 -05:00 |
|
Woosuk Kwon
|
a4047d4ea9
|
[Model Runner V2] Support Eagle3 (no CUDA graph) (#35029)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-21 12:55:24 -08:00 |
|
Cyrus Leung
|
965fe45935
|
[CI/Build] Fix gRPC version mismatch (#35013)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-21 12:14:41 -07:00 |
|
Roman
|
98b0205c3c
|
[Frontend] Add automatic language detection for Whisper transcription (#34342)
Signed-off-by: space_check <roman.vuskov@rwth-aachen.de>
Signed-off-by: Roman <45857014+spacecheck@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-02-21 04:49:41 -08:00 |
|
Huy Do
|
272b535ab3
|
[Bugfix] Gate 256-bit instructions to CUDA 12.9+ (#34791)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-21 04:48:14 -08:00 |
|
Cyrus Leung
|
f74f1572ca
|
[Benchmark] Improve benchmarks (#35012)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-21 10:31:58 +00:00 |
|
petrpechman
|
bebfe55b1c
|
[Doc] Fix example of eagle3 (#34960)
Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz>
Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz>
|
2026-02-21 09:57:53 +00:00 |
|
Nick Hill
|
820d7815eb
|
[Core] Minor structured-output related scheduler optimization (#34765)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-21 01:38:28 -08:00 |
|
Nicolò Lucchesi
|
ab6f3487a6
|
[PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-21 01:34:57 -08:00 |
|
BADAOUI Abdennacer
|
8dc8a99b56
|
[ROCm] Enable bitsandbytes quantization support on ROCm (#34688)
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
|
2026-02-21 00:34:55 -08:00 |
|
jennyyyyzhen
|
2aab2bb543
|
[ROCM] Optimize ROCM_AITER_FA spec decode eagle performance (#34541)
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu>
|
2026-02-20 20:32:05 -08:00 |
|
Andreas Karatzas
|
54254f7a61
|
[ROCm][CI] Fix spec decode logprobs flakiness and parametrize tree attention backends (#34599)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-20 20:25:23 -08:00 |
|
Andreas Karatzas
|
cf93c1a128
|
[ROCm][AITER] Fix aiter paged_attention_v1 decode for sliding window and head_size < 64 (#34570)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-20 20:25:07 -08:00 |
|
Andreas Karatzas
|
89358f0d35
|
[CI] Fix ColBERT HF comparison tests on AMD CI + refactor (#34567)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-20 20:12:05 -08:00 |
|
zhongdaor-nv
|
a0fe7ea2f0
|
[feat] Add per-block extra_keys to KV events (#33304)
Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-20 20:11:40 -08:00 |
|
Andreas Karatzas
|
991d6bff38
|
[CI][MCP][Harmony] Heavy refactoring Harmony & MCP response tests and stabilizing with deterministic test infrastructure (#33949)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-20 20:03:32 -08:00 |
|
Kata Coder
|
5719a4e4e6
|
[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed (#34574)
Signed-off-by: craftsangjae <craftsangjae@gmail.com>
|
2026-02-20 20:01:40 -08:00 |
|
pougetat
|
11be2c74dc
|
[Realtime] Add Qwen3-ASR realtime streaming support (#34613)
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-02-20 19:59:42 -08:00 |
|
Xin Yang
|
7a5adad480
|
[Kernel] Optimize sample_recovered_tokens_kernel (#34974)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-02-20 19:59:06 -08:00 |
|
Li
|
59c6233297
|
Support prompt_embeds for pooling requests in output processor (#34904)
Signed-off-by: Li Zhang <lzhanga@amazon.com>
Co-authored-by: Li Zhang <lzhanga@amazon.com>
|
2026-02-20 19:57:38 -08:00 |
|
Taneem Ibrahim
|
d38cd3dde5
|
[Misc] Fix mypy errors in vllm/profiler and remove from exclude list (#34959)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
|
2026-02-20 19:56:33 -08:00 |
|
Rohan Potdar
|
ded333fb9b
|
[ROCm][Bugfix]: Only save unpadded sizes for shared_experts in MoERunner to fix rmsnorm pad fusion (#34636)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-02-20 19:56:16 -08:00 |
|
Yanan Cao
|
9d7577b2bd
|
[Kernel] [Helion] [9/N] Canonicalize GPU variant names to base model names (#34928)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-20 19:55:51 -08:00 |
|
Vlad Tiberiu Mihailescu
|
e739c29ea4
|
[CI/Build] Add opentelemetry libs in default vllm build (requirements/common.txt) (#34466)
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
|
2026-02-20 19:54:55 -08:00 |
|
yugong333
|
a55caf6ae9
|
[LoRA] Support Quantized Adapters (#30286)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Signed-off-by: wz1qqx <ziqi.wang@novita.ai>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: wz1qqx <55830058+wz1qqx@users.noreply.github.com>
Co-authored-by: wz1qqx <ziqi.wang@novita.ai>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-20 19:54:35 -08:00 |
|
Lucas Wilkinson
|
0e22cd618b
|
Revert "[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers" (#34997)
|
2026-02-20 17:19:19 -08:00 |
|
Wei Zhao
|
ea5f903f80
|
Bump Flashinfer Version and Re-enable DeepSeek NVFP4 AR+Norm Fusion (#34899)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-20 13:37:31 -08:00 |
|
Ryan Rock
|
0632ed8778
|
[AMD][CI] Fix test_custom_allreduce for A100 testgroup (#34735)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2026-02-20 21:33:04 +00:00 |
|
Lucas Wilkinson
|
aaefc58ee0
|
[CI] Revert PRs 34818 and 33600 (#34979)
|
2026-02-20 13:25:50 -08:00 |
|
Wei Zhao
|
f24b2de3d3
|
[Test] Add FP8 KV Cache Testing for MLA Backends (#34473)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-02-20 18:51:58 +00:00 |
|
Michael Goin
|
fac1507f03
|
[CI] Remove failing prime-rl integration test (#34843)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-20 10:17:42 -08:00 |
|
Zhengxu Chen
|
f863994084
|
[compile] Fix torch.compile time discrepancy in logging. (#34912)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-20 08:47:14 -08:00 |
|
Zhengxu Chen
|
e4a5d8c653
|
[compile] Move torch_aot_compile directory under torch_compile_cache (#34831)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-02-20 08:46:45 -08:00 |
|
Yanan Cao
|
a6d0299c75
|
[Kernel] [Helion] [6/N] Add num_tokens dimension to silu_mul autotuning and dispatching (#34185)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2026-02-20 08:36:51 -08:00 |
|
Harry Mellor
|
6ce80f7071
|
Ensure that MkDocs v2 does not get installed (#34958)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-20 15:38:11 +00:00 |
|
Huamin Li
|
1fe462168c
|
[perf] Avoid dtype promotion sync in mamba_get_block_table_tensor (#34870)
Signed-off-by: Huamin Li <3ericli@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-20 06:21:56 -08:00 |
|
Flora Feng
|
ed31a020ee
|
[Refactor] Extract Harmony streaming SSE event builders into streaming_events.py (#34909)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-20 06:20:46 -08:00 |
|
Cyrus Leung
|
f9ac19204f
|
[V0 Deprecation] Remove unused MM placeholders in request output (#34944)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-20 06:19:23 -08:00 |
|
Vadim Gimpelson
|
59965affbd
|
[BUGFIX] Fix _dummy_run missing prepare_inputs_event synchronization (#34866)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-02-20 05:54:27 -08:00 |
|
Xin Yang
|
b1c4f0b265
|
[Kernel] Optimize grouped topk kernel (#34206)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-02-20 01:34:45 -08:00 |
|
Kevin McKay
|
8de7c636cc
|
[Bugfix][Hardware][AMD] Fix ROCM_AITER_FA speculative decoding support (#32877)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-02-19 22:25:46 -08:00 |
|