Nicolò Lucchesi | 4dc11b06d3 | 2026-01-23 02:53:12 -08:00
[Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789)
Signed-off-by: NickLucche <nlucches@redhat.com>
(cherry picked from commit ea6102b85d)
|
Isotr0py | 2bd95d803a | 2026-01-23 02:52:47 -08:00
[Misc] Bump opencv-python dependency version to 4.13 (#32668)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
(cherry picked from commit 444e2e7e1f)
|
Isotr0py | f46d576c54 | 2026-01-23 02:51:53 -08:00
[Misc] Replace urllib's urlparse with urllib3's parse_url (#32746)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
(cherry picked from commit 8ebf271bb6)
|
Shengqi Chen | d68209402d | 2026-01-17 18:38:46 -08:00
[build] fix cu130 related release pipeline steps and publish as nightly image (#32522)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
(cherry picked from commit 965765aef9)
|
Shengqi Chen | b17039bccc | 2026-01-16 21:04:48 -08:00
[CI] Implement uploading to PyPI and GitHub in the release pipeline, enable release image building for CUDA 13.0 (#31032)
(cherry picked from commit 8e61425ee6)
Tag: v0.14.0
|
Cyrus Leung | 48b67ba75f | 2026-01-16 11:35:10 +00:00
[Frontend] Standardize use of create_error_response (#32319)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
TJian | 09f4264a55 | 2026-01-16 10:50:00 +08:00
[Bugfix] Fix ROCm dockerfiles (#32447)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
Matthew Bonanni | 7f42dc20bb | 2026-01-15 18:00:21 -08:00
[CI] Fix LM Eval Large Models (H100) (#32423)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
(cherry picked from commit bcf2333cd6)
Tag: v0.14.0rc2
|
TJian | c2a37a3cf8 | 2026-01-15 17:59:58 -08:00
Cherry pick [ROCm] [CI] [Release] Rocm wheel pipeline with sccache #32264
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
Michael Goin | 0e31fc7996 | 2026-01-15 17:55:20 -08:00
[UX] Use kv_offloading_backend=native by default (#32421)
Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit 1be5a73571)
|
Pleaplusone | 6ac0fcf416 | 2026-01-15 17:55:06 -08:00
[ROCm][Bugfix] Disable hip sampler to fix deepseek's accuracy issue on ROCm (#32413)
Signed-off-by: ganyi <ygan@amd.com>
(cherry picked from commit 77c16df31d)
|
Douglas Lehr | b62249725c | 2026-01-15 17:54:47 -08:00
[ROCM] Add ROCm image build to release pipeline (#31995)
Signed-off-by: Doug Lehr <douglehr@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
(cherry picked from commit c5891b5430)
|
vllmellm | 1b57275207 | 2026-01-15 17:54:01 -08:00
[Bugfix][ROCm][performance] Resolve the performance regression issue of the Qwen3-Next-80B-A3B-Thinking under rocm_atten (#32336)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
(cherry picked from commit e27078ea80)
|
Martin Hickey | 2c24bc6996 | 2026-01-13 10:56:23 -08:00
[BugFix] [KVConnector] Fix KV events for LMCache connector (#32169)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
Cyrus Leung | 0aa8c40552 | 2026-01-13 10:56:23 -08:00
[Bugfix] Replace PoolingParams.normalize with use_activation (#32243)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
Andreas Karatzas | 11b6af5280 | 2026-01-13 05:46:53 +00:00
[ROCm][Bugfix] Fix Mamba batched decode producing incorrect output (#32099)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Tag: v0.14.0rc1
|
Wentao Ye | 2a719e0865 | 2026-01-13 04:11:37 +00:00
[Perf] Optimize requests abort (#32211)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
Andrew Bennett | f243abc92d | 2026-01-13 03:41:47 +00:00
Fix various typos found in docs (#32212)
Signed-off-by: Andrew Bennett <potatosaladx@meta.com>
|
Sanghoon Yoon | 60b77e1463 | 2026-01-13 03:21:49 +00:00
[Frontend] Add reasoning_effort to OpenAIServing._preprocess_chat() (#31956)
Signed-off-by: Sanghoon Yoon <seanyoon@kakao.com>
|
cjackal | 15b33ff064 | 2026-01-13 03:11:23 +00:00
[Misc] improve warning/assert messages (#32226)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
|
Nick Hill | c6bb5b5603 | 2026-01-13 10:33:14 +08:00
[BugFix] Fix engine crash caused by chat tools + response_format (#32127)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
Nick Hill | 9273a427b5 | 2026-01-13 02:03:08 +00:00
[Misc] Allow enabling NCCL for DP sync when async scheduling (#32197)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
Cyrus Leung | 78d13ea9de | 2026-01-13 09:30:12 +08:00
[Model] Handle trust_remote_code for transformers backend (#32194)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
Andrew Xia | a307ac0734 | 2026-01-12 16:14:54 -08:00
[responsesAPI] add unit test for optional function tool call id (#32036)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
Divakar Verma | a28d9f4470 | 2026-01-12 17:35:49 -05:00
[ROCm][CI] Handle pytest status code 5 when a shard isn't allocated any tests (#32040)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
xuebwang-amd | 629584bfc9 | 2026-01-12 17:17:30 -05:00
[Kernel][MoE] fix computation order of MoE weight multiplication and improve flow (#31962)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
|
Woosuk Kwon | 0a7dd23754 | 2026-01-12 13:37:43 -08:00
[Model Runner V2] Add support for M-RoPE (#32143)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
Woosuk Kwon | dec28688c5 | 2026-01-12 13:08:30 -08:00
[Model Runner V2] Minor refactor for logit_bias (#32209)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
Vadim Gimpelson | 9f430c94bd | 2026-01-12 20:42:06 +00:00
[BUGFIX] Add missing remapping of fp8 kv-scale names (#32199)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
Nicolò Lucchesi | f8bd8394e3 | 2026-01-12 20:38:49 +00:00
[NIXL][Bugfix] Failure logging overhaul + early metadata free on failure (#32031)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
Woosuk Kwon | ca81811bfe | 2026-01-12 11:31:10 -08:00
[Model Runner V2] Support logit_bias, allowed_token_ids, min_tokens (#32163)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
Lucas Kabela | ad8818bb5e | 2026-01-12 19:24:38 +00:00
[Misc][BE] Type coverage for vllm/compilation [3/3] (#31748)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
Nicolò Lucchesi | 08e8e99ce7 | 2026-01-12 18:59:31 +00:00
[Misc] Change log level for batch queue log (#32192)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
Or Ozeri | 2be765b68a | 2026-01-12 18:39:38 +00:00
[BugFix] scheduler: Fix order preservation of skipped requests (#32173)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
Roger Wang | 16abe6b85a | 2026-01-12 10:28:16 -08:00
[Misc] Set default torch num threads for input processing (#31879)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
Ilya Markov | 1eb61ab34b | 2026-01-12 18:13:23 +00:00
[Refactor] EPLB rebalance algo to NumPy (#30697)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
Kyungmin Lee | 3d962d72ab | 2026-01-12 10:00:45 -08:00
[BugFix] fix FusedMoE.make_expert_params_mapping in EXAONE-MoE (#32196)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
|
Matthew Bonanni | 20228cb851 | 2026-01-12 09:13:56 -08:00
[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
Cyrus Leung | 7c0d3c5152 | 2026-01-13 01:12:22 +08:00
[Benchmark] Share data between SLA runs (#32184)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
Nicolò Lucchesi | 5b68107411 | 2026-01-12 18:10:05 +01:00
[Misc][PD] Fix get_attn_backend usage in transfer connectors (#31988)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
Asaf Joseph Gardin | 8fb2c135be | 2026-01-12 17:02:38 +00:00
[Bugfix] Fix stale SSM state for new Mamba requests scheduled as decode (#32118)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
Cyrus Leung | 8863c2b25c | 2026-01-12 17:01:49 +00:00
[Model] Standardize pooling heads (#32148)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
danielafrimi | 3f72639d36 | 2026-01-12 11:55:49 -05:00
[FIX] Add NO_MUL activation support for modular kernel path (#31528)
Signed-off-by: dafrimi <dafrimi@nvidia.com>
Signed-off-by: <>
Co-authored-by: root <root@gpu-267.slurm-workers-slurm.slurm.svc.cluster.local>
Co-authored-by: root <root@gpu-537.slurm-workers-slurm.slurm.svc.cluster.local>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: root <root@pool0-01777.cm.cluster>
|
Jaehyun An | 6bc9c8473e | 2026-01-12 16:39:02 +00:00
[MODEL] New model support for kakaocorp/kanana-1.5-v-3b-instruct (#29384)
Signed-off-by: Jaehyun An <steve.ai@kakaocorp.com>
|
Kyungmin Lee | 63ed2409e8 | 2026-01-12 16:30:50 +00:00
Add K-EXAONE-236B-A23B (#31621)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: lgai-exaone <exaonemodels@lgresearch.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
Andy Zhang | 95e53d907c | 2026-01-12 08:15:28 -08:00
doc: Update model references in supported_models.md (#32188)
|
TJian | 0346396e94 | 2026-01-12 15:33:21 +00:00
[ROCm] [Bugfix] Fix order of mori build in Dockerfile.rocm_base (#32179)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
Andy Zhang | e68b0dad8b | 2026-01-12 07:10:50 -08:00
doc: Update model name for Qwen3-Coder in documentation (#32185)
Signed-off-by: Andy Zhang <xiazhang@microsoft.com>
|
Or Ozeri | 9cddbdba6d | 2026-01-12 15:00:43 +00:00
OffloadingConnector: Add cpu_bytes_to_use configuration (#24498)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
Hongxin Xu | 49e6b86c91 | 2026-01-12 06:23:04 -08:00
[Feature] Support recording expert indices for rollout router replay (#28284)
Signed-off-by: xhx1022 <1737006628@qq.com>
Signed-off-by: Hongxin Xu <70438206+xhx1022@users.noreply.github.com>
Signed-off-by: arlenxu <arlenxu@tencent.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: arlenxu <arlenxu@tencent.com>
|