Commit Graph

12968 Commits

Author SHA1 Message Date
Woosuk Kwon
90c0836902 [Model Runner V2] Refactor Sampler (#32245)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-13 17:58:12 -08:00
Roberto L. Castro
8ef50d9a6b [Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885)
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
2026-01-13 15:22:53 -08:00
emricksini-h
2a60ac91d0 [Improvement] Persist CUDA compat libraries paths to prevent reset on apt-get (#30784)
Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai>
2026-01-13 14:35:05 -08:00
Michael Goin
9e65bb4ef4 Add mergify label job for "bug" in PR titles (#31980)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-13 14:28:19 -08:00
Simon Mo
0db574b185 [Build] Add scripts for cherry-picking and trigger build (#32282)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-01-13 13:21:05 -08:00
HappyAmazonian
2f4a71daf2 [Misc] Add In-Container restart capability through supervisord for sagemaker entrypoint (#28502)
Signed-off-by: Shen Teng <sheteng@amazon.com>
Signed-off-by: HappyAmazonian <91216626+HappyAmazonian@users.noreply.github.com>
2026-01-13 13:06:10 -08:00
Rabi Mishra
69f8a0ea37 fix(rocm): Use refresh_env_variables() for rocm_aiter_ops in test_moe (#31711)
Signed-off-by: rabi <ramishra@redhat.com>
2026-01-13 19:11:54 +00:00
Wentao Ye
f28125d87b [Perf] Optimize grouped topk kernel, 1.2%~2% E2E Throughput improvement (#32058)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-13 10:58:18 -08:00
Dmitry Tokarev
46f8c6b725 Fix CUDA 13 wheel installation doc (#32276)
Signed-off-by: Dmitry Tokarev <dtokarev@nvidia.com>
2026-01-13 10:48:37 -08:00
Andrew Xia
af54d2e2d0 [responseAPI] support partial message generation (#32100)
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <mitandrewxia@gmail.com>
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2026-01-13 10:41:26 -08:00
Sage Moore
6beef12b9b [EPLB][Cleanup] Remove is_async_enabled from EplbModelState (#32050)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2026-01-13 18:19:03 +00:00
Mark McLoughlin
ab74b2a27a [Trivial] Remove duplicate enable_mfu_metrics (#32246)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2026-01-14 01:09:23 +08:00
Matthew Bonanni
2263d44b68 [4/N][Attention] Move MLA common to model_executor (#32060)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-13 09:08:45 -08:00
Mathis Felardos
4f3676e726 nixl_connector: export UCX_MEM_MMAP_HOOK_MODE=none to avoid a UCX memory leak (#32181)
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
2026-01-13 16:21:10 +00:00
Martin Hickey
510265472c [BugFix] [KVConnector] Fix KV events for LMCache connector (#32169)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-13 15:50:34 +00:00
Chauncey
4f02cb2eac [Refactor] [7/N] to simplify the vLLM lora serving architecture (#32251)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-13 15:37:34 +00:00
Cyrus Leung
252c011012 [Refactor] Remove MultiModalProfiler (#32254)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 15:10:20 +00:00
Matthew Bonanni
98f60e5acb [6/N][Attention] Move utils to more appropriate locations (#32215)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-13 05:38:52 -08:00
Chauncey
fefce49807 [Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture (#32240)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-13 13:01:39 +00:00
Mickaël Seznec
a5bbbd2f24 [Quantization] fix: overflow with static per-tensor scaling (#29867)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-13 12:56:01 +00:00
Nicolò Lucchesi
8c8653b672 [Docs] Nixl Usage recommend fail kv_load_failure_policy (#32198)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-13 12:51:57 +00:00
Cyrus Leung
232214b2ae [Bugfix] Replace PoolingParams.normalize with use_activation (#32243)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 10:45:42 +00:00
Cyrus Leung
eb28e8068d [Refactor] Remove get_encoder_dummy_data (#32241)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 09:21:23 +00:00
YunzhuLu
542a4059b2 [Model] Use mm_position to compute mrope positions for Qwen2-VL/2.5-VL (#32126)
Signed-off-by: YunzhuLu <lucia.yunzhu@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-13 09:04:29 +00:00
Andreas Karatzas
df7e12715f [ROCm][CI] Fix engine core client tests for ROCm spawn multiprocessing (#32061)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-13 15:14:30 +08:00
Roy Wang
44c34f22d9 [Doc] Update installation from source command (#32239)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
2026-01-12 23:10:27 -08:00
Xingyu Liu
80221e1884 [BugFix]Fix eagle draft_model_config and add tests (#31753)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
2026-01-12 23:09:36 -08:00
Andreas Karatzas
5e714f7ff4 [ROCm][CI] Fix HuggingFace flash_attention_2 accuracy issue in Isaac vision encoder (#32233)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-12 22:33:59 -08:00
Andreas Karatzas
11b6af5280 [ROCm][Bugfix] Fix Mamba batched decode producing incorrect output (#32099)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Tag: v0.14.0rc1
2026-01-13 05:46:53 +00:00
Wentao Ye
2a719e0865 [Perf] Optimize requests abort (#32211)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-13 04:11:37 +00:00
Andrew Bennett
f243abc92d Fix various typos found in docs (#32212)
Signed-off-by: Andrew Bennett <potatosaladx@meta.com>
2026-01-13 03:41:47 +00:00
Sanghoon Yoon
60b77e1463 [Frontend] Add reasoning_effort to OpenAIServing._preprocess_chat() (#31956)
Signed-off-by: Sanghoon Yoon <seanyoon@kakao.com>
2026-01-13 03:21:49 +00:00
cjackal
15b33ff064 [Misc] improve warning/assert messages (#32226)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
2026-01-13 03:11:23 +00:00
Nick Hill
c6bb5b5603 [BugFix] Fix engine crash caused by chat tools + response_format (#32127)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-13 10:33:14 +08:00
Nick Hill
9273a427b5 [Misc] Allow enabling NCCL for DP sync when async scheduling (#32197)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-13 02:03:08 +00:00
Cyrus Leung
78d13ea9de [Model] Handle trust_remote_code for transformers backend (#32194)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 09:30:12 +08:00
Andrew Xia
a307ac0734 [responsesAPI] add unit test for optional function tool call id (#32036)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2026-01-12 16:14:54 -08:00
Divakar Verma
a28d9f4470 [ROCm][CI] Handle pytest status code 5 when a shard isn't allocated any tests (#32040)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-01-12 17:35:49 -05:00
xuebwang-amd
629584bfc9 [Kernel][MoE] fix computation order of MoE weight multiplication and improve flow (#31962)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
2026-01-12 17:17:30 -05:00
Woosuk Kwon
0a7dd23754 [Model Runner V2] Add support for M-RoPE (#32143)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-12 13:37:43 -08:00
Woosuk Kwon
dec28688c5 [Model Runner V2] Minor refactor for logit_bias (#32209)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-12 13:08:30 -08:00
Vadim Gimpelson
9f430c94bd [BUGFIX] Add missed remaping of the names of fp8 kv-scale (#32199)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-01-12 20:42:06 +00:00
Nicolò Lucchesi
f8bd8394e3 [NIXL][Bugfix] Failure logging overhaul + early metadata free on failure (#32031)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-12 20:38:49 +00:00
Woosuk Kwon
ca81811bfe [Model Runner V2] Support logit_bias, allowed_token_ids, min_tokens (#32163)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-12 11:31:10 -08:00
Lucas Kabela
ad8818bb5e [Misc][BE] Type coverage for vllm/compilation [3/3] (#31748)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-12 19:24:38 +00:00
Nicolò Lucchesi
08e8e99ce7 [Misc] Change log level for batch queue log (#32192)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-12 18:59:31 +00:00
Or Ozeri
2be765b68a [BugFix] scheduler: Fix ordering preserving of skipped requests (#32173)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-12 18:39:38 +00:00
Roger Wang
16abe6b85a [Misc] Set default torch num threads for input processing (#31879)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-01-12 10:28:16 -08:00
Ilya Markov
1eb61ab34b [Refactor] EPLB rebalance algo to NumPy (#30697)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2026-01-12 18:13:23 +00:00
Kyungmin Lee
3d962d72ab [BugFix] fix FusedMoE.make_expert_params_mapping in EXAONE-MoE (#32196)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
2026-01-12 10:00:45 -08:00