Commit Graph

13941 Commits

Author SHA1 Message Date
Isotr0py
19fab44152 [Doc] Update Encoder-Decoder models support doc with Florence-2 (#34581)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-15 04:18:57 -08:00
Seiji Eicher
79c7e09235 [KV Connector] Add temporary, off-by-default VLLM_DISABLE_REQUEST_ID_RANDOMIZATION workaround (#34415)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2026-02-14 23:26:10 -08:00
haosdent
79f3fab05a [Bugfix] Handle num_expert_group=None in flashinfer block-scale FP8 MoE (#34494)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-02-14 23:25:46 -08:00
Vadim Gimpelson
604b9eaec5 [BUGFIX] Fix accuracy regression for NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 with TP>1 (#34476)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-02-14 23:25:17 -08:00
Stanislav Kirillov
50dbd6c9e6 [bugfix] Fix critical bug when reporting for all paths where handler.create_error_response is used (#34516)
Signed-off-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-14 23:24:25 -08:00
Andreas Karatzas
98bcc6ca59 [CI][Entrypoints] Validate detokenize token IDs to prevent int64 overflow causing 500 (#34468)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-14 23:08:38 -08:00
Andreas Karatzas
f13e86d8dd [Kernels] Fix Helion GPU utils to use platform-agnostic device name API (#34537)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-14 20:29:23 -08:00
Woosuk Kwon
9ca768c740 [Model Runner V2] Minor cleanup for Sampler (#34563)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-14 18:29:03 -08:00
Thomas Parnell
d5fe3f702c [Hybrid] Enable mamba prefix cache "align" mode with async scheduling (#33997)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2026-02-14 13:15:56 -08:00
Cyrus Leung
73391a1baa [Renderer] Move InputPreprocessor into Renderer (1/2) (#34510)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-14 10:14:21 -08:00
Andreas Karatzas
b3c14229b0 [ROCm][CI] Guard sparse MLA backend imports for ROCm compatibility in tests (#34538)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-14 07:32:09 -08:00
Roger Wang
2f186635cb [Bugfix] Fix Qwen3.5 config loading (#34554)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-14 03:56:11 -08:00
Christian Pinto
342a7cda2d [Misc] Update tests and examples for Prithvi/Terratorch models (#34416)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-13 23:03:51 -08:00
Kata Coder
d1ea65d0a1 [new model] add COLQwen3 code & Inference (#34398)
Signed-off-by: craftsangjae <craftsangjae@gmail.com>
Signed-off-by: katacoder <craftsangjae@gmail.com>
2026-02-14 12:15:19 +08:00
Andreas Karatzas
de42abb366 [CI] Heavy refactoring of Voxtral multimodal audio model tests (#34294)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-13 20:04:29 -08:00
Julien Denize
60ca7981bc Add explicit validation error for tool calls. (#34438)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
2026-02-13 20:04:01 -08:00
Christian S. Perone
0ef5b9147b fix: use __annotations__ instead of get_type_hints() for dynamic kwargs detection (#34527)
Signed-off-by: Christian S. Perone <christian.perone@gmail.com>
Signed-off-by: Christian S. Perone <perone@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-13 20:03:37 -08:00
Shiyan Deng
ed242652d7 [bug] Make sure get_modality_with_max_tokens is deterministic (#34533)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
2026-02-13 20:02:59 -08:00
Wei Zhao
b37b679770 [Feature][Perf] Support Selective CPU Weight Offloading (#34535)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
2026-02-13 20:02:24 -08:00
Andreas Karatzas
a0638d052d [Bugfix] Fix ROCm UVA CPU weight offloading broken by #32993 (#34543)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-13 20:01:42 -08:00
Harry Huang
c027541eaf [Hybrid] Enable spec decoding in mamba cache align mode (#33705)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
2026-02-13 13:02:28 -08:00
Ben Browning
fd267bc7b7 [Bugfix]: Fix structured output in multi-turn gpt-oss (#34454)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-13 11:12:48 -08:00
Michael Goin
bfaa559305 Revert "[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides" (#34530)
2026-02-13 10:35:29 -08:00
Richard Zou
87789c8364 [Misc] vLLM's --enforce-eager should turn off compile and cudagraphs only (#34523)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-02-13 09:52:20 -08:00
Pushpinder Singh
bcd65c1f6a [Bugfix] Replace c10::optional with std::optional in topk kernel (#34467)
Signed-off-by: Pushpinder Singh <pushpindersingh135@gmail.com>
2026-02-13 08:30:23 -08:00
Wei Zhao
59d53066d8 [Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation (#32993)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-13 08:11:26 -08:00
LoganJane
4a9952ec1b [Bugfix] Add quant_config in ViT of Kimi-K2.5 (#34501)
Signed-off-by: LoganJane <LoganJane73@hotmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-13 16:05:34 +00:00
Roger Wang
1dae7b7843 [Bugfix] Exclude language_model_only key from MM AOT compile hash but include in model one (#34508)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-13 13:59:00 +00:00
Roger Wang
5885e330ef [Misc] Port Qwen3.5 Configs (#34512)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-13 05:24:25 -08:00
Ilya Boytsov
071d863e20 Extend ColBERT support to non-standard BERT backbones (#34170)
Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com>
2026-02-13 09:53:09 +00:00
Woosuk Kwon
0916e7960b [GDN] Use CPU tensors to build GDN metadata (#34498)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-13 01:24:45 -08:00
Wentao Ye
3d2a026fd0 [Feature] Pipeline Parallel Async send/recv, 2.9% E2E throughput improvement (#33368)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2026-02-13 16:38:16 +08:00
Aaron Hao
dddbff4624 [Core] Move pause and resume functions into engine (#34125)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-02-13 00:15:10 -08:00
Martin Hickey
47e9b63e1a [KVConnector] Clean up redundant code in KV connectors (#34147)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
2026-02-13 00:14:30 -08:00
Matthias Gehre
934acddef9 [Perf] fused_moe: add int4_w4a16 benchmark support and tuning config (#34130)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-02-13 00:14:27 -08:00
Marek Michalowski
742d214d6e [Bugfix] fix the import path in moe test utils.py (#34245)
Signed-off-by: Marek Michalowski <marek.michalowski@arm.com>
2026-02-13 00:13:45 -08:00
haosdent
4137c5dfa7 [Bug Fix] Fix MambaManager.cache_blocks() crash on null blocks in align mode (#34418)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-02-13 00:13:22 -08:00
Harry Huang
7a8a46ddcb [BugFix] Fix and optimize max_num_blocks_per_req calculation for MambaSpec (#34440)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
2026-02-13 00:13:14 -08:00
myselvess
bcf0731aa0 [New Model] support new model ovis2.6 (#34426)
Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com>
2026-02-13 00:12:45 -08:00
Cyrus Leung
ec090c2429 [Refactor] Call renderer for online IO processor request (#34490)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-12 22:48:45 -08:00
Roger Wang
eea3024f43 [Bugfix] Fix mamba state dtype setting for Qwen3-Next and Qwen3.5 (#34489)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-12 22:48:42 -08:00
Cyrus Leung
2f308214c0 [Refactor] Pass full VllmConfig to Renderer (#34485)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-12 22:48:38 -08:00
Cyrus Leung
1b4e8e53f8 [CI/Build] Fix CUDA re-initialization error in distributed model tests (#34491)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-13 06:43:53 +00:00
haosdent
dcf6ee8592 [Bugfix] Fix encoder cache underestimation for GLM-4V/GLM-OCR single image (#34483)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-02-12 21:04:06 -08:00
Cyrus Leung
372b2e762a [Bugfix] Standardize getting number of image patches/tokens (#34358)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-12 20:47:01 -08:00
Andreas Karatzas
6afa587d31 [ROCm][CI] Fix serving tokens test failures (#34047)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-13 11:27:53 +08:00
Cyrus Leung
94ed6cf6ea Add new sections to CODEOWNERS (#34309)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-12 18:39:28 -08:00
Harry Huang
bf37812ca7 [Hybrid] Fix and optimize block-aligned splitting in mamba cache align mode (#33706)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
2026-02-12 18:21:52 -08:00
Frank Wang
b86bf4417e [Bugfix] Fix Random Dataset Prefix Length Inaccuracy (#33907)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-02-12 18:21:19 -08:00
Yanan Cao
de13dd781f [Kernel] [Helion] [5/N] Add Helion Autotuning infrastructure (#34025)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2026-02-12 18:21:05 -08:00