Commit Graph

4838 Commits

Author SHA1 Message Date
Cyrus Leung
51f0acda79 [Model] Remove unused handle_oov_mm_token (#37321)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-17 19:44:52 +00:00
Andrey Talman
68f783a727 [Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673)
Signed-off-by: atalman <atalman@fb.com>
2026-03-17 18:47:59 +00:00
Andreas Karatzas
4ed51308c8 [CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __init__ (#37230)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-17 09:08:08 -07:00
Isotr0py
a836524d20 [Chore] Replace all base64 usages with faster pybase64 package (#37290)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-17 14:44:19 +00:00
Bhoomit
3717a4dd47 [Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984)
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-17 14:36:41 +00:00
Harry Mellor
ecfcdd2ce4 Fix Phi3 test that fails with Transformers v5 (#37298)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-17 14:29:24 +00:00
Sage
59192dfd39 [Frontend] Complete OpenAI render delegation (#37287)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
2026-03-17 13:53:55 +00:00
Cyrus Leung
f340324335 [1/2] Move InternVL-based processors (#37260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-17 21:50:56 +08:00
Viacheslav
293f036e6d Add gigachat 3.1 tool parser + fix gigachat3 tool parser (#36664)
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>
2026-03-17 12:03:20 +00:00
Sage
00f8e0d211 [Frontend] Delegate tokenization serving preprocessing to OpenAIServingRender (#37266)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
2026-03-17 11:22:54 +00:00
Augusto Yao
9c7cab5ebb [Feature]: Support for multiple embedding types in a single inference call (#35829)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
2026-03-17 17:05:42 +08:00
Chauncey
132bfd45b6 [Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (#37258)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-03-17 08:54:52 +00:00
Benjamin Chislett
8a680463fa [Bugfix] Fix NemotronH MTP + Chunked Prefill (#35447) 2026-03-17 07:07:33 +01:00
Flora Feng
3e3d320c1b [Refactor] Relocate responses API tests (#37241)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-17 05:14:52 +00:00
Flora Feng
384dc7f77b [Refactor] Relocate completion and chat completion tests (#37125)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-17 11:31:23 +08:00
Flora Feng
f04d5226f8 [CI] Fix flaky tool_use chat completion tests with deterministic seed (#37027)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-17 03:24:34 +00:00
Vadim Gimpelson
6c1cfbad32 Support non-contiguous KV cache in TRTLLM fp8 dequant kernel (#36867)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: Pavani Majety <pavanimajety@gmail.com>
2026-03-16 17:48:42 -07:00
Harry Huang
45f526d652 [BugFix] Correct max memory usage for multiple KV-cache groups (#36030)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
2026-03-17 00:38:52 +00:00
Walter Beller-Morales
061980c36a [Feature][Frontend] add support for Cohere Embed v2 API (#37074)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
2026-03-16 19:55:53 -04:00
Ben Browning
7a49742b88 [CI/Build] Add common tool call parser test suite (#27599)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
2026-03-16 19:46:20 -04:00
Terry Gao
3e6a1e1686 [Custom Ops] Add functional + out variant for scaled_fp4_quant (#34389)
Signed-off-by: tianrengao <terrygao87@gmail.com>
2026-03-16 18:51:46 -04:00
Andreas Karatzas
4f9b14c21c [CI] Stabilize multinode DP internal LB completion tests (#36356)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-16 15:40:23 -07:00
EdalatiAli
e5b807607c [Quant][Feature] Support online MXFP8 quantization for MoE and dense models (#35448)
Signed-off-by: EdalatiAli <aliedalati@cohere.com>
2026-03-16 18:07:39 -04:00
Krish Gupta
c0f011918d [Bugfix] opcheck false mutation error in rms_norm_per_block_quant (#36688) (#36779)
Signed-off-by: Krish Gupta <krishom70@gmail.com>
2026-03-16 21:11:33 +00:00
rasmith
2cc26c3a99 [CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for test (#37213)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-03-16 13:22:57 -07:00
Flora Feng
dfa8852db2 [Refactor] Consolidate GPT-OSS reasoning parser tests (#36915)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Flora Feng <4florafeng@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-16 15:53:07 -04:00
Nicolò Lucchesi
f5c081d432 [PD][Nixl] Add support for hybrid SSM-FA models (#36687) 2026-03-16 19:58:06 +01:00
Max de Bayser
9f9ecff4cd Add simple granite4 tool parser (#36827)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2026-03-16 10:49:09 -07:00
haosdent
ca1954d58c [Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
2026-03-16 19:03:10 +02:00
Raushan Turganbay
55e6d3d5c0 [Bugfix] Make siglip/clip compatible with transformers v5 (#37200)
Signed-off-by: raushan <raushan@huggingface.co>
2026-03-16 16:48:18 +00:00
Benjamin Bartels
0e5a9382af [Bugfix] accept redacted thinking blocks in Anthropic messages (#36992)
Signed-off-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local>
Signed-off-by: bbartels <benjamin@bartels.dev>
Co-authored-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local>
2026-03-16 22:01:57 +08:00
Fynn Schmitt-Ulms
04bf5a35fa [Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013) 2026-03-16 14:53:45 +01:00
Robin Nabel
bf9a185395 GLM4 tool parser: fix streaming mode (#35208)
Signed-off-by: Robin Nabel <opensource@nabel.co>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-03-16 18:48:52 +08:00
Kunshang Ji
747b068136 [Hardware] Replace memory related torch.cuda APIs (#37031)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
2026-03-16 10:24:48 +00:00
haosdent
116ed130f4 [Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-03-16 10:30:23 +01:00
Isotr0py
912fbe9555 [Bugfix] Fix Qwen2.5-Omni/Qwen3-Omni use_audio_in_video with multi-video inputs (#37147)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-16 08:56:06 +00:00
Roy Wang
821eb80c0d [Performance][Model Loader] Skip non-local expert weights during EP model loading (#37136)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
2026-03-16 01:33:36 -07:00
Andreas Karatzas
a2956a0f8e [ROCm][CI] Retrying in case of batch variance effects and reducing flakiness (#36442)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-16 16:08:51 +08:00
Andreas Karatzas
911355e216 [ROCm] Fix KV copy methods and auto-select attention backend for ROCm (#36845)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-16 16:07:27 +08:00
leo-cf-tian
2754231ba3 [Kernel] Add FlashInfer MoE A2A Kernel (#36022)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Leo Tian <lctian@nvidia.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com>
2026-03-15 23:45:32 -07:00
bigshanedogg
2390d44209 [Model] Add HyperCLOVAX-SEED-Think-14B language model support (#37107)
Signed-off-by: bigshanedogg <bigshane319@gmail.com>
2026-03-16 06:40:05 +00:00
Andreas Karatzas
d4c57863f7 [ROCm][CI] Fix engine teardown and text normalization to stabilize voxtral test (#37138)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-16 04:49:31 +00:00
Andrew Xia
e9163b536e [responsesAPI][ez] add a unit test for SimpleContext logprobs (#37126)
Signed-off-by: Andrew Xia <axia@meta.com>
2026-03-15 17:12:26 -07:00
Lalithnarayan C
7acaea634c In-Tree AMD Zen CPU Backend via zentorch [1/N] (#35970)
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Chinmay-Kulkarni-AMD <Chinmay.Kulkarni@amd.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:35:35 +00:00
Hari
a3e2e250f0 [Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614)
Signed-off-by: hasethuraman <hsethuraman@microsoft.com>
2026-03-15 19:38:21 +08:00
Isotr0py
143e4dccdf [Misc] Add online audio_in_video test (#36775)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-15 00:14:11 -07:00
arlo
8c29042bb9 [Feature] Add InstantTensor weight loader (#36139) 2026-03-14 18:05:23 +01:00
Sergey Zinchenko
4a718e770d [Bug] Fix Failure in /v1/chat/completions/render for Multimodal Requests (https://github.com/vllm-project/vllm/issues/35665) (#35684) 2026-03-14 14:10:11 +00:00
Flora Feng
bcfdadb1bc [Refactor] Relocate chat completion and anthropic tests (#36919)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-14 12:16:16 +08:00
Andrew Xia
f680dc1b39 [responsesAPI] prioritize content over summary in reasoning item input (#36516)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <mitandrewxia@gmail.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2026-03-14 09:20:30 +08:00