daniel-salib
8688c3d460
[fix] test mcp_tool_calling_streaming with a more complex math question ( #32769 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com >
2026-01-29 10:25:58 +00:00
Isotr0py
3a92c6f3b5
[Misc] Cleanup Kimi-K2.5's vision chunk modality entrypoints ( #33157 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-29 09:46:02 +00:00
shanjiaz
5eeba80c74
Adding optional speculator tests for larger models ( #32943 )
...
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com >
2026-01-29 16:54:02 +08:00
cmunley1
3bba2edb0f
support returning tokenids in responses api ( #33212 )
...
Signed-off-by: Christian Munley <cmunley@nvidia.com >
2026-01-29 16:52:39 +08:00
wang.yuqi
abb34ac43a
[Bugfix] Fix Qwen3-VL-Reranker load. ( #33298 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 08:42:53 +00:00
Cyrus Leung
51550179fc
[Refactor] Define MM data parser in processing info instead of processor itself ( #33260 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-29 13:55:17 +08:00
Michael Goin
ca1969186d
[UX] Enable nested configs in config yaml files ( #33193 )
2026-01-28 16:54:25 -05:00
Rohan Potdar
59bcc5b6f2
Use aiter triton fused_add_rmsnorm_pad for gpt-oss ( #30976 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-01-28 20:47:47 +00:00
Wentao Ye
3e440786af
[Feature] Full support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement ( #32618 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-01-28 20:30:32 +00:00
Nicolò Lucchesi
8ebf372e9d
[CI] Whisper tests enforce_eager=False ( #33098 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-28 09:36:56 -08:00
Robert Shaw
af9b69f977
[Quantization][Deprecation] Remove Marlin 24 ( #32688 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-28 15:54:59 +00:00
Chauncey
8e5e40daf4
[Misc] Provide a DeepSeek ReasoningParser with thinking enabled by default ( #33221 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-28 21:16:53 +08:00
Or Ozeri
2e8de86777
Revert "Enable Cross layers KV cache layout at NIXL Connector ( #30207 )" ( #33241 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
2026-01-28 04:36:00 -08:00
Robert Shaw
247d1a32ea
[Quantization][Deprecation] Remove BitBlas ( #32683 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-28 11:06:22 +00:00
ramos
36d450e3b8
Adds FunAudioChat multimodal audio model support ( #2 ) ( #33058 )
...
Signed-off-by: ramos <49182011+nemoramo@users.noreply.github.com >
Signed-off-by: mayufeng <mayufeng@example.com >
Co-authored-by: mayufeng <mayufeng@example.com >
2026-01-28 05:18:09 +00:00
Harry Mellor
2eb673a088
Add flake8-implicit-str-concat rules to Ruff ( #33191 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-28 04:56:10 +00:00
Richard Zou
d9aa39a3bb
[torch.compile] Speed up MOE handling in forward_context ( #33184 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-01-27 15:17:54 -08:00
danielafrimi
83fb2d09e8
Support heterogeneous NemotronHPuzzle model ( #32549 )
...
Signed-off-by: <dafrimi@nvidia.com >
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com >
Signed-off-by: root <dafrimi@nvidia.com >
2026-01-27 10:55:54 -05:00
Matthew Bonanni
a608b4c6c2
[5/N][Attention] Finish eliminating vllm/attention folder ( #32064 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-27 10:02:51 -05:00
omerpaz95
7227d06156
[Metrics] [KVConnector] Add Offloading Connector metrics ( #27942 )
...
Added query and hit metrics for the Offloading Connector.
Also added timing metrics for store and load operations, which report the
average load/store time per token.
The metrics are available from Prometheus and from the StatLogger.
Signed-off-by: omerpaz95 <omerpaz95@gmail.com >
Co-authored-by: Omer Paz <Omer.Paz@ibm.com >
2026-01-27 13:34:49 +00:00
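The metrics described in the entry above ( #27942 ) can be pictured with a small accumulator. This is only a sketch under stated assumptions: the class name `OffloadStats`, its fields, and its methods are hypothetical and are not taken from the commit; whether the real connector counts lookups or tokens for its query/hit metrics is not stated in the message.

```python
from dataclasses import dataclass


@dataclass
class OffloadStats:
    """Hypothetical accumulator; not the connector's actual class."""

    queries: int = 0          # number of lookups issued (assumed unit)
    hits: int = 0             # number of lookups that found cached data
    load_seconds: float = 0.0  # total wall time spent loading
    load_tokens: int = 0       # total tokens loaded

    def record_lookup(self, hit: bool) -> None:
        # Every lookup counts as a query; only successful ones count as hits.
        self.queries += 1
        if hit:
            self.hits += 1

    def record_load(self, seconds: float, num_tokens: int) -> None:
        # Accumulate totals so the per-token average can be derived later.
        self.load_seconds += seconds
        self.load_tokens += num_tokens

    @property
    def hit_rate(self) -> float:
        return self.hits / self.queries if self.queries else 0.0

    @property
    def avg_load_time_per_token(self) -> float:
        # Average time per token = total time / total tokens.
        return self.load_seconds / self.load_tokens if self.load_tokens else 0.0
```

Keeping running totals rather than per-operation averages means the per-token figure is weighted by how many tokens each operation moved.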
Harry Mellor
14385c80fc
Fix weight mapping test for Transformers v5 ( #33162 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-27 12:30:14 +00:00
wang.yuqi
76139d0801
[Frontend] Frontend will only attach entrypoints corresponding to supported tasks. ( #33139 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-01-27 12:15:43 +00:00
Roger Wang
b539f988e1
[Models] Kimi-K2.5 ( #33131 )
...
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn >
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn >
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-27 14:50:31 +08:00
Andreas Karatzas
6c00645712
[CI][Pooling] Stabilize ModernBERT test ( #32909 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-27 05:26:48 +00:00
wangln19
2d7053438a
fix: preserve native tool call ID in multi-turn tool calling ( #32768 )
...
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn >
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Isotr0py <2037008807@qq.com >
2026-01-27 10:22:35 +08:00
Robert Shaw
5a93b9162b
[MoE Refactor] Integrate Naive Prepare Finalize into MK ( #32567 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com >
2026-01-27 01:28:02 +00:00
Jared Wen
6ee7f18f33
[Logging] add --disable-access-log-for-endpoints CLI option ( #30011 )
...
Add a new CLI option --disable-access-log-for-endpoints to suppress
uvicorn access logs for specified endpoints (e.g., /health, /metrics, /ping).
This addresses the need to reduce log noise in production environments
where health check endpoints are frequently polled by load balancers or
monitoring systems, generating excessive log entries that obscure
meaningful request logs.
Fixes #29982
Signed-off-by: JaredforReal <w13431838023@gmail.com >
2026-01-26 21:49:03 +00:00
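One common way to get the behaviour described in the entry above ( #30011 ) is a `logging.Filter` attached to the `uvicorn.access` logger. The sketch below is not the code from the commit: the class name `EndpointAccessLogFilter` is hypothetical, and it assumes uvicorn's access records carry their arguments as a tuple of (client address, method, path, HTTP version, status code).

```python
import logging


class EndpointAccessLogFilter(logging.Filter):
    """Hypothetical filter that drops access-log records for chosen paths."""

    def __init__(self, excluded_paths):
        super().__init__()
        self.excluded_paths = frozenset(excluded_paths)

    def filter(self, record: logging.LogRecord) -> bool:
        args = record.args
        # Assumed record layout: (client_addr, method, path, http_version, status).
        if isinstance(args, tuple) and len(args) >= 3:
            path = str(args[2]).split("?", 1)[0]  # ignore any query string
            return path not in self.excluded_paths
        # Records with an unexpected shape are passed through untouched.
        return True


# Suppress access-log lines for the endpoints named in the commit message.
logging.getLogger("uvicorn.access").addFilter(
    EndpointAccessLogFilter({"/health", "/metrics", "/ping"})
)
```

Filtering at the logger keeps the endpoints themselves unchanged, so load balancers and monitors can keep polling them while their requests no longer appear in the access log.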
Cyrus Leung
c25dbee40d
[Model] Bump transformers version for test registry ( #33100 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-26 18:53:22 +00:00
Chauncey
a2393ed496
[CI] Fix AssertionError: MCP tool call not found in output_messages ( #33093 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-26 15:19:57 +00:00
Yuxuan Zhang
bb17e8f11c
[GLM-OCR] GLM-OCR with MTP Support ( #33005 )
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-26 06:24:43 -08:00
Cyrus Leung
dcd80206b7
[Chore] Update type annotation of input_ids in model forward ( #33063 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-26 06:02:10 -08:00
Alex Brooks
9ac818a551
[Misc] HF Hub LoRA Resolver ( #20320 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2026-01-26 13:56:32 +00:00
Cyrus Leung
11b556878b
[Refactor] Use data parser for matching data items to multi-modal UUIDs ( #32955 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-26 15:00:28 +08:00
Robert Shaw
254db42ede
[Tests] Remove Duplicates ( #33032 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-26 05:23:54 +00:00
JJJYmmm
7e67df5570
[Bugfix] fix encoder cache hang in Qwen3VL ( #32684 )
...
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-25 05:17:31 +00:00
Roberto L. Castro
fcb9df99bd
[Perf][Kernel] Optimize FP4 quantization kernels (SM100F) ( #32520 )
...
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
2026-01-24 18:45:27 -07:00
Joshua Deng
91601ff478
[Feature] add session based streaming input support to v1 ( #28973 )
...
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com >
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-01-24 12:06:28 -08:00
7. Sun
cd775bdbe0
[Tests] Replace flaky sleep with polling in test_background_cancel ( #32986 )
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com >
2026-01-24 16:39:07 +00:00
7. Sun
0ccecf8833
[Tests] Standardize RNG seed utility across test files ( #32982 )
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com >
2026-01-24 06:47:14 +00:00
7. Sun
0b9a735e11
[Tests] Clarify pytest skip reasons with actionable context ( #32981 )
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com >
2026-01-24 06:38:50 +00:00
ElizaWszola
a28b94e6ef
[Performance] Split FlashAttn attention and cache update ( #25954 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Luka Govedič <luka.govedic@gmail.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <luka.govedic@gmail.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
2026-01-23 17:28:06 -08:00
dolpm
0118cdcc02
[fix] add VLLM_OBJECT_STORAGE_SHM_BUFFER_NAME to compile factors ( #32912 )
...
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com >
2026-01-23 22:53:10 +00:00
Michael Goin
4561f13985
[Refactor] Rename gptq_marlin to marlin to match MoE ( #32952 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-23 16:48:12 -05:00
Lucas Wilkinson
3a41459501
[cudagraphs] Refactor cudagraph capture loop ( #32946 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-01-23 13:22:20 -07:00
Harry Huang
5206e5e28c
[V1][Hybrid] Mamba Prefix Caching with align mode ( #30877 )
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
2026-01-23 09:56:48 -08:00
Luka Govedič
bbbd696af9
[torch.compile][CI] Add back attn fusion on hopper/ada ( #32940 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2026-01-23 16:49:20 +00:00
sangbumlikeagod
9b77bb790d
[Frontend] add logprob, compression_rate to 'verbose_json' features ( #31059 )
...
Signed-off-by: sangbumlikeagod <oironese@naver.com >
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com >
2026-01-23 16:35:13 +00:00
Matt
305e53ade8
[Hardware][AMD][CI][Bugfix] Fix Kernels Attention Cache test ( #32904 )
...
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
2026-01-23 16:24:26 +00:00
Xin Yang
90c2007932
[Bugfix] Disable tma_aligned_scales in test_fusions_e2e ( #32916 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-01-23 14:34:30 +00:00
Fadi Arafeh
aac0b817fa
[CPU Backend][BugFix] Fix failing CPU MoE test ( #32876 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-01-23 12:06:51 +00:00