Commit Graph

14946 Commits

Author SHA1 Message Date
Yong Hoon Shin
de35c06c66 Make KV connector metadata build overridable via plugin (#37336)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2026-03-17 21:29:06 +00:00
Athrael Soju
c0745a851a [Model] Add ColQwen3.5 4.5B support (#36887)
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-17 21:17:02 +00:00
Ekagra Ranjan
b5ca9c3557 [Models] Cohere ASR (#35809)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
2026-03-17 21:04:17 +00:00
Chao-Ju Chen
245758992e [Bugfix] Rescale NVFP4 weight scales to fix BF16 dequant underflow (#34577)
Signed-off-by: ricky-chaoju <ricky.chen@infinirc.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-03-17 20:48:42 +00:00
Dimitrios Bariamis
1204cf0a9d [Bugfix] Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 (#37158)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
2026-03-17 20:13:06 +00:00
Wei Zhao
b36adfa349 [Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache (#37252)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
2026-03-17 20:09:20 +00:00
Michael Goin
e78821b438 [Deprecation] Deprecate --calculate-kv-scales option (#37201)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2026-03-17 19:57:24 +00:00
Cyrus Leung
51f0acda79 [Model] Remove unused handle_oov_mm_token (#37321)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-17 19:44:52 +00:00
Brian Dellabetta
fa75204b16 bump compressed-tensors version to 0.14.0.1 (#36988)
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
2026-03-17 15:36:19 -04:00
Wentao Ye
bdb903bb5f [Bug] Fix FlashInfer MNNVL socket collisions under concurrent vLLM jobs (#36674)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-17 15:19:52 -04:00
Andrey Talman
68f783a727 [Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673)
Signed-off-by: atalman <atalman@fb.com>
2026-03-17 18:47:59 +00:00
Avinash Singh
c5030c439d [CI] Split Distributed Tests (4 GPUs) and Kernel MoE tests (#37100)
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Signed-off-by: Avinash Singh  <107198269+avinashsingh77@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
2026-03-17 11:44:55 -07:00
Michael Goin
51b2333be1 [Perf] Optimize top-k search in apply_top_k_top_p_triton sampler (#37225)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-03-17 11:35:17 -07:00
Andreas Karatzas
4ed51308c8 [CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __init__ (#37230)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-17 09:08:08 -07:00
Cyrus Leung
c781fbbab3 [Bugfix] Standardize custom HF Processor init (#37289)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-17 15:38:55 +00:00
Richard Zou
979ff44cea [BugFix] PyTorch Compilation Tests should error if any test fails (#37300)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-03-17 15:26:38 +00:00
Benjamin Chislett
f63ed7b5ac [Bugfix] Fix DP MTP Dummy Run (#35243)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-03-17 11:16:48 -04:00
Ning Xie
c9e5096256 [openapi] remove redundant exception stack trace[4/N] (#37157)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-03-17 15:06:25 +00:00
Anton Vlasjuk
2ff0ad9694 [UltraVox] Fix output type (#37224)
Signed-off-by: vasqu <antonprogamer@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-17 14:51:17 +00:00
Isotr0py
a836524d20 [Chore] Replace all base64 usages with faster pybase64 package (#37290)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-17 14:44:19 +00:00
Bhoomit
3717a4dd47 [Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984)
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-17 14:36:41 +00:00
Harry Mellor
ecfcdd2ce4 Fix Phi3 test that fails with Transformers v5 (#37298)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-17 14:29:24 +00:00
Siew's Capital Jarvis
c25dbc2d27 [Bugfix] Fix unclean shutdown crash with AllReduce Fusion workspace (#36955)
Signed-off-by: Jarvis <brayden.stanley.0127@gmail.com>
2026-03-17 14:22:09 +00:00
Jonas M. Kübler
77d2a5f17b pick up tuned prefill configs for FP8 FA3 (#36265)
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
2026-03-17 07:00:26 -07:00
Sage
59192dfd39 [Frontend] Complete OpenAI render delegation (#37287)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
2026-03-17 13:53:55 +00:00
Umut Polat
56cb1baa66 [Misc] Use VLLMValidationError in batch, pooling, and tokenize protocol validators (#36256)
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
2026-03-17 13:52:30 +00:00
Cyrus Leung
f340324335 [1/2] Move InternVL-based processors (#37260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-17 21:50:56 +08:00
sfbemerk
2660b9289c Bugfix for offloading+prefetch for GLM-4.7-FP8 (#37178)
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
2026-03-17 21:22:09 +08:00
Viacheslav
293f036e6d Add gigachat 3.1 tool parser + fix gigachat3 tool parser (#36664)
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>
2026-03-17 12:03:20 +00:00
youkaichao
0fb142a454 [perf][connector] optimize build_connector_meta when host buffer transfer is not used (#37165)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2026-03-17 11:59:35 +00:00
Sage
00f8e0d211 [Frontend] Delegate tokenization serving preprocessing to OpenAIServingRender (#37266)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
2026-03-17 11:22:54 +00:00
zhao, zhenhui
4af9ed21cb [Bugfix](xpu): prevent “selected index k out of range” in TP decode path (#37259)
Signed-off-by: zhenzhao <zhenzhao@habana.ai>
2026-03-17 11:14:07 +00:00
Augusto Yao
9c7cab5ebb [Feature]: Support for multiple embedding types in a single inference call (#35829)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
2026-03-17 17:05:42 +08:00
Chauncey
132bfd45b6 [Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (#37258)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-03-17 08:54:52 +00:00
xiao-llm
24b4272a8c Fix infinite recursive search issue in quark.py (#32779)
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
Signed-off-by: Xiao Yu <xiao.yu.dc@outlook.com>
Signed-off-by: kimheesu <wlskaka4@gmail.com>
Co-authored-by: Yanwen Lin <lyw1124278064@gmail.com>
Co-authored-by: Kim Hee Su <wlskaka4@gmail.com>
2026-03-17 07:19:15 +00:00
Benjamin Chislett
8a680463fa [Bugfix] Fix NemotronH MTP + Chunked Prefill (#35447) 2026-03-17 07:07:33 +01:00
Nick Cao
20b14095a4 [Bugfix] Fix loading Music Flamingo (#35535)
Signed-off-by: Nick Cao <ncao@redhat.com>
2026-03-17 05:24:40 +00:00
PatchyTIS
17c1bdf371 [Bugfix] dtype mismatch in ngram gpu propose (#37246)
Signed-off-by: PatchouliTaisa <patchychen@tencent.com>
Co-authored-by: PatchouliTaisa <patchychen@tencent.com>
2026-03-17 05:19:55 +00:00
Flora Feng
3e3d320c1b [Refactor] Relocate responses API tests (#37241)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-17 05:14:52 +00:00
Andreas Karatzas
54a62a79f7 [ROCm] Fix AttributeError for torch.compiler.skip_all_guards_unsafe on older PyTorch (#37219)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
v0.17.2rc0
2026-03-17 11:34:49 +08:00
Flora Feng
384dc7f77b [Refactor] Relocate completion and chat completion tests (#37125)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-17 11:31:23 +08:00
Flora Feng
f04d5226f8 [CI] Fix flaky tool_use chat completion tests with deterministic seed (#37027)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-17 03:24:34 +00:00
Kyuyeun Kim
0a0a1a198b Add ability to replace oot ops when using lora (#37181)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
2026-03-16 18:04:15 -07:00
Vadim Gimpelson
6c1cfbad32 Support non-contiguous KV cache in TRTLLM fp8 dequant kernel (#36867)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: Pavani Majety <pavanimajety@gmail.com>
2026-03-16 17:48:42 -07:00
Harry Huang
45f526d652 [BugFix] Correct max memory usage for multiple KV-cache groups (#36030)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
2026-03-17 00:38:52 +00:00
Julien Denize
5db91f0aaf Fix some Mistral parser issues (#37209)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
2026-03-17 00:08:56 +00:00
Walter Beller-Morales
061980c36a [Feature][Frontend] add support for Cohere Embed v2 API (#37074)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
2026-03-16 19:55:53 -04:00
Ben Browning
7a49742b88 [CI/Build] Add common tool call parser test suite (#27599)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
2026-03-16 19:46:20 -04:00
Terry Gao
3e6a1e1686 [Custom Ops] Add functional + out variant for scaled_fp4_quant (#34389)
Signed-off-by: tianrengao <terrygao87@gmail.com>
2026-03-16 18:51:46 -04:00
Julien Denize
7961486a9b Fix EagleMistralLarge3Model initialization (#37232)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
2026-03-16 15:41:00 -07:00