Commit Graph

9263 Commits

Author SHA1 Message Date
Ning Xie
5a30bd10d8 [Bugfix] fix IntermediateTensors equal method (#23027)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-18 02:58:11 -07:00
Cyrus Leung
27e8d1ea3e [Refactor] Define MultiModalKwargsItems separate from MultiModalKwargs (#23053)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-18 09:52:00 +00:00
Kunshang Ji
5c79b0d648 [XPU][CI]add xpu env vars in CI scripts (#22946)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-18 09:47:03 +00:00
Kunshang Ji
5f5664b3e4 [XPU] Fix compile size for xpu (#23069)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-18 00:04:08 -07:00
Roger Wang
89657a557c [Misc] Fix backward compatibility from #23030 (#23070)
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-08-17 23:33:29 -07:00
Ning Xie
08d5f7113a [Misc] refactor function name (#23029)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-17 22:16:21 -07:00
Andy Lo
b2fd0b81e0 [Bugfix][CI] Machete kernels: deterministic ordering for more cache hits (#23055)
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-08-17 22:10:26 -07:00
double7
9f1c642254 [Bugfix] fix Qwen2.5-Omni processor output mapping (#23058)
Signed-off-by: double7 <33449816+DoubleVII@users.noreply.github.com>
Co-authored-by: 杨森 <yangsen.double7@bytedance.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-17 22:09:11 -07:00
Ning Xie
7be3a59d8e [Misc] enhance static type hint (#23059)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-17 22:09:08 -07:00
Woosuk Kwon
8ea0c2753a [Misc] Minor code cleanup for _get_prompt_logprobs_dict (#23064)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-17 18:16:03 -07:00
Simon Mo
0fc8fa751a fix: gptq marlin weight loading failure (#23066)
Some checks failed
Create Release / Create Release (push) Has been cancelled
v0.10.1rc1
2025-08-17 15:56:07 -07:00
Calvin Chen
21e39436c8 [XPU] fix xpu to set cudagraph batch sizes (#23044)
Signed-off-by: calvin chen <wen.chen@dynamia.ai>
2025-08-17 21:45:42 +00:00
Woosuk Kwon
6d243efeda [Misc] Convert use_structured_output property into constant (#23060)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-17 12:41:38 -07:00
Woosuk Kwon
c55bc1db26 [Misc] Remove dead return (#23061)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-17 10:36:46 -07:00
Lucas Wilkinson
292084e72a [BugFix] Fix for IMA in FA3 varlen combine (#22967)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-08-17 08:52:04 -07:00
Kevinzz
16bff144be [Misc] fix typo in the multimodal doc (#23051) 2025-08-17 01:56:20 -07:00
947132885
fe0411fc6f [Bugfix] should use stack instead of concat (#22972)
Signed-off-by: 947132885 <947132885@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-17 08:46:36 +00:00
Jee Jee Li
4d4061b6e7 [Kernel] Add cuda kernel for gpt_oss activation (#22951)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-17 05:03:24 +00:00
Ning Xie
87f48623a5 [Misc] method name typo fix (#23042)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-16 21:49:14 -07:00
Cyrus Leung
5c32143b9d [Refactor] Defer tensor data construction in MultiModalKwargs (#23030)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-16 21:05:50 -07:00
Michael Goin
94096a47c9 [UX] Separate marlin moe config logic from triton moe (#23006) 2025-08-16 22:16:42 -04:00
Jinzhen Lin
a258ad8bcc [Bugfix] fix qwen3 moe fp8 accuracy issue (#23031)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
2025-08-16 17:41:23 -07:00
afeldman-nm
bf7f470b22 [V1] Logits processors extensibility (#19912)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeld2012@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-16 12:59:17 -07:00
Michael Goin
4fc722eca4 [Kernel/Quant] Remove AQLM (#22943)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-16 19:38:21 +00:00
Michael Goin
3253ae765e [Flaky CI] Increase timeout tolerance for test_mp_crash_detection+test_default_mm_lora_chat_completions (#23028)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-16 18:33:08 +00:00
Michael Goin
000cceca8c [Bugfix gpt-oss] Fix float32 convert for flashinfer sink support (#23016)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-16 11:16:00 -07:00
Woonggi Min
68373d3126 [Frontend] Added support for HermesToolParser for models without special tokens (#16890)
Signed-off-by: minpeter <kali2005611@gmail.com>
2025-08-16 17:38:42 +00:00
Maximilien de Bayser
52ce1420e9 Fix handling of max_num_batched_tokens for pooling tasks (#23004)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-16 17:36:30 +00:00
汪志鹏
829bbd7882 [New Model]mBART model (#22883)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-08-16 12:16:58 +00:00
Cyrus Leung
4dff91c93d [Refactor] Allow optional MultiModalKwargsItem in IPC (#23022)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-16 11:30:49 +00:00
Seiji Eicher
de9cb61763 Add docs for PrefixRepetitionDataset + enable usage with vllm bench throughput (#23012)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-08-16 10:21:20 +00:00
Isotr0py
2dbccce8a6 [CI][Bugfix] Skip Ovis2 generation test because of broken remote code (#22954)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-16 09:44:19 +00:00
Chengji Yao
933f45334a [Core] Make cudagraph check cuda platform only (#23005)
Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-08-16 07:46:00 +00:00
Isotr0py
cc826a202b [Multimodal] Update Tensor schema test to cover arbitrary shape mm inputs (#22867)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-16 00:44:50 -07:00
Jee Jee Li
6d3da472bc [Misc] Add --save-dir option to benchmark_moe (#23020)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-16 07:26:10 +00:00
Andrew Sansom
78863f8c5c [BugFix] Add support for loading prompt embeds tensors serialized on unavailable devices and sparse tensors (#22962)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-08-16 06:25:10 +00:00
Lucas Wilkinson
5157827cfc [Build] Env var to disable sccache (#22968)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-08-16 05:36:27 +00:00
Kunshang Ji
7caec10e7b [XPU]avoid circular import during XPU init (#23017)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-16 05:16:34 +00:00
Grace Ho
1f83e7d849 [misc] nsys profile output kernel classifier and visualizer (#22971)
Signed-off-by: Grace Ho <grho@nvidia.com>
2025-08-16 02:52:51 +00:00
Calvin Chen
e4e37ded56 [V1] support min_tokens for detokener (#22014)
Signed-off-by: calvin chen <wen.chen@dynamia.ai>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-08-16 02:28:10 +00:00
Nick Hill
f6b5040590 [Frontend] Avoid list copies in serving_chat.py (#22947)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-16 02:06:30 +00:00
Benjamin Chislett
fbd88728b3 [Bugfix] Fix DeepSeek MTP (#22934)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-08-16 01:25:06 +00:00
Nicolò Lucchesi
070da660c1 [Kernel] Simplify get_kv_cache_layout and cache use_trtllm_attention env-dependent bit (#22735)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-08-16 00:14:08 +00:00
Nick Hill
ad0297d113 [Misc] Support passing multiple request ids at once to AsyncLLM.abort() (#22944)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-15 17:00:36 -07:00
Yichen Yan
236b864e4f [BugFix] Make run_once thread-safe (#22978)
Signed-off-by: <wenji.yyc@alibaba-inc.com>
Signed-off-by: Yichen Yan <wenji.yyc@alibaba-inc.com>
2025-08-15 16:56:17 -07:00
Yong Hoon Shin
3e2f7985a2 Support multiple attention groups for KV sharing (#22672)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-08-15 16:54:10 -07:00
Or Ozeri
c280066f9d [v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-08-15 16:52:52 -07:00
Nick Hill
b9dc9d2607 [BugFix] Handle case where async utility call is cancelled (#22996)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yinghai Lu <yinghai@thinkingmachines.ai>
2025-08-15 17:38:42 -06:00
rishitdholakia13
1fc375dc05 [Structured Outputs] [Bug] Fix misalignment in apply_grammar_bitmask causing unintended masking and NaN logits (#22963)
Signed-off-by: rishitdholakia13 <rishit+github@cohere.com>
2025-08-15 23:25:05 +00:00
Eli Uriegas
76144adf76 ci: Add CUDA + arm64 release builds (#21201)
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
2025-08-15 23:16:23 +00:00