Nick Hill
439368496d
[BugFix] Fix PP/async scheduling with pooling models ( #28899 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-18 00:20:45 -08:00
Isotr0py
896e41ae04
[CI/Build] Replace wikipedia url with local server ones ( #28908 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-18 08:10:55 +00:00
Kuntai Du
5bb1da5190
[MISC] Remove format.sh ( #28906 )
...
Signed-off-by: Kuntai Du <kuntai@uchicago.edu >
2025-11-18 05:28:31 +00:00
Nick Hill
5bdd155277
[CI] Fix async scheduling + spec decoding test flake ( #28902 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-18 05:26:32 +00:00
Ning Xie
0168f69e50
[Misc] Remove unnecessary parentheses from log statements ( #28897 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-11-17 20:33:46 -08:00
Didier Durand
083cf326dc
[Doc]: fix typos in various files ( #28863 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-17 20:32:14 -08:00
Cyrus Leung
bf9e1e8767
[Bugfix] Fix wrong CLI defaults for dynamic SchedulerConfig fields ( #28872 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-17 20:30:29 -08:00
Wentao Ye
3ddcf46011
[Refactor] Remove Unused Func in Batch Invariant ( #28881 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-17 20:29:29 -08:00
xuebwang-amd
d0a73620cc
[ROCm][Quantization] add apply_vllm_mapper in quark config for models like gpt-oss ( #28638 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 11:16:45 +08:00
Michael Goin
88ab591f0b
Run macos smoke test workflow on main commit ( #28752 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-18 11:16:03 +08:00
Benjamin Bartels
b6e04390d3
[Bugfix] Fix Kimi-K2 tool parser concatenated tool calls parsing ( #28831 )
...
Signed-off-by: Thomas Mao <yiyeguhu@gmail.com >
Signed-off-by: bbartels <benjamin@bartels.dev >
Co-authored-by: Thomas Mao <yiyeguhu@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-11-17 19:13:25 -08:00
Zhuohan Li
552cac95b5
[Misc] Fix wrong comment in scheduler ( #28880 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-11-17 15:32:22 -08:00
Bangsheng Tang
61485844fc
[BugFix] Corner case that could cause out-of-sync with external launcher mode and dp >1 ( #28774 )
2025-11-17 15:22:11 -08:00
Pranav
f77bce001a
[Model] Add Afmoe architecture implementation ( #28332 )
...
Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr >
Signed-off-by: Pranav <veldurthipranav@gmail.com >
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr >
2025-11-17 15:11:20 -08:00
Wentao Ye
a289cc1dde
[Test] Batch Invariant: Rename and organize tests ( #27421 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-17 18:09:47 -05:00
Shreyas Kulkarni
95ae50b7d1
[Quantization] [Eagle] Add complete quantization support to the draft model in Eagle ( #28435 )
...
Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com >
2025-11-17 15:01:34 -08:00
Nick Hill
7765e5ba75
[BugFix] Fix PP performance and PP kv connector output regression ( #28768 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-17 14:08:50 -08:00
Ronald
d8874c61a5
[Core] Async Scheduling X Spec Decoding Compatibility ( #24799 )
...
Signed-off-by: Ronald1995 <ronaldautomobile@163.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-11-17 12:16:20 -08:00
Zhewen Li
f8b19c0ffd
[Bugfix] Fix GPT-OSS on AMD after #28603 ( #28816 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-17 13:15:26 -05:00
tiehexue
e42bd8c2e3
Cast return value to int64_t for cache size ( #28814 )
...
Signed-off-by: tiehexue <tiehexue@hotmail.com >
2025-11-17 16:02:32 +00:00
Roger Wang
7f064491f8
[Bugfix][Perf] Revert applying HF processor on text-only inputs for multimodal models ( #28858 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-11-17 14:49:25 +00:00
Lucas Wilkinson
64e39d667c
[BugFix] Temporary fix for IMA with MTP = 2 and full-cg ( #28315 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-17 09:41:22 -05:00
Kunshang Ji
1b82fb0ad3
[XPU] work around for sp, avoid custom op import error ( #28822 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-17 13:16:44 +00:00
Jae-Won Chung
d4acf518d0
[Metrics] Fix KV cache usage percent metric multiproc ( #28792 )
...
The `vllm:kv_cache_usage_perc` Gauge metric is missing `multiprocess_mode="mostrecent"`, so it ends up returning one sample per process instead of a single value:
```
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="277"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="275"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="273"} 0.6530455880475035
...
```
By contrast, the deprecated `vllm:gpu_cache_usage_perc` Gauge metric already sets `multiprocess_mode="mostrecent"`.
Signed-off-by: Jae-Won Chung <jwnchung@umich.edu >
2025-11-17 09:54:15 +00:00
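To illustrate the `vllm:kv_cache_usage_perc` note above, here is a minimal pure-Python sketch of the difference between keeping one sample per process and collapsing to the most recently written one. It does not use the Prometheus client library. The `Sample` type, the two merge helpers, and the timestamps are all hypothetical and only stand in for whatever the metrics backend records; the three values are the ones quoted in the commit message.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    pid: int          # process that wrote the sample
    value: float      # gauge value reported by that process
    timestamp: float  # hypothetical write time, not taken from the commit


def merge_all(samples: list[Sample]) -> dict[int, float]:
    """Keep one series per process, as in the output quoted in the commit."""
    return {s.pid: s.value for s in samples}


def merge_mostrecent(samples: list[Sample]) -> float:
    """Collapse all processes to the single value that was written last."""
    return max(samples, key=lambda s: s.timestamp).value


# Values from the commit message; timestamps are assumed for illustration.
samples = [
    Sample(pid=277, value=0.0, timestamp=1.0),
    Sample(pid=275, value=0.0, timestamp=2.0),
    Sample(pid=273, value=0.6530455880475035, timestamp=3.0),
]

per_process = merge_all(samples)    # three entries, two of them stale zeros
latest = merge_mostrecent(samples)  # one value, from the latest writer
```

Under the assumption that pid 273 wrote last, the collapsed gauge carries only the live usage figure, which is presumably what a dashboard reading this metric expects.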
wuyaoxuehun
ab01cd14e5
[BugFix] Fix glm4_moe_mtp load weights bug ( #28805 )
...
Signed-off-by: wuyaoxuehun <798143193@qq.com >
2025-11-17 17:13:11 +08:00
Li, Jiang
577bb34fff
[CPU][Bugfix] Fix _to_list in CPU model runner ( #28824 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-11-17 07:47:24 +00:00
Jee Jee Li
3380ed5e11
[Doc] Add llama4 LoRA tag ( #28825 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-17 14:08:48 +08:00
Jay Caldwell
6f37419244
[Bugfix][Model] Prevent special token leakage in KimiK2ToolParser streaming mode ( #28543 )
...
Signed-off-by: Jscaldwell55 <jay.s.caldwell@gmail.com >
2025-11-17 13:54:46 +08:00
Xiake Sun
60e089f0b9
[ROCm][Qwen3-32B] Fix AITER MHA accuracy issue caused by #25763 ( #28670 )
...
Signed-off-by: Xiake Sun <xiake.sun@amd.com >
2025-11-16 20:52:11 -08:00
liuzhenwei
d64429bb36
[NIXL][XPU] update install script of NIXL ( #28778 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2025-11-17 03:01:33 +00:00
jiahanc
561253b37f
[Performance][Fix] update nvfp4 code to support renorm routing ( #28569 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-16 18:02:42 -08:00
Nick Hill
80b6080ddc
[BugFix] Fix async scheduling + chunked prefill + preemption ( #28787 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-17 06:46:46 +08:00
amirkl94
03ee48111d
Feature: Support Relu2 in FusedMoE fp8 cutlass path ( #27261 )
2025-11-16 13:39:44 -05:00
Lukas Geiger
5a87076d6e
[Model][QwenVL] Optimize Qwen2_5_VisionAttention q,k preparation ( #28769 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-16 17:37:15 +00:00
Ning Xie
ac1daf3233
fix comment typo ( #28802 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-11-16 17:03:21 +00:00
Didier Durand
63fed55506
[Doc]: fix typos in various files ( #28811 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-16 14:30:06 +00:00
Anna Shors
8d259fad6c
Fix gpt oss weight loading with EP + bf16 ( #28765 )
...
Signed-off-by: ashors1 <ashors@nvidia.com >
2025-11-16 13:12:45 +00:00
scottzh8
3bc1175798
[Bugfix] Fix host and port join for ipv6 in bench serve ( #28679 )
...
Signed-off-by: Scott Zhang <scottzh@fb.com >
Co-authored-by: Scott Zhang <scottzh@fb.com >
2025-11-16 10:20:57 +00:00
Dezhan
af02c40970
Fixed gpt-oss _load_weights_other() parameter position bug ( #28715 )
...
Co-authored-by: Dezhan Tu <dztu@meta.com >
2025-11-16 09:46:29 +00:00
Lucia Fang
b316ac6589
[V1] Support MP Executor for multi node distributed inference ( #23691 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@fb.com >
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-11-16 09:01:21 +00:00
wang.yuqi
a55b64635c
[Model] Allow users to control skip reading cache per request. ( #28194 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
2025-11-16 00:04:50 -08:00
ai-jz
d231876ce3
[Benchmark] Fix client seed synchronization in multi-turn benchmark ( #28512 )
...
Signed-off-by: ai-jz <aijz.xplr@gmail.com >
2025-11-16 15:04:32 +08:00
Bram Wasti
f849ee739c
Adding a benchmark for batch invariance ( #28161 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-16 13:22:17 +08:00
Lucas Wilkinson
be263f7645
[BugFix] Fix AssertionError: DCP not support reorder_batch_threshold > 1 now. ( #28751 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-15 22:35:06 +00:00
Didier Durand
2bb4435cb7
[Doc]: fix typos in various files ( #28567 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-15 19:27:50 +00:00
Lukas Geiger
07cadab27a
[Model][Qwen3VL] Cache positional embedding indices ( #28475 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-15 19:03:09 +00:00
Nick Hill
637f292196
[CI] Fix broken pipeline ( #28781 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-15 08:44:14 -08:00
Eldar Kurtić
e439c784fa
Add support for Eagle with separate lm-head and embed_tokens layers ( #28549 )
...
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com >
2025-11-15 06:12:02 -08:00
hwhaokun
085a525332
[Model] Fix lmhead init bug of bailing_moe ( #28777 )
...
Signed-off-by: hwhaokun <haokun0405@163.com >
Co-authored-by: zhaozx-cn <zhaozx2116@163.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-15 05:44:12 -08:00
Cyrus Leung
89d3679221
[Doc] Fix failing doc build ( #28772 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-15 05:33:27 -08:00
tingtinggithub
cb15ee28db
Allow Gemma3 to take image embeddings ( #28483 )
...
Signed-off-by: tingtinggithub <streamttt@gmail.com >
2025-11-15 04:18:08 -08:00
Angela Yi
f36292dbee
[compile] Enable sequence parallelism matching w/o custom ops enabled ( #27126 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: ProExpertProg <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <luka.govedic@gmail.com >
2025-11-15 11:46:12 +00:00
Vadim Gimpelson
173b356abf
[PERF] Remove TRTLLM Gen attn kernel limitation max_seq_len <=131072 ( #28755 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-15 15:43:41 +05:30
Cyrus Leung
638e4196d1
[Misc] Make SchedulerConfig.max_model_len init-only ( #28733 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-15 01:59:31 -08:00
Zhewen Li
1ec978c209
[Kernel][Moe Configs] llama4 maverick fp8 moe config tp8 on mi325 ( #28709 )
...
Signed-off-by: Zhewen Li <zhewenli@meta.com >
2025-11-15 01:10:48 -08:00
Jane (Yuan) Xu
74b5267d3a
Use narrow over indexing in hadacore_transform to prep for ABI stable ( #28756 )
...
Signed-off-by: Jane Xu <janeyx@meta.com >
2025-11-15 01:10:15 -08:00
Zhuohan Li
dd6ac1c2bb
[RL] [V1] Remove unused device argument from reset_kv_cache ( #28766 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-11-14 23:59:42 -08:00
Cyrus Leung
98b4d389ed
[Redo] #26368 ( #28771 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-14 22:47:41 -08:00
Varun Sundar Rabindranath
6965ef436f
[Performance][DeepGEMM] Estimate expected_m ( #28694 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-15 13:52:14 +08:00
Chendi.Xue
c9e665852a
[NIXL] heterogeneous block_size support ( #26759 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
Signed-off-by: Chendi.Xue <chendi.xue@intel.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2025-11-14 21:51:32 -08:00
Mohammad Othman
363aaeef0f
Fix IntermediateTensors initialization and add type hints ( #28743 )
...
Signed-off-by: Mohammad Othman <Mo@MohammadOthman.com >
Co-authored-by: Mohammad Othman <Mo@MohammadOthman.com >
2025-11-15 04:31:36 +00:00
Nick Hill
ac86bff8cb
Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… ( #28773 )
2025-11-14 20:24:00 -08:00
Michael Goin
edfe498189
[Bugfix] Build hadacore kernels on >SM90 ( #28748 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-14 19:51:05 -08:00
Lukas Geiger
f05d474c8a
[Model][Qwen3VL] Use mm_position to compute mrope positions ( #28730 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-14 19:45:11 -08:00
QiliangCui
9fc81ec765
[TPU] Fix import error in tpu launch ( #28758 )
...
Signed-off-by: Qiliang Cui <derrhein@gmail.com >
2025-11-15 00:58:32 +00:00
Jialin Ouyang
186352b270
[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization ( #26368 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-14 16:04:04 -08:00
Nick Hill
58e61e56b7
[Test] Rework e2e async scheduling tests ( #28744 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-14 16:01:09 -08:00
Gregory Shtrasberg
75f01b9d3c
[ROCm][CI/Build] Upgrade to ROCm 7.1 and AITER main ( #28753 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-11-14 15:53:21 -08:00
rasmith
ba041d980b
[Log] Save profiler results to file instead of stdout ( #28144 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-14 23:26:39 +00:00
Thomas Parnell
e0c910bb89
[Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 ( #28295 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-11-14 22:55:42 +00:00
Benjamin Chislett
bf3ffb61e6
[Bugfix] Fix ChunkedLocalAttention CUDA Graph setting ( #28739 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-14 14:14:46 -08:00
Alexander Matveev
e5c78956c0
[Bugfix] Fix incorrect use of hidden_states for shared_experts due to do_naive_dispatch_combine ( #28740 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-11-14 14:13:46 -08:00
Laith Sakka
2e0ad629b0
Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch ( #25110 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com >
2025-11-14 14:11:10 -08:00
Gregory Shtrasberg
5a84b76b86
[ROCm][CI/Build] Change install location of uv ( #28741 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-11-14 21:34:18 +00:00
Marcin Ostrowski
0de4f217ab
[Bugfix] TypeError: 'NoneType' object is not callable ( #27410 )
...
Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com >
2025-11-14 21:13:53 +00:00
Michael Goin
f08eab2acc
[CI] Fix macos smoke test uv cache issue ( #28736 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-14 13:29:55 -07:00
Sage Moore
8977ffb5e6
[ROCm][Bugfix] Fix compilation errors with fused_qknorm_rope_kernel.cu ( #28682 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-11-14 11:06:01 -08:00
Andrey Khalyavin
fd4555089a
[BugFix] Fix misprint introduced by modular_kernel refactoring. ( #28728 )
...
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru >
2025-11-14 10:58:18 -08:00
GuanH
cec275efce
[Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure ( #28663 )
...
Signed-off-by: GuanH <guansdrailib@gmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-14 18:44:27 +00:00
Cyrus Leung
e2741f6cbc
[Chore] Rename SchedulerConfig.chunked_prefill_enabled ( #28735 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-14 18:39:57 +00:00
Harry Mellor
67187554dd
[Docs] Enable some more markdown lint rules for the docs ( #28731 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-14 18:39:19 +00:00
TJian
a425dc256e
[Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with torch compile dynamo ( #28716 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-11-14 10:30:50 -08:00
Fardin Hoque
964d65deed
LLaMA4 LoRA Adapter Enablement ( #28602 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com >
Co-authored-by: Wei Wei <wwei6@meta.com >
2025-11-14 13:27:56 -05:00
Chen Wang
9261eb3dc1
docs(lora_resolvers): clarify multi-resolver order and storage path requirement ( #28153 )
...
Signed-off-by: Chen Wang <Chen.Wang1@ibm.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-14 18:08:30 +00:00
czhu-cohere
cdd7025961
[kernel] Improve FP8 PTPC on Hopper for larger shapes ( #28692 )
...
Signed-off-by: czhu-cohere <conway.zhu@cohere.com >
2025-11-14 09:59:11 -08:00
Julien Denize
085424808e
Remove audio optional dependency for mistral-common ( #28722 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-14 09:54:38 -08:00
Mohammad Othman
a17e36f223
Fix typo in comment: existance -> existence ( #28737 )
...
Signed-off-by: Mohammad Othman <emranm226@hotmail.com >
2025-11-14 09:35:45 -08:00
Matthew Bonanni
8cc40f8992
[Attention] Bump FA for removed method ( #28429 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-14 09:13:37 -08:00
Nicolò Lucchesi
6f1e7f7226
[DisaggEverything] Tokens in<>out /generate endpoint ( #24261 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-14 09:58:01 -07:00
Michael Goin
d54a18a47e
[CI][CPU] Smoke test for Apple Silicon using GHA MacOS runner ( #28688 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-14 09:37:18 -07:00
Harry Mellor
5f3cd7f7f2
[Docs] Update the name of Transformers backend -> Transformers modeling backend ( #28725 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-14 16:34:14 +00:00
dongbo910220
c934caee88
[Fix] improve aspect ratio in dummy image generation and add common VLM tests for PaddleOCR-VL ( #28711 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-11-14 16:07:20 +00:00
Duncan Moss
3f8a874065
[Kernels] Enable FlashInfer FP8 Blockscale on SM90 (for TEP DSR1) ( #27134 )
...
Signed-off-by: Duncan Moss <djm.moss@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-14 08:02:44 -08:00
Cyrus Leung
511a6b611d
[Config] Clean up SchedulerConfig initialization ( #28665 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-14 22:41:02 +08:00
Nicolò Lucchesi
96b23b8e3b
[Bugfix][Nixl] Fix kernel physical<>logical block_size issue ( #28677 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-11-14 22:40:05 +08:00
zhaozx-cn
433c0f8675
[Model] Fix bailing_moe accuracy problem ( #28277 )
...
Signed-off-by: zhaozx-cn <zhaozx2116@163.com >
2025-11-14 13:33:02 +00:00
Fasal Shah
8d3748d3c7
[Doc] Fix macOS installation dependency resolution issue ( #26721 )
...
Signed-off-by: faisal shah <fashah@redhat.com >
2025-11-14 12:43:56 +00:00
Lucas Wilkinson
db56a59970
[BugFix] Fix FA3 IMA with FULL_AND_PIECEWISE and cascade attention (default) ( #28702 )
2025-11-14 12:19:22 +00:00
Yong Hoon Shin
9324e10275
Fix KV sharing fast prefill with cudagraph enabled ( #28537 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-14 11:53:42 +00:00
Jingchun Gao
4516d44b7f
[DCP] Support Decode Context Parallel (DCP) for GQA with Flashinfer ( #25438 )
...
Signed-off-by: gaojc <1055866782@qq.com >
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com >
Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com >
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com >
Co-authored-by: gaojingchun (A) <g00955623@china.huawei.com >
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com >
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com >
2025-11-14 11:24:10 +00:00
Shanshan Shen
41b92f7d38
[Model][MM] Extract conv layer as CustomOp ( #28455 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-14 19:16:13 +08:00
Srreyansh Sethi
360bd8762f
[Frontend] Added chat-style multimodal support to /classify. ( #27516 )
...
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com >
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com >
Signed-off-by: vnadathur <glvikramn@gmail.com >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: vnadathur <glvikramn@gmail.com >
Co-authored-by: wang.yuqi <noooop@126.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-11-14 11:03:55 +00:00
lyn610
ecf8230d4d
[Metrics] Log number of preempted requests ( #28522 )
...
Add tracking and periodic logging for the number of preempted requests in the
metrics logger. This helps monitor system behavior under load.
Signed-off-by: Yining Liu <610lyn@gmail.com >
2025-11-14 09:47:45 +00:00
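The tracking and periodic logging described in the preemption commit above could look roughly like the sketch below. This is not vLLM's metrics logger: the class name `PreemptionLog`, its methods, the log message, and the choice to reset the counter after each log line are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("preemption_sketch")


class PreemptionLog:
    """Hypothetical accumulator that logs preemption counts on an interval."""

    def __init__(self, interval_s: float = 10.0) -> None:
        self.interval_s = interval_s
        self.num_preempted = 0    # preemptions seen since the last log line
        self.last_log_time = 0.0  # time of the last log line, in seconds

    def record(self, num_preempted: int) -> None:
        """Add the preemptions reported by one scheduler step."""
        self.num_preempted += num_preempted

    def maybe_log(self, now: float) -> bool:
        """Log and reset if the interval has elapsed; report whether it logged."""
        if now - self.last_log_time < self.interval_s:
            return False
        logger.info("Preempted requests in last interval: %d", self.num_preempted)
        self.num_preempted = 0
        self.last_log_time = now
        return True


stats = PreemptionLog(interval_s=10.0)
stats.record(2)
stats.record(1)
logged_early = stats.maybe_log(now=5.0)    # interval not yet elapsed
pending = stats.num_preempted              # count is kept until the next log
logged_on_time = stats.maybe_log(now=10.0) # interval elapsed, counter resets
```

Passing `now` in explicitly rather than reading a clock inside the method keeps the sketch deterministic; a real logger would more likely read a monotonic clock itself.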
Xing Liu
8cfbe89b93
[Misc] fix comment in test_envs ( #28529 )
...
Signed-off-by: Xing Liu <xingliu14@gmail.com >
2025-11-14 09:32:46 +00:00
Boyuan Feng
fd75d3e8c0
[Minor] avoid register new custom and just import silly_attn ( #28578 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-11-14 09:32:31 +00:00
Michael Goin
c9a3a02149
Add output token counting to gsm8k eval ( #28594 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-14 09:32:03 +00:00
Nick Hill
bc3e43069a
[BugFix] Fix multi-modal async scheduling race condition ( #28706 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-14 01:11:13 -08:00
Jiangyun Zhu
c36bcfe6b3
[Bugfix] fix dots.ocr pp support ( #28705 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-14 09:01:26 +00:00
Yan Ma
529cea343d
use default CCL_ZE_IPC_EXCHANGE ( #28700 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2025-11-14 16:55:29 +08:00
rasmith
93103575ce
[BugFix][CI/Build][ROCM] Fix import error and apply assert in appropriate case in test_struct_output_generate ( #28311 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-13 22:41:29 -08:00
rasmith
15ae8e0784
[Bugfix][CI/Test][Spec Decode] Fix illegal memory access in offline_inference/spec_decode.py (Issue 27619) ( #28432 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-11-13 22:34:01 -08:00
haoyangli-amd
0b25498990
[Misc] add ignore mapper for quark quantization ( #28275 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com >
2025-11-14 05:56:35 +00:00
Roger Wang
0aecd9138f
[Misc] Update xformers to 0.33.0.post1 ( #28678 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-11-13 21:52:53 -08:00
Kunshang Ji
da14ae0fad
[XPU][CI] disable lm cache uts ( #28696 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-14 03:15:50 +00:00
Cyrus Leung
01bea115c4
[Misc] Remove warn_for_unimplemented_methods ( #28613 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-14 11:10:10 +08:00
Bradley D
b39a5026eb
[ci][amd] fix basic models extra init test ( #28676 )
...
Signed-off-by: Bradley Davis <bradleyhd@meta.com >
2025-11-14 02:44:36 +00:00
Michael Goin
622e6106a9
[CPU][Bugfix] Fix Apple Silicon M1 compilation failure ( #28681 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-14 09:49:55 +08:00
Sage Moore
2aa75c752b
[ROCm] Bump up the version of amd-smi to 6.4.3 ( #28680 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-11-14 01:24:28 +00:00
Hank_
4d5943bda6
[quantization][config] enable override existing quant_config ( #28510 )
...
Signed-off-by: Hank <hcc.mayday@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-14 01:24:10 +00:00
Alexei-V-Ivanov-AMD
f2b8e1c551
Mirrored test group definitions for AMD (2025-11-11) ( #28573 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-11-14 00:16:34 +00:00
Mark McLoughlin
6e25b1cddf
[KV Connector] Test async mode in scheduler tests ( #28550 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-13 18:30:59 -05:00
Wentao Ye
e64011f29a
[CI] Bug: Fix ci entrypoint pooling ( #28684 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-13 14:19:35 -08:00
Simon Mo
1b622deba7
[Misc] Update CODEOWNERS for simon-mo and comaniac ( #28675 )
...
Signed-off-by: Simon Mo <simon.mo@hey.com >
2025-11-13 21:01:43 +00:00
Kebe
faed7bf07e
[Bugfix] [CPU] bump torch to 2.9.0 for Darwin to fix segmentation fault ( #27791 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-13 12:48:08 -08:00
Yanan Cao
262d263f6c
[Bugfix] Eliminate tuple inputs to submodules in graph partitioning ( #28533 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2025-11-13 15:09:05 -05:00
Qiu
968060c15a
[bugfix] correct local_chunk_len for DCP in reorg_kvcache with long context ( #28526 )
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-13 11:29:22 -08:00
elvischenv
5d6ce2b960
[Perf] Support stream interval for reducing host overhead ( #27869 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-11-13 13:21:25 -05:00
Matthew Bonanni
f9f3b596f3
[Attention][Bugfix] Fix FA sink support ( #28660 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-13 13:20:01 -05:00
Yannick Schnider
119c4927b3
[Bugfix] Fix validate model input for decoder models ( #27099 )
...
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-13 10:18:47 -08:00
Varun Sundar Rabindranath
fe1cd7704d
[Performance][B200] silu_mul_quant: pack scales in int32 ( #28358 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-13 10:16:55 -08:00
Johnny Yang
fdfd5075aa
[TPU] patch TPU wheel build script to resolve metadata issue ( #27279 )
...
Signed-off-by: Johnny Yang <johnnyyang@google.com >
2025-11-13 09:36:54 -08:00
Nick Hill
327c0a9a23
[BugFix] Ensure EngineArgs.create_engine_config is idempotent ( #28515 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-13 17:14:08 +00:00
Jane (Yuan) Xu
06c4873d95
Rewrite C++ meta funcs to Python ( #28595 )
...
Signed-off-by: Jane Xu <janeyx@meta.com >
2025-11-14 00:52:50 +08:00
Roger Wang
d3387750f1
[Misc] Turn off encoder torch compile by default ( #28634 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-11-13 08:38:08 -08:00
Harry Mellor
b230286fbc
Fix get_num_experts when config sets it explicitly to None ( #28652 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: bruceszchen <bruceszchen@tencent.com >
2025-11-13 16:02:42 +00:00
Yuanping Song
3035d1a166
[BugFix] DeepSeek-OCR: apply NoRepeatNGramLogitsProcessor to greedy path ( #28617 )
...
Signed-off-by: Yuanping Song <yuanping.song@outlook.com >
2025-11-13 15:24:35 +00:00
Huamin Li
07a606aa7e
[CI Failure] Fix backend selection for encoder-only models ( #28534 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-11-13 10:11:27 -05:00
amdfaa
a7791eac9d
[CI/Build] Install uv for AMD MI300: Language Models Tests (Hybrid) %N ( #28142 )
...
Signed-off-by: amdfaa <107946068+amdfaa@users.noreply.github.com >
Signed-off-by: zhewenli <zhewenli@meta.com >
Co-authored-by: zhewenli <zhewenli@meta.com >
2025-11-13 14:34:55 +00:00
Pleaplusone
8da2f28f53
[ROCm][BugFix] Fix get_cu_count in rocm_aiter_fa.py ( #28618 )
...
Signed-off-by: ganyi <ygan@amd.com >
2025-11-13 14:18:20 +00:00
Akash kaothalkar
86d15bfd8d
[Hardware][PowerPC] Fix fp16 compilation error for Power in cpu attention backend and bump oneDNN version ( #28535 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-11-13 13:32:21 +00:00
Fanli Lin
c9fe6abe7c
[Bugfix] Fix FPS value type for Qwen2.5-Omni video processing ( #28630 )
...
Signed-off-by: Lin, Fanli <fanli.lin@intel.com >
2025-11-13 13:06:06 +00:00
zofia
c47b6c85ac
[XPU] add sym params to IPEXConfig ( #28611 )
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
2025-11-13 11:35:04 +00:00
baonudesifeizhai
c428e8d80b
Fix io processor pooling #28273 ( #28484 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
2025-11-13 11:34:14 +00:00
Zijing Liu
5e973209aa
[BugFix] Fix type error when assigning a triton kernel tensor to a torch.nn.Parameter ( #28603 )
...
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
2025-11-13 11:30:04 +00:00
Di Wu
e63fd44560
Fix: Correctly filter special tokens in benchmark_prefix_caching ( #28615 )
...
Signed-off-by: Di Wu <dw2761@nyu.edu >
2025-11-13 10:57:44 +00:00
Yong Hoon Shin
11ac9ddd03
Support all interleaved layer types ( #28485 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-11-13 08:57:20 +00:00
Chauncey
5c9ad138d5
[Frontend] supports interleaved thinking ( #28531 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-13 16:14:13 +08:00
Jiangyun Zhu
fa183e9271
[Bugfix] fix kimi-linear crash ( #28445 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-13 07:59:58 +00:00
usberkeley
4ab34f6ef1
Add NUMA node validation for CPU thread binding ( #28555 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-11-13 07:03:52 +00:00
Huy Do
c33b87e777
Use official xformers-0.0.33 built for PT 2.9 ( #28600 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-12 22:48:53 -08:00
tjandy98
4504e8029b
[Bugfix] Prevent crash on empty grammar string ( #28210 )
...
Signed-off-by: tjandy98 <3953059+tjandy98@users.noreply.github.com >
2025-11-13 06:42:29 +00:00
Pleaplusone
ca00b1bfc6
[ROCm][BugFix] Remove the usage of device_info from aiter ( #28383 )
...
Signed-off-by: ganyi <ygan@amd.com >
2025-11-12 21:43:42 -08:00
Radu Salavat
d44fbbab0e
[build][cmake]: Bundle static ACL and torch libgomp for CPU extension builds ( #28059 )
...
Signed-off-by: Radu Salavat <radu.salavat@arm.com >
2025-11-13 05:43:08 +00:00
Lucia Fang
7e082bc14e
Support DeepEP for Kimi-k2-thinking through enabling gemm selection for compressed-tensor marlin wna16 ( #28574 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
2025-11-12 21:40:45 -08:00
Fanli Lin
dbbe0c756a
[XPU] Support Triton path for LoRA operations on XPU ( #28511 )
...
Signed-off-by: Fanli Lin <fanli.lin@intel.com >
2025-11-13 05:31:42 +00:00
Pleaplusone
7dca0c90cb
[BugFix][ROCm] Fix get_cu_count missing variable error ( #28608 )
...
Signed-off-by: ganyi <ygan@amd.com >
2025-11-13 05:18:56 +00:00
Andrew Xia
1a0b157a2e
[Frontend][responsesAPI][1/n] convert responses API tool input to chat completions tool format ( #28231 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-11-13 04:47:22 +00:00
Andrew Xia
7c38ed0f1c
[Frontend] split append tool output ( #28333 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-11-13 04:03:23 +00:00
Jialin Ouyang
a1d3866dda
[n-gen] DO NOT repeatedly return finished child requests ( #28591 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-13 03:36:07 +00:00
Harry Mellor
97d1c99302
Rename clashing method names for vLLM model protocol ( #27583 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-12 19:14:33 -08:00
Harry Mellor
3226283461
[Docs] Add some details about what the MoE block needs for the Transformers backend ( #28588 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-13 03:12:14 +00:00
Nick Hill
8832fff972
[BugFix] Fix mm_encoder_attn_backend arg type checking ( #28599 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-13 03:06:03 +00:00
Michael Goin
a543e678b4
[Bugfix] Fix SM100 gpt-oss regression due to faulty attn sink support ( #28561 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-12 19:40:59 -07:00
wangxiyuan
2dacd57394
[platform] Move get_cu_count to utils ( #27005 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-13 08:48:47 +08:00
Gregory Shtrasberg
d75ad04818
[ROCm][Bugfix] Revert removing setuptools version restriction ( #28592 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-11-12 16:46:58 -08:00
Michael Goin
52eadcec9e
[Docs] Update meetups.md description ( #28583 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-13 00:00:23 +00:00
Harry Mellor
51c599f0ec
Skip models that cannot currently init on Transformers v5 ( #28471 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-12 23:43:57 +00:00
Alexander Matveev
69d0e90313
[MoE][Kernel][Perf] Improve Shared Expert Stream Overlap ( #28406 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-11-12 23:37:24 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
4ca5cd5740
[Core][AMD] Migrate fully transparent sleep mode to ROCm platform ( #12695 )
...
Signed-off-by: Hollow Man <hollowman@opensuse.org >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com >
2025-11-12 15:24:12 -08:00
Michael Goin
10f01d5a3a
[Bugfix] Adjust Marlin CUDA arch selection to 8.0+PTX;9.0+PTX ( #28294 )
2025-11-12 15:14:13 -08:00
QiliangCui
3eb0c2673e
[TPU] Support GCS path in VLLM_TORCH_PROFILER_DIR ( #28487 )
...
Signed-off-by: Qiliang Cui <derrhein@gmail.com >
2025-11-12 22:31:14 +00:00
vllmellm
d8140b9833
[ROCM] Fix ROCm warnings, environment flag access, and GEMM kernel naming for consistency in _aiter_ops.py ( #28464 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-11-12 21:46:57 +00:00
Varun Sundar Rabindranath
74a9a9faad
[Performance][B200] Fix deepgemm prologue ( #27897 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-12 13:13:03 -08:00
Wei Wei
478ee511de
[Misc] Fix typo in llm_engine.py ( #28584 )
...
Signed-off-by: Wei Wei <wwei6@meta.com >
2025-11-12 12:59:43 -08:00
Andy Lo
58ce8d12b7
[BugFix] Priority scheduling and spec tokens preemption ( #28558 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2025-11-12 20:29:21 +00:00
Yihua Cheng
94a9ebcf31
[KV connector][WIP] KV cache proxy based on LMCache multi-process mode ( #27902 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu >
2025-11-12 20:25:43 +00:00
Harry Mellor
a39dd7bb06
[CI] Skip "Multi-Modal Models Test (Extended) 3" test that's broken in current Transformers ( #28559 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-12 19:38:13 +00:00
Thomas Parnell
64d57c3be7
[Model] [Config] Correctly identify granite-4.0-micro as non-hybrid model ( #28563 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-11-12 18:17:55 +00:00
PerryZhang01
a1e7fa362a
[EPLB][ROCm]: support EPLB for ROCm backend ( #27731 )
...
Signed-off-by: Perry Zhang <perzhang@amd.com >
Co-authored-by: Perry Zhang <perzhang@amd.com >
2025-11-12 18:16:35 +00:00
alberto
bac904565f
Implement ARC KV cache eviction policy for CPU offloader ( #27039 )
...
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com >
Signed-off-by: alberto <aperdomo@redhat.com >
Co-authored-by: Or Ozeri <or@ozery.com >
2025-11-12 09:51:39 -08:00
Benjamin Chislett
304419576a
[Perf] Refactor cudagraph_support to enable full CUDA graphs for spec decoding with FlashInfer ( #28479 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-13 01:56:40 +09:00
Harry Mellor
a742134cc5
Remove deprecated fields from CompilationConfig ( #27593 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-12 16:10:28 +00:00
Nicolò Lucchesi
728a9eb70e
[Misc] Refactor Attention kv transfer methods into decorator ( #27816 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2025-11-12 16:05:44 +00:00
Canlin Guo
bc5bd45c7d
[Refactor] Remove redundant TP gather/split in split_qkv in QwenVL ( #28271 )
...
Signed-off-by: gcanlin <canlinguosdu@gmail.com >
2025-11-12 15:56:47 +00:00
Alexander Matveev
f76e85c299
[Performance][Hopper] Avoid M dim padding to 4x for most cases (due to cuda graphs paddings) ( #28492 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-11-12 10:51:43 -05:00
Harry Mellor
54aecd9ed5
Fix pre-commit (and XPU) on main ( #28556 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-12 06:13:41 -08:00
wangxiyuan
10138c92a5
[V0 deprecation] Deprecate use_v1 parameter ( #28112 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-12 14:03:52 +00:00
Jee Jee Li
a9d18b5107
[Bugfix] Fix gpt_oss packed_modules_mapping ( #28536 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-12 21:02:06 +08:00
TJian
edb59a9470
[ROCm] [Bugfix] Fix fused_qknorm_rope_kernel rocm compatibility ( #28500 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-11-12 05:01:14 -08:00
ZhengHongming888
c5f10cc139
add cpu option for p/d in nixl_connector ( #28356 )
...
Signed-off-by: Hongming Zheng <hongming.zheng@intel.com >
2025-11-12 11:53:08 +00:00
ziruiliu
d143152308
[KVConnector] Enable get_block_ids_with_load_errors() in LMCache connector ( #27978 )
...
Signed-off-by: Zirui Liu <ziliu@ddn.com >
Signed-off-by: ziruiliu <ziliu@ddn.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-11-12 11:44:58 +01:00
Chaojun Zhang
a4730c1b4f
[XPU] Fix crash due to removed VLLM_USE_V1 attribute ( #28520 )
...
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com >
2025-11-12 10:20:55 +00:00
wuyaoxuehun
d3ade61e42
[Model] fix glm4_moe_mtp load weights with GLM-4.6 checkpoint. ( #27597 )
...
Signed-off-by: wuao.scotty <wuao.scotty@bytedance.com >
Co-authored-by: wuao.scotty <wuao.scotty@bytedance.com >
2025-11-12 10:14:00 +00:00
yyzxw
1761dea1a8
[BugFix]: --enable-lora with model granite-4.0-micro crash ( #27733 )
...
Signed-off-by: zxw <1020938856@qq.com >
2025-11-12 09:03:56 +00:00
Huamin Li
c748355e0d
[CI] Introduce autorun_on_main feature ( #27836 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-11-12 08:51:19 +00:00
Chenguang Zheng
91864b79b3
[CI/Build] Fix crash due to removed VLLM_USE_V1 attribute in EPD ( #28521 )
...
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-11 23:09:33 -08:00
Lukas Geiger
ac0bb2c307
[Core] Cache vllm_is_batch_invariant ( #28304 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-12 05:03:01 +00:00
ai-jz
f31419ed8b
[Benchmark] Add retry support to fix workload bias in multi-turn benchmark ( #28493 )
2025-11-12 05:00:45 +00:00
Fanli Lin
b9ce9a3013
[BugFix] Add fallback path in apply_rotary_pos_emb_flashattn for non-cuda platforms ( #28447 )
...
Signed-off-by: Lin, Fanli <fanli.lin@intel.com >
2025-11-12 03:13:21 +00:00
Chenguang Zheng
4ccffe561f
[Core] Encoder separation for Encode-Prefill-Decode Disaggregation ( #25233 )
...
Signed-off-by: n00909098 <nguyen.kha.long@huawei.com >
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Signed-off-by: herotai214 <herotai214@gmail.com >
Signed-off-by: Khuong Le <khuong.le.manh@huawei.com >
Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com >
Co-authored-by: n00909098 <nguyen.kha.long@huawei.com >
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Co-authored-by: herotai214 <herotai214@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Khuong Le <khuong.le.manh@huawei.com >
Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com >
2025-11-11 18:58:33 -08:00
Lukas Geiger
cbb799e314
[Model][Qwen3VL] Simplify get_mrope_input_positions using numpy ( #28302 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-12 02:55:10 +00:00
Andreas Karatzas
9f0247cfa4
VLLM_USE_TRITON_FLASH_ATTN V0 variable deprecation ( #27611 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com >
2025-11-11 18:34:36 -08:00
Li, Jiang
7f829be7d3
[CPU] Refactor CPU attention backend ( #27954 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-11-12 09:43:06 +08:00
wangxiyuan
e1710393c4
[V0 deprecation] Remove VLLM_USE_V1 env ( #28204 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-11 18:22:16 -07:00
Isotr0py
3f770f4427
[Performance] Cache loaded custom logitsprocs to avoid overheads ( #28462 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-11 16:49:29 -08:00
Yanan Cao
48c879369f
[Frontend] Change CompilationMode to a proper Enum ( #28165 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2025-11-11 19:46:18 -05:00
Ilya Markov
1788aa1efb
[BugFix] Graceful handling of torch symm mem errors. ( #27671 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-11 17:41:54 -07:00
Adrian Abeyta
d23539549a
Use FLASHINFER MLA backend when testing fp8_kv_scale_compile ( #28491 )
...
Signed-off-by: adabeyta <aabeyta@redhat.com >
2025-11-12 00:34:58 +00:00
Max Hu
412e153df5
[Feature] Allow configuring FlashInfer workspace size ( #28269 )
...
Signed-off-by: Max Hu <hyoung2991@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-11 23:32:20 +00:00
Michael Goin
e5f599d4d1
[Bugfix] Disable shared expert overlap if Marlin MoE is used ( #28410 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-11 23:16:12 +00:00
Michael Goin
28534b92b9
Add Zurich vLLM Meetup ( #28488 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-11 14:53:59 -08:00
wangxiyuan
d4902ba56d
[Misc] Cleanup Executor interface ( #28441 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-11 22:28:07 +00:00
Kyuyeun Kim
df4d3a44a8
[TPU] Rename path to tpu platform ( #28452 )
...
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com >
2025-11-11 19:16:47 +00:00
Jee Jee Li
9d1c474704
[LoRA][1/N]Remove LoRA extra vocab ( #28382 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-11 11:06:21 -08:00
Jie Luo
8c32c6e4b4
[Misc] fix typo in DCP comment ( #28389 )
...
Signed-off-by: Livinfly <luojie3m@gmail.com >
2025-11-11 10:59:16 -08:00
Canlin Guo
de120bc94f
[V0 deprecation] Clean up num_prefill_tokens logic for V0 ( #28203 )
...
Signed-off-by: gcanlin <canlinguosdu@gmail.com >
2025-11-11 10:57:12 -08:00
Jialin Ouyang
4228be7959
[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead ( #28245 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-11 10:28:47 -08:00
Lukas Geiger
76e4dcf225
[Misc] Remove unused attention prefix prefill ops functions ( #26971 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-11 18:26:04 +00:00
Fanli Lin
d5edcb8678
[BugFix] Fix Siglip2Attention on XPU ( #28448 )
...
Signed-off-by: Lin, Fanli <fanli.lin@intel.com >
2025-11-11 18:18:02 +00:00
Xin Yang
6c3c0f8235
[Kernel] Optimize rms_norm kernel ( #27931 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2025-11-11 18:02:23 +00:00
Matthew Bonanni
684f254585
Prefer FlashAttention MLA as default over FlashMLA ( #27363 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-11 17:13:51 +00:00
Zhewen Li
e553424919
[CI/Build] Refactor Attention backend for test_prefix_prefill from xformers to SDPA ( #28424 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-12 01:09:47 +08:00
xuebwang-amd
5a1271d83a
[Quantization] fix attention quantization of gpt_oss model ( #27334 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
2025-11-11 12:06:00 -05:00
xuebwang-amd
05576df85c
[ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model ( #24239 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
Co-authored-by: fxmarty-amd <felmarty@amd.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-11 12:05:22 -05:00
zhrrr
68c09efc37
[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model ( #27165 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
2025-11-11 12:00:31 -05:00
Nicolò Lucchesi
a7ef3eb0cd
[NIXL] Generalize block-first backend layouts (FlashInfer-like) ( #28282 )
2025-11-11 16:57:43 +00:00
Michael Goin
f9a4087182
Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel ( #28431 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-11 11:46:04 -05:00
the-codeboy
287bbbeb06
[Doc] Fix typo in serving docs ( #28474 )
...
Signed-off-by: the-codeboy <71213855+the-codeboy@users.noreply.github.com >
2025-11-11 16:45:49 +00:00
usberkeley
3143eb23fc
[BugFix] Add test_outputs.py to CI pipeline ( #28466 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-11 16:01:30 +00:00
Fanli Lin
b886068056
[BugFix] Fix RuntimeError in PixtralHFAttention on CPU/XPU ( #28444 )
...
Signed-off-by: Lin, Fanli <fanli.lin@intel.com >
2025-11-11 15:29:33 +00:00
Mark McLoughlin
a90ad7d838
Add @markmc to CODEOWNERS for Observability ( #28457 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-11 23:03:22 +08:00
jvlunteren
533b018f72
[BugFix] Fix Failing Ruff Check ( #28469 )
...
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com >
2025-11-11 06:41:43 -08:00
bnellnm
a1448b4b69
[Kernels] Split up fused_moe/layer.py, isolate more modular kernel code ( #28064 )
2025-11-11 07:29:02 -07:00
Maryam Tahhan
fa1970201d
[Docs] Fix grammar in CPU installation guide ( #28461 )
...
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com >
2025-11-11 14:01:11 +00:00
Ido Segev
3380543b20
Add request timeout override for multi-turn benchmarks ( #28386 )
...
Signed-off-by: Ido Segev <idos@pliops.com >
2025-11-11 13:41:18 +00:00
Cyrus Leung
afffd3cc8a
[Model] Pass mm_features directly into get_mrope_input_positions ( #28399 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-11 21:14:48 +08:00
Chaojun Zhang
7dbe6d81d6
Fix Fused MoE LoRA Triton kernel bug ( #28450 )
...
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com >
2025-11-11 20:46:47 +08:00
Matthew Bonanni
b30dfa03c5
[Attention] Refactor CUDA attention backend selection logic ( #24794 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-11-11 07:40:44 -05:00
Michael Goin
2e78150d24
[CI] Add mergify rules for nvidia label ( #28417 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-11 04:28:28 -08:00
Ido Segev
d381eb967f
Multi turn benchmark progress bar for synthetic conversation generation ( #28394 )
...
Signed-off-by: Ido Segev <idos@pliops.com >
2025-11-11 11:06:04 +00:00
Lukas Geiger
9973e6e04a
[Model][Qwen3VL] Slightly speed up fast_pos_embed_interpolate ( #28434 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-11 10:35:10 +00:00
Fanli Lin
c7991269dd
[BugFix] 'DeepseekV2Config' object has no attribute 'use_mla' ( #28387 )
...
Signed-off-by: Lin, Fanli <fanli.lin@intel.com >
2025-11-11 08:45:38 +00:00
Jiangyun Zhu
f0359fffa4
[Bugfix] fix qwen3-next crash ( #28202 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-11 08:24:28 +00:00
Sage Moore
798c7bebca
[EPLB] Refactor balance_packing to use numpy and optimize GPU-CPU transfers in EPLB ( #28369 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-11-11 00:19:51 -08:00
Roger Wang
4fd4b743a2
[Bugfix] Fix max image size for PaddleOCR-VL ( #28442 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-11-11 08:07:24 +00:00
David Ben-David
cc079763c5
[BugFix] Avoid calling KV connector layer APIs when metadata is unset ( #28253 )
...
Signed-off-by: David Ben-David <davidb@pliops.com >
Co-authored-by: David Ben-David <davidb@pliops.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2025-11-10 23:39:36 -08:00
iAmir97
a7adbc6c6b
[Doc] Sleep mode documentation ( #28357 )
...
Signed-off-by: Amir Balwel <amir.balwel@embeddedllm.com >
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com >
Co-authored-by: Amir Balwel <amir.balwel@embeddedllm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 22:44:35 -08:00
Robert Shaw
e605e8e323
[Bugfix] Fix Stream Sync for Shared Expert Overlap ( #28430 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
Co-authored-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-11 05:59:08 +00:00
Zuyi Zhao
bca74e32b7
[Frontend] Add sagemaker_standards dynamic lora adapter and stateful session management decorators to vLLM OpenAI API server ( #27892 )
...
Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com >
Signed-off-by: Shen Teng <sheteng@amazon.com >
Co-authored-by: Shen Teng <sheteng@amazon.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-11-11 04:57:01 +00:00
Zhuohan Li
8d706cca90
[Misc] FlattenLogprobs -> FlatLogprobs ( #28335 )
2025-11-11 03:41:23 +00:00
Xin Yang
57201a6a4c
Fix rotary embedding benchmark script ( #28323 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2025-11-10 21:57:12 -05:00
Michael Goin
f2d9ad0620
Only register rocm_aiter_ops if aiter is found ( #28428 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-11 02:53:24 +00:00
Wentao Ye
de540c0354
[Feature] Add env var VLLM_MOE_USE_DEEP_GEMM ( #28422 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-11 02:29:48 +00:00
Lucas Wilkinson
39029d5192
[CI/Test Fix] Fix CP tests on Blackwell ( #28404 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-11 01:36:29 +00:00
Wentao Ye
35d801f13f
[Feature] Refactor batch invariant fp8 DeepGEMM ( #27606 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-11 00:08:40 +00:00
Matthew Bonanni
0bf29fadf5
[Test] Remove old non-varlen FA2 test ( #28420 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-10 23:57:41 +00:00
Adrian Abeyta
a5a790eea6
[Bugfix] Ensure calculated KV scales are applied in attention. ( #27232 )
...
Signed-off-by: adabeyta <aabeyta@redhat.com >
2025-11-10 23:42:37 +00:00
Jialin Ouyang
b30372cbd0
[Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage ( #27896 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-10 15:34:18 -08:00
Ilya Markov
d17ecc6b19
[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds ( #24248 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-11-10 18:33:11 -05:00
Yong Hoon Shin
021143561f
[ROCm] Add missing gemm_a8w8_blockscale import ( #28378 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-11-10 23:13:36 +00:00
Robert Shaw
30700b1cd7
[CI] Fix Plugin Tests Tests ( #28413 )
...
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
2025-11-10 22:36:11 +00:00
Andrew Xia
4b94ed8f92
[Frontend][2/n] remove empty content from _parse_tool_calls_from_content ( #28331 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-11-10 14:07:49 -08:00
Lucas Wilkinson
6dec9f6109
[BugFix] Fix DeepGEMM over-allocating workspace ( #28254 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-10 17:01:17 -05:00
Wei Wei
bf6a3d0ff5
[Misc] Add more scoping for improved trace ( #28329 )
...
Signed-off-by: Wei Wei <wwei6@meta.com >
2025-11-10 21:03:21 +00:00
Sage Moore
40d33264c6
[Bugfix][EPLB] Disabled shared expert overlap when EPLB is enabled ( #28377 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: Sage Moore <sagemoore@utexas.edu >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-10 20:39:19 +00:00
Jonas M. Kübler
9c84ca8293
[FA/Chore] Bump FA version for FP8 two-level accumulation ( #27889 )
...
Signed-off-by: Jonas Kuebler <kuebj@amazon.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2025-11-10 12:06:04 -08:00
Rémi Delacourt
6d54336ae5
[Bugfix] Fix llguidance backend, rollback when EOS was encountered ( #25905 )
...
Signed-off-by: Rémi Delacourt <remi@mistral.ai >
Signed-off-by: remi <remi@mistral.ai >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-11-10 14:53:32 -05:00
jiahanc
34553b9d27
[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next ( #27492 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
2025-11-10 12:34:57 -05:00
Varun Sundar Rabindranath
b039bfda8f
[Bugfix] Fix persistent_masked_m_silu_mul_quant tests ( #28366 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-10 09:21:52 -08:00
Cyrus Leung
d0e186c16f
[V0 Deprecation] Remove unused context_len and seq_len from M-RoPE ( #28395 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-11 00:30:06 +08:00
vllmellm
f080a83511
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops ( #24490 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-11-10 08:20:53 -08:00
caozuoba
40e2eeeb92
[Kernel] Optimization of the mm_k operator. ( #28280 )
...
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-10 16:03:46 +00:00
zejunchen-zejun
b06b9470ca
[Rocm][fused_moe][fp4] view weight to torch.float4_e2m1fn_x2 when running aiter fused moe for fp4 model ( #27474 )
...
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com >
2025-11-10 10:38:56 -05:00
TJian
4673e465ff
Add @tjtanaa to codeowner for ROCm and multi-modal ( #28360 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-11-10 21:39:17 +08:00
Ferrebo
912744d066
[Fix] optimize visual token mask with caching and multi-token support ( #28374 )
...
Signed-off-by: Ferrebo <itachi971009@gmail.com >
Signed-off-by: kebo01 <kebo01@baidu.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 13:23:49 +00:00
Yu Jiaqi
15be507c86
[bugfix] fix siglip batch text output error ( #28365 )
...
Signed-off-by: piood <2477084691@qq.com >
2025-11-10 21:21:15 +08:00
Mark McLoughlin
6f7de33bed
[Metrics] Refactor LoRA state tracking ( #26801 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-10 16:34:36 +08:00
Shinichi Hemmi
a98cc35c34
Restore PlaMo2 unit test as pfnet/plamo-2-1b now supports transformers >=4.56 ( #28019 )
...
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com >
2025-11-10 06:50:02 +00:00
Lucas Wilkinson
e8697faf03
[V0 deprecation] Remove no longer used get_metadata_cls ( #28370 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-10 14:32:09 +08:00
Xiake Sun
03fa4d3fb3
[Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X ( #28373 )
...
Signed-off-by: Xiake Sun <xiake.sun@amd.com >
Signed-off-by: Xiake Sun <xisun@amd.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 04:53:40 +00:00
Varun Sundar Rabindranath
6b2b9fd934
[CI] lora/test_mixtral.py : Add additional expected outputs due to flakiness ( #28322 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-10 10:45:29 +08:00
JartX
c5f685b3ae
[ROCm][Platform] Add RX7900XTX device id in _ROCM_DEVICE_ID_NAME_MAP ( #28279 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
2025-11-09 23:09:36 +00:00
Jiangyun Zhu
c4768dcf47
[Kernel] Fix fused_gdn_gating ( #28343 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-09 14:26:35 -07:00
Zhewen Li
a65a934ebe
[CI/Build] Temporary fix to LM Eval Small Models ( #28324 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-09 21:08:38 +00:00
usberkeley
4a8d6bd168
Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method ( #28214 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-11-09 19:11:46 +00:00
Lucas Wilkinson
636efd10a5
[Core] Separate out attention metadata building logic from prepare inputs ( #26764 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-09 13:51:43 -05:00
Nick Hill
289eb6c537
[Core] Simplify async KV output aggregation ( #28327 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-09 09:44:13 -08:00
Nicolò Lucchesi
19d91ece4b
[CI] Fix flaky test_eagle_correctness test ( #28364 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-11-09 16:04:59 +00:00
Jiangyun Zhu
7ae5a5fb11
[Misc] Add some comments in qwen3-next ( #28267 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-08 23:59:24 -08:00
Yong Hoon Shin
de2b78305f
[ROCm] Add env to enable/disable aiter triton gemm ( #28321 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-11-08 22:27:00 -08:00
Ning Xie
e5e9067e61
[Misc] fix typo and add detailed log ( #28178 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-11-09 05:33:46 +00:00
yihong
3a7d580343
fix: close issue 28338 by fixed python version ( #28339 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-11-09 05:07:26 +00:00
Kevin H. Luu
05f8d69077
[chore] Move some wikimedia images to S3 ( #28351 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2025-11-09 01:58:26 +00:00
Mohammad Miadh Angkad
404d7a9d14
[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 ( #28345 )
...
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu >
2025-11-08 15:50:10 -07:00
ElizaWszola
171133f929
[Bugfix] Fix test fused quant layernorm tests ( #27865 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-08 14:31:33 -08:00
Cole Murray
32787d0644
Remove setuptools upper bound constraint (<80) ( #28337 )
...
Signed-off-by: Cole Murray <colemurray.cs@gmail.com >
2025-11-08 22:30:18 +00:00
Benjamin Chislett
975676d174
[Feat] Drop-in Torch CUDA Profiler ( #27841 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-08 14:07:37 -08:00
Ev Lacey
77d702a22b
Enhance run_cluster.sh for multi-NIC support ( #28328 )
...
Signed-off-by: Ev Lacey <elacey@nvidia.com >
2025-11-08 22:04:16 +00:00
zhangsicheng5
2108a571d7
[DCP] Support dcp kv_cache interleave size > 1 ( #26696 )
...
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com >
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com >
Signed-off-by: Qiu <qiuchunshuo@huawei.com >
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com >
2025-11-09 04:45:27 +09:00
Andy Lo
47604137a2
[Bugfix] Spec decode + structured output + spec model max len edge case ( #28298 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2025-11-08 19:44:25 +00:00
Robert Shaw
26990d25dc
[Bugfix] Update device name for H200 detection ( #28349 )
...
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-11-08 19:01:11 +00:00
Harry Mellor
d9ab1ad9d1
reasoning_content -> reasoning ( #27752 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-08 12:15:08 +00:00
22quinn
608bb14462
[Attention] Remove max cudagraph size limit of 992 ( #27840 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-11-07 22:33:27 -08:00
Xiaozhu Meng
4a36681f85
[flashinfer][fix] do not check nvcc availability when using pre-downloaded cubins ( #27990 )
...
Signed-off-by: Xiaozhu <mxz297@gmail.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2025-11-07 22:25:21 -08:00
Abolfazl Shahbazi
d15afc1fd0
Refactor CPU/GPU extension targets for CMake build ( #28026 )
...
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com >
2025-11-08 14:17:35 +08:00
Isotr0py
934a9c3b79
[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 ( #28101 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-08 05:01:27 +00:00
gnovack
70af44fd10
[bugfix] support eagle with lora cudagraph specialization ( #28318 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
2025-11-08 03:25:45 +00:00
Aurick Qiao
781f5ebf52
Bump arctic-inference requirement ( #28174 )
...
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-07 18:31:18 -08:00
Michael Goin
0852527647
[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM ( #28124 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-07 18:20:55 -08:00
Hamid Mukhtar
61d25dc44b
Update gpu.rocm.inc.md to add support for AMD Ryzen AI MAX / AI 300 Series (gfx1151, gfx1150) ( #28308 )
...
Signed-off-by: Hamid Mukhtar <15519013+hammmmy@users.noreply.github.com >
2025-11-08 02:09:21 +00:00
Xiaohong (Sean) Chen
d0c7792004
[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding ( #21068 )
...
Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Danielle Robinson <dcmaddix@gmail.com >
Co-authored-by: Haipeng Li <li2haipeng@gmail.com >
Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com >
2025-11-08 01:58:22 +00:00
Boyuan Feng
b158df2813
remove resolve_op_overloads and use splitting_ops directly ( #28081 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-11-08 01:13:13 +00:00
Kunshang Ji
1aaecda078
[XPU] Enable Expert parallel for MoE models ( #28263 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-08 00:33:11 +00:00
Harry Mellor
811df41ee9
Update Flashinfer from v0.4.1 to v0.5.2 ( #27952 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-07 16:24:42 -08:00
Nick Hill
67a2da890e
[PerfFix] Avoid separate thread for MP executor shm spin (take 2) ( #28319 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-07 22:11:03 +00:00
Nick Hill
da786e339e
[Core] Rework handling of async scheduling config ( #28250 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-07 20:01:23 +00:00
Benjamin Chislett
18903216f5
[Bugfix] Fix and add tests for GptOss reasoning parser ( #28000 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-07 19:28:04 +00:00
Simon Mo
d0ceb38ae8
[Build] Fix release pipeline failing annotation ( #28272 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
Signed-off-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-07 10:06:45 -08:00
youkaichao
155ad56d7b
[doc] add guide about the provided PTX was compiled with an unsupported toolchain ( #28305 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-11-08 00:26:34 +08:00
Fadi Arafeh
5fb4137c99
[README] Add Arm CPUs to the list of supported targets ( #28290 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-11-07 15:41:47 +00:00
Nicolò Lucchesi
68a72a5cc1
Revert "[PerfFix] Avoid separate thread for MP executor shm spin ( #28012 )" ( #28289 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-11-07 15:07:01 +00:00
Boyuan Feng
0f872b7977
[Log] update shm wait time msg ( #28255 )
2025-11-07 09:43:30 -05:00
Wentao Ye
4b1ff13221
[Feature] Default ignore_eos True for random dataset ( #28227 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-07 07:35:33 -05:00
Iceber Gu
e0d6b4a867
[CLI] add --max-tokens to vllm complete ( #28109 )
...
Signed-off-by: Iceber Gu <caiwei95@hotmail.com >
2025-11-07 12:21:40 +00:00
Pavani Majety
72b1c2ae2c
[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes ( #27439 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-11-07 04:18:39 -08:00
Lukas Geiger
e0919f331d
[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU ( #28168 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-07 12:14:29 +00:00
Kevin H. Luu
8e19d470af
[fix] Revert "fixing mm placeholder replacement issue with gemma3" ( #28285 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2025-11-07 12:09:09 +00:00
Mengqing Cao
1958bda9b4
[Misc][Model][Refactor] Pass the prefix into Linear layers ( #28259 )
...
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-11-07 19:38:38 +08:00
Zhang Xiangze
7bdb42b2f2
[CPU]Avoid repeated random sample compile ( #28260 )
...
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com >
2025-11-07 11:03:57 +00:00
汪志鹏
315068eb4a
[FixBug]Aeala/ShareGPT_Vicuna_unfiltered marked as multimodal benchmark ( #28265 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com >
2025-11-07 09:35:22 +00:00
Jialin Ouyang
ccd98b59c1
[Perf] Introduce FlattenLogprobs to store logprobs results to reduce GC overhead ( #28171 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-07 00:27:12 -08:00
Jee Jee Li
21b82f4ea2
[Kernel] LoRA triton kernels support PDL ( #27402 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-07 08:05:48 +00:00
Copilot
a736e5ff77
[CI] Reduce Blackwell Fusion test runtime by filtering tests and only run all tests in nightly ( #28074 )
2025-11-07 15:58:16 +08:00
baonudesifeizhai
9da9208b20
[Bug] Fix missing token_ids for reasoning parser models in chat completions #28246 ( #28256 )
2025-11-07 07:31:58 +00:00
smit kadvani
11fd69dd54
[amd][gptoss] Perf gain because of block alignment ( #28024 )
...
Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com >
Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com >
2025-11-07 05:27:42 +00:00
Harry Mellor
c0a4b95d64
Fix issues from #28242 ( #28257 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-07 04:23:17 +00:00
Alexis MacAskill
a47d94f18c
Add runai model streamer e2e test for GCS ( #28079 )
...
Signed-off-by: Alexis MacAskill <amacaskill@google.com >
2025-11-07 03:07:54 +00:00
Alex Brooks
e70fbc599b
[CI/Build] Loosen STT LoRA Translate Check (Flaky Test) ( #28247 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
Signed-off-by: Alex Brooks <alex.brooks@ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-11-07 02:51:27 +00:00
Lucas Kabela
4bf56c79cc
[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile ( #28242 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2025-11-07 00:16:03 +00:00
Junhong Liu
59b453eaa2
Speed up mm processor kwargs per request by splitting dynamic and static kwargs ( #26483 )
...
Signed-off-by: Junhong <liujunhong11@huawei.com >
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com >
Co-authored-by: Junhong <liujunhong11@huawei.com >
2025-11-07 07:51:28 +08:00
Eugene Khvedchenya
827e4237bc
Fix failing test for CRadio ( #27738 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: wang.yuqi <noooop@126.com >
2025-11-06 15:32:25 -08:00
Varun Sundar Rabindranath
ca6f755d24
[BugFix] Fix FusedMoELoRA + ModularKernel Integration ( #28237 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-06 22:53:30 +00:00
Matthew Bonanni
ca90f50304
[Test] Add non-MoE DP test coverage ( #28235 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-06 20:59:57 +00:00
Fang Han
da855b42d2
[Doc]: Make extraInit containers fully configurable in helm chart ( #27497 )
...
Signed-off-by: Fang Han <fhan0520@gmail.com >
2025-11-06 20:27:16 +00:00
Aleksandr Malyshev
449de9001a
[ROCm] triton fp8 kernel ( #27058 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com >
2025-11-06 14:46:44 -05:00
Vico Chu
d4aa65c998
[Chore] eliminate duplicated and unconditional object serialization in anthropic messages api ( #27792 )
...
Signed-off-by: Vico Chu <vico24826@gmail.com >
2025-11-06 19:09:19 +00:00
Julien Denize
7a8375f8a0
Add llama 4 scaling support ( #28145 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2025-11-06 18:55:17 +00:00
Andy Lo
5e0c1fe69c
[Structured outputs] Upgrade llguidance to 1.3.0 ( #28039 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-11-06 10:24:47 -08:00
Russell Bryant
4507a6dae4
CODEOWNERS: Add myself as reviewer on security docs ( #28216 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-11-06 17:39:42 +00:00
Roy Wang
d1dd5f53e4
[Frontend] Fix logging format when enable response logging ( #28049 )
...
Signed-off-by: esmeetu <jasonailu87@gmail.com >
2025-11-06 16:25:39 +00:00
StanHatko
e52e4da971
[HARDWARE][CPU] Add Option for Disabling Binding to Specific CPU Cores ( #27953 )
...
Signed-off-by: Stan Hatko <stan_hatko@live.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2025-11-06 23:47:11 +08:00
Milos Puzovic
2176778cd3
[Doc] Add Arm CPUs are on the list of supported targets in vLLM ( #26018 )
...
Signed-off-by: Milos Puzovic <milos.puzovic@arm.com >
2025-11-06 15:30:26 +00:00
Eric Yue
0370679ce9
[Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 ( #28200 )
...
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com >
2025-11-06 07:29:46 -08:00
Harry Mellor
8816e375d3
[Docs] Switch to directory style URLs ( #28058 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-06 07:06:33 -08:00
Michael Goin
f32229293e
Disable nm-testing models with issues in CI ( #28206 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-06 06:19:07 -08:00
xiangze-arm
c757a15f0f
[CPU]Improve cpu fused moe perf ( #27244 )
...
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com >
2025-11-06 11:04:18 +00:00
Chauncey
59a50afa08
[Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony ( #26874 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-06 10:40:03 +00:00
courage17340
981cadb35c
[Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty ( #28181 )
...
Signed-off-by: courage17340 <courage17340@163.com >
2025-11-06 17:52:13 +08:00
wangxiyuan
c3ee80a01a
[V0 deprecation]clean up is_v1_supported_oracle ( #28116 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-06 16:05:32 +08:00
Aditya Tewari
3755c14532
[CPU] Enable torch profiling ( #28130 )
...
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com >
2025-11-06 07:32:05 +00:00
Seungduk Kim
201dc98acc
Fix hard-coded parameter name in gemma3n.py ( #27946 )
...
Signed-off-by: Seungduk Kim <seungduk.kim@yanolja.com >
Signed-off-by: Biswa Panda <biswa.panda@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-11-05 23:07:36 -08:00
Julien Denize
a404e2c0f1
Patch Mistral Tokenizer ( #28146 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2025-11-06 06:43:16 +00:00
Xiaozhu Meng
e31946f86e
[flashinfer] fix FI all2all with FI cutlass moe ( #28166 )
...
Signed-off-by: Xiaozhu <mxz297@gmail.com >
2025-11-06 05:52:16 +00:00
gmagogsfm
bde5039325
[CI] Add compile/test_multimodal_compile.py to CI ( #28151 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-06 05:41:47 +00:00
Jacob Zhong
d72299d47b
Make the cv2 dependency optional ( #27780 )
...
Signed-off-by: Jacob <cmpute@qq.com >
2025-11-06 05:08:55 +00:00
Lukas Geiger
80679f108f
[Core][MM] Use non-blocking CPU-GPU copy of multimodal data ( #28141 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-06 04:05:12 +00:00
Isotr0py
43ecd0a900
[Chore] Clean up deepseek v2/v3 config copy ( #28055 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-06 03:46:30 +00:00
Chauncey
07d614511f
[Misc] Remove the duplicate code ( #28111 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 21:07:47 -05:00
Vadim Gimpelson
f948ab6945
[CI Failure] nm-testing/Qwen2-0.5B-Instruct-FP8-SkipQKV was removed from HF. Skip it in tests ( #28170 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-06 01:22:13 +00:00
Wentao Ye
d71af5f502
[Feature] Enable TP + EP shared_experts overlap with router, 3.7% E2E performance improvement ( #28164 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-05 17:21:08 -08:00
Wentao Ye
90189c71a9
[Bug] Fix env string "0" same to True ( #28159 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-05 17:04:20 -08:00
Wentao Ye
d79d9f0780
[Bug] Fix cpu disable shared_experts VLLM_DISABLE_SHARED_EXPERTS_STREAM ( #28157 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-05 17:03:09 -08:00
Vadim Gimpelson
b6a248bdd7
[PERF] Decouple projections from GDN custom op. Attempt 2 ( #28083 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-05 17:01:12 -08:00
Dayeol Lee
1767658559
[Debugging] Add annotation for easier trace analysis ( #22496 )
2025-11-05 16:52:52 -08:00
Kuntai Du
efe73e9b57
[Core][Hybrid allocator + connector 2/n] Unify remove_skipped_blocks by get_last_useful_token ( #25431 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-11-06 00:12:00 +00:00
Zhewen Li
0b8e871e5e
[CI/Build] Fix test_defaults_with_usage_context in AMD CI ( #27926 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-05 15:40:24 -08:00
Zhewen Li
5ee93a5956
[CI/Build] Update checking logic in cutlass_group_gemm_supported ( #27948 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-05 15:40:10 -08:00
Snehlata
e15601789b
[Feature]: Add corrupted request metric to V1 metrics system. ( #27306 )
...
Signed-off-by: atalhens <sneh.lata@nutanix.com >
2025-11-05 13:45:29 -08:00
Richard Zou
65ac8d8dc4
[Docs] Add guide to debugging vLLM-torch.compile integration ( #28094 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-11-05 21:31:46 +00:00
Isotr0py
ffb08379d8
[Chore] Remove Nemotron-Nano-VL config copy ( #28126 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-05 20:06:45 +00:00
R3hankhan
e04492449e
[Hardware][IBM Z] Optimize s390x Dockerfile ( #28023 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2025-11-05 11:25:44 -08:00
Michael Yao
518ec6b722
[Docs] Clean up README_TUNING.md ( #28088 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-11-05 19:01:34 +00:00
wang.yuqi
802748bddb
[Bugfix] Fix Qwen3-Reranker-8B load ( #28117 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-11-05 18:33:50 +00:00
Paul Zhang
faedbb4d4f
[Feature] Extend batch invariant torch.compile to B200 ( #27856 )
...
Signed-off-by: PaulZhang12 <paulzhan@fb.com >
2025-11-05 10:04:49 -08:00
Samuel Shen
40db194446
[CI]: Add LMCacheConnector Unit Tests ( #27852 )
...
Signed-off-by: Samuel Shen <slshen@uchciago.edu >
Co-authored-by: Samuel Shen <slshen@uchciago.edu >
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu >
2025-11-05 09:45:57 -08:00
Chen Zhang
c765f0b443
[FlashInfer] Avoid FlashInfer block_size 16 + head_size 256 on blackwell ( #27994 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-11-05 09:25:32 -08:00
gmagogsfm
002b07c4b2
[Bugfix] vLLM should check Inductor config for compile cache enablement status ( #27637 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2025-11-05 12:22:44 -05:00
Walter Beller-Morales
752ddeacaa
[Core] add support for reasoning parser plugins ( #28075 )
...
Signed-off-by: walter beller-morales <walter.beller.morales@gmail.com >
2025-11-06 01:15:06 +08:00
Jiangyun Zhu
c18f88c6ca
[Kernel] Fuse computation of g and beta for Gated Delta Net ( #28095 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-05 09:14:55 -08:00
Jiaju Zhang
6fd0df8132
[misc] add vLLM Beijing Meetup ( #28127 )
...
Signed-off-by: Jiaju Zhang <jjzhang@redhat.com >
2025-11-05 17:12:59 +00:00
Isotr0py
3f5a4b6473
[Bugfix] Validate custom logits processor xargs for online serving ( #27560 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-05 16:53:33 +00:00
Pleaplusone
6cae1e5332
[ROCm][MLA] Support block-size > 1 for AITER MLA backend ( #27224 )
...
Signed-off-by: ganyi <ygan@amd.com >
Co-authored-by: wuhuikx <hattie.wu@amd.com >
2025-11-05 10:43:02 -05:00
Alexei-V-Ivanov-AMD
80c9275348
Enabling cooperative multi-gpu tests on multi-gpu nodes ( #27986 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-11-05 10:35:49 -05:00
Ilya Markov
e50c454672
[BugFix] Support EP/DP + EPLB with MTP ( #25311 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2025-11-05 15:22:17 +00:00
Chen Zhang
5d16d0fa62
[DCP] check return_lse for all layers in dcp ( #27929 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-11-05 22:27:25 +08:00
bigmoyan
0606bea2b6
add kimi reasoning parser ( #28128 )
...
Signed-off-by: wangzhengtao <wangzhengtao@msh.team >
Co-authored-by: wangzhengtao <wangzhengtao@msh.team >
2025-11-05 21:48:33 +08:00
Frost Mitchell
6e97eccf5d
[XPU] Enable custom routing functions in IPEX for Llama4 ( #28004 )
...
Signed-off-by: frost-intel <frost.mitchell@intel.com >
2025-11-05 13:39:57 +00:00
Boyuan Feng
6ab183813c
[Graph Partition][Cache] Use inductor partition ops config ( #27702 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-11-05 13:04:48 +00:00
amirkl94
6b7a81185d
Bugfix: Cutlass FP8 FusedMoE bad scaling factors ( #27255 )
...
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-05 06:06:06 -05:00
Eric Yue
b57789b62b
Fix excessive logging noise by reducing the log level of the MinimaxM2ToolParser import success message ( #27635 )
...
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com >
2025-11-05 19:03:51 +08:00
Chauncey
377061d481
[Misc] fix import error for DeepSeekR1ReasoningParser ( #28114 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 19:02:32 +08:00
Kuntai Du
86dca07d9b
[Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator ( #28011 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-11-05 10:36:31 +00:00
Qiu
16b37f3119
[bugfix] fix wrong dcp_local_seq_lens calc ( #27518 )
...
Signed-off-by: Qiu <qiuchunshuo@huawei.com >
2025-11-05 17:58:13 +08:00
Chauncey
0976711f3b
[Refactor] to simplify and extract the shared logic between chat completion and responses ( #27961 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 15:46:39 +08:00
Chauncey
e261d37c9a
[Refactor] Lazy-loaded reasoning_parser ( #28092 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 15:37:02 +08:00
Alex Brooks
b7cbc25416
[Model, Core] Support Granite Speech & LoRA for STT ( #24455 )
2025-11-05 08:33:48 +01:00
Lucas Wilkinson
d43ad5a757
[BugFix] Fix DCP Assert (AssertionError: DCP not support reorder_batch_threshold > 1 now.) ( #28100 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-05 14:54:43 +08:00
Isotr0py
0ff05e3770
[Bugfix] Fix encoder-only model support for transformers backend ( #28021 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-04 22:24:41 -08:00
wangxiyuan
428bc7bf1c
[V0 deprecation] Remove VLLM_USE_V1 usage in most modules ( #27955 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-04 20:51:16 -08:00
Zhewen Li
878fd5a16f
[CI/Build] Enable some fixed tests in AMD CI ( #28078 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-05 03:15:59 +00:00
Kunshang Ji
18b39828d9
[XPU] Add gpt-oss model support for Intel GPU ( #27786 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-05 02:17:23 +00:00
tou
4ea62b77f5
[Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 ( #27740 )
2025-11-05 09:25:09 +08:00
Vadim Gimpelson
d4e547bb7e
Revert "[PERF] Decouple projections from GDN custom op" ( #28080 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-04 15:58:23 -08:00
Aleksandr Malyshev
2d977a7a9e
[ROCm] gemm_a16w16 upstreaming ( #26969 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2025-11-04 16:01:00 -05:00
Chenheli Hua
1fb4217a05
[Multimodal] Make MediaConnector extensible. ( #27759 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
2025-11-04 18:28:01 +00:00
nadavkluger
611c86ea3c
Added disable rule to track files under benchmarks/lib ( #28048 )
...
Signed-off-by: Nadav Kluger <nadav.k@fmr.ai >
2025-11-04 18:18:43 +00:00
Pleaplusone
dc937175d4
[ROCm][Perf] New design on ROCm AITER MHA backend Implementation ( #25763 )
...
Signed-off-by: ganyi <ygan@amd.com >
2025-11-04 18:05:33 +00:00
Harry Mellor
2f1cc8cef1
Remove deprecated --rope-scaling and --rope-theta ( #28006 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-04 18:01:56 +00:00
Nick Hill
938a81692e
[AsyncScheduling] Don't schedule past request max_tokens ( #27922 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-04 17:06:28 +00:00
Nick Hill
c9f66da8fd
[PerfFix] Avoid separate thread for MP executor shm spin ( #28012 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-04 08:33:55 -08:00
yt0428
05cae69f0f
[model] Add support for openPangu_Ultra_MoE ( #27521 )
...
Signed-off-by: yuantao <2422264527@qq.com >
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-04 08:17:20 -08:00
Vadim Gimpelson
5fd8f02ea9
[PERF] Decouple projections from GDN custom op ( #27512 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-04 08:11:41 -08:00
lyrisz
97e3dda84b
[Perf] SM100 - add swap AB optimization to CUTLASS FP8 GEMM ( #27284 )
...
Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com >
Co-authored-by: Faqin Zhong <zhofaqin@amazon.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-04 07:49:25 -08:00
Nick Hill
5a0a6dfd55
[BugFix] Fix incorrect preallocated sampled_token_ids tensor size ( #28025 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-04 07:38:16 -08:00
bnellnm
938772af03
[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. ( #27123 )
2025-11-04 21:59:45 +08:00
tomeras91
e4ee658672
[Model] add optimal triton fused moe configs for NemotronH MoE ( #27967 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-11-04 12:59:43 +00:00
tomeras91
77f8001f53
[Model][Bugfix] fix pipeline parallelism support for NemotronH ( #27968 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-11-04 12:28:36 +00:00
Zhuohan Li
300a265978
[Core] Enable StatLogger in LLMEngine ( #28020 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-11-04 04:13:35 -08:00
Jerry Zhang
03c4c4aa9d
Support using Int4PreshuffledTensor after loading ( #26066 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-11-04 06:00:57 -05:00
yugong333
2ec401bc39
Load tuned fused_moe_lora shrink and expand kernel configs separately ( #27435 )
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-04 18:27:35 +08:00
Varun Sundar Rabindranath
4022a9d279
[BugFix][Performance] Restore flashinfer autotuning for all scenarios ( #27904 )
2025-11-04 15:56:21 +08:00
Zhewen Li
53f6e81dfd
[CI/Build] Fix OpenAI API correctness on AMD CI ( #28022 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-04 07:20:50 +00:00
CSWYF3634076
43a6acfb7d
[Model] fix ernie45 reasoning_parser ( #27973 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-11-04 07:16:46 +00:00
Mark McLoughlin
58279c60b5
[KV Connector] Make KVCacheConfig an explicit constructor argument ( #27887 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-03 23:00:49 -08:00
Zhewen Li
2f84ae1f27
[CI/Build] Update LM Eval Version in AMD CI ( #27944 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-04 06:36:40 +00:00
xiangze-arm
f32cbc9a0c
[CPU]Improve dynamic 4bit moe performance ( #27240 )
...
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com >
2025-11-04 06:33:23 +00:00
Wentao Ye
7e4be74104
[Bug] Batch invariant: Fix flash attn MLA RuntimeError: scheduler_metadata must have shape (metadata_size) ( #27884 )
2025-11-04 14:05:55 +08:00
Mark McLoughlin
380ba6816d
[Metrics] Enable sleep state metric outside of dev mode ( #27867 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-03 20:35:36 -08:00
liuzhenwei
14a125a06d
[NIXL][XPU] Pin NIXL version to 0.7.0 ( #27849 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2025-11-04 03:28:35 +00:00
Chauncey
c02fccdbd2
[Refactor] Lazy import tool_parser ( #27974 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-04 10:10:10 +08:00
li2haipeng
6ddae74054
[LoRA] Lora shrink swizzle ( #27694 )
...
Signed-off-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com >
Signed-off-by: Haipeng Li <li2haipeng@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-04 09:30:20 +08:00
vllmellm
b13a447546
[Bugfix][ROCm] Fix ViT rotary embeddings for torch.compile compatibility on ROCm ( #27748 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-11-03 17:12:19 -08:00
QiliangCui
7956b0c0bc
Remove the tpu docker image nightly build. ( #27997 )
...
Signed-off-by: Qiliang Cui <derrhein@gmail.com >
2025-11-04 00:35:54 +00:00
Tyler Michael Smith
3758757377
[Bugfix] Fix MoE Routing Simulation ( #28002 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-11-03 22:26:49 +00:00
Hank_
ccd3e55e51
[Bugfix][plugin] fla crash on plugin ( #27322 )
2025-11-04 05:27:03 +08:00
Matthew Bonanni
01baefe674
Add TP parameter to attention tests ( #27683 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-03 13:04:40 -08:00
Ning Xie
786030721e
[Docs] add runai_streamer_sharded to LoadConfig ( #27937 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-11-03 20:35:16 +00:00
Matthew Bonanni
145c00a4d3
[Bugfix] change FlashMLA reorder_batch_threshold ( #27777 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-03 15:17:10 -05:00
Lucas Kabela
55011aef24
[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile ( #27764 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2025-11-03 11:12:15 -08:00
Sophie du Couédic
a4398fbb5e
[Feature][Benchmarks] Support inf burstiness ( #26941 )
...
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com >
2025-11-03 18:33:17 +00:00
Aurick Qiao
2c19d96777
[Spec Decode] Integrate Suffix Decoding from Arctic Inference ( #25784 )
...
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com >
2025-11-03 09:23:31 -08:00
Lucas Wilkinson
4bc400f47e
[CI/Testing] Add basic single node dual batch overlap test ( #27235 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-03 17:00:46 +00:00
ahao-anyscale
cac4c10ef0
[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile ( #27616 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2025-11-03 11:13:51 -05:00
pwschuurman
f7d2946e99
[Bugfix] Skip gs:// model paths for speculator detection ( #27846 )
...
Signed-off-by: Peter Schuurman <psch@google.com >
2025-11-03 14:31:03 +00:00
gnovack
294c805f1d
Early exit for MoE LoRA kernels ( #27131 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-03 20:22:17 +08:00
zhang-prog
40b69e33e7
[Model] Add PaddleOCR-VL Model Support ( #27758 )
...
Signed-off-by: zhangyue <zhangyue66@baidu.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: zhangyue66 <zhangyue66@baidu.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-03 19:04:22 +08:00
Jee Jee Li
32257297dd
[CI/Build] Remove the flaky gpt-oss lora test ( #27966 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-03 16:50:06 +08:00
Misha Efimov
ba464e6ae2
Add ORCA endpoint load metrics support ( #24905 )
...
Signed-off-by: Misha Efimov <mef@google.com >
2025-11-03 08:21:31 +00:00
Kunshang Ji
7f4bdadb92
[XPU]Refine Dockerfile.xpu, avoid oneccl dependency issue ( #27964 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-03 07:36:59 +00:00
Rémi Delacourt
cec7c28833
[Bugfix] Padded Eagle Specdec with Chunked Prefill ( #26263 )
...
Signed-off-by: Rémi Delacourt <remi@mistral.ai >
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com >
Signed-off-by: remi <remi@mistral.ai >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-03 02:22:46 -05:00
Thomas Parnell
18961c5ea6
[Hybrid] Pass kernel block size to builders ( #27753 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-11-03 05:48:03 +00:00
Sungyoon Jeong
470ad118b6
[Frontend] Align finish_reason when tool is called with OpenAI ( #25054 )
...
Signed-off-by: Sungyoon Jeong <sungyoon.jeong@furiosa.ai >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-11-03 04:21:18 +00:00
Biswa Panda
1bf43ae35d
[BugFix][LoRA] use adapter_id instead of id field of lora_request ( #27728 )
...
Signed-off-by: Biswa Panda <biswa.panda@gmail.com >
2025-11-03 10:08:08 +08:00
Vensen
0ce743f4e1
Fix(llm): Abort orphaned requests when llm.chat() batch fails Fixes #26081 ( #27420 )
...
Signed-off-by: vensenmu <vensenmu@gmail.com >
2025-11-02 16:24:01 +00:00
Cyrus Leung
6c317a656e
[Misc] Provide Siglip2 chat template ( #27939 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-02 13:42:38 +00:00
Asaf Joseph Gardin
00b31a36a2
[V1] [Hybrid] Mamba1 Automatic Prefix Caching ( #26377 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-11-02 04:16:23 -08:00
Julien Denize
73444b7b56
Performance fix MistralTokenizer: cache special ids and tokens ( #27925 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
2025-11-02 08:48:33 +00:00
Cyrus Leung
853a8eb53b
[Bugfix] Fix Qwen Omni audio inference ( #27920 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-02 05:06:05 +00:00
Ben Browning
758ea2e980
[CI/Build] Fix flaky test_transcription_validation.py::test_basic_audio_gemma ( #27924 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2025-11-02 03:45:02 +00:00
Yue Zhang
685c99ee77
[KV offload] Offloading connector async scheduling support ( #27648 )
...
Signed-off-by: KevinCheung2259 <2651309292@qq.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-11-01 21:08:56 +00:00
Benjamin Bartels
1e88fb751b
Adds anthropic /v1/messages endpoint to openai api_server ( #27882 )
...
Signed-off-by: bbartels <benjamin@bartels.dev >
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev >
2025-11-01 12:45:42 -07:00
Nick Hill
c2ed069b32
[BugFix] Fix mixed penalties batch with async scheduling ( #27910 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-01 10:51:24 -07:00
wenxindongwork
af6e19f50f
[Core][TPU] Support TPU Data Parallelism ( #27365 )
...
Signed-off-by: wenxindongwork <wenxindong@google.com >
2025-11-01 17:14:44 +00:00
Cyrus Leung
99d69af9ec
[Bugfix] Python 3.10 compatibility for Self ( #27918 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-01 15:28:54 +00:00
Haco
d811b442d3
[Bugfix] DeepSeek V3.2 MTP metadata & CUDA graph issues ( #26779 )
...
Signed-off-by: xiaohajiayou <923390377@qq.com >
2025-11-01 10:52:43 -04:00
wangxiyuan
30a14b034f
[V0 deprecation] Remove VLLM_USE_V1 usage in platform and v1 module ( #27798 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-01 10:17:45 +00:00
Harry Mellor
799ce45cc1
[Docs] Mock all imports for docs ( #27873 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-01 10:02:23 +00:00
ai-jz
2c0c7c39bd
feat(benchmarks): support HF model names in multi-turn benchmark ( #27850 )
2025-11-01 08:04:52 +00:00
Yihua Cheng
e675118849
[Add] cmdline argument parsing for KV cache offloading modules ( #27621 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-01 07:17:07 +00:00
TJian
e2347dbf58
[Bugfix] [Model] Missing MRoPE function definition from KeyeForConditionalGeneration ( #27895 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-11-01 13:45:23 +08:00
Cyrus Leung
879a06579e
[CI/Build] Bump transformers version ( #27528 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-31 22:11:07 -07:00
yugong333
29de3cdee4
Adding SplitK in fused_moe_lora kernel ( #27818 )
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-01 12:55:46 +08:00
Yan Ma
7e2729b57e
[Multimodal][XPU]Enable vision attn backend for xpu platform ( #27525 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: Yejing Lai <yejing.lai@intel.com >
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-01 04:45:02 +00:00
Jee Jee Li
3a5de7d2d6
[Bugfix] Fix KDA output ( #27905 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-01 11:54:36 +08:00
Jee Jee Li
bc4486d609
[Kernel] Enable FusedMoEModularKernel support bias ( #27754 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-01 02:05:12 +00:00
Nick Hill
0cdbe7b744
[Core] Async scheduling + structured outputs compatibility ( #26866 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-01 00:35:04 +00:00
Chen Zhang
df334868ca
[Hybrid] A simpler algorithm to find kernel_block_size ( #26476 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-31 21:30:28 +00:00
Bram Wasti
0e0a638c3b
Batch invariance doc ( #27839 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-31 17:22:19 -04:00
Matthew Bonanni
f29aeb5a25
Add FLASHINFER_MLA to test_mla_backends and add B200 CI run ( #27663 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-31 11:12:19 -07:00
Vinay R Damodaran
5e8862e9e0
[Feature] Pydantic validation for scheduler.py and structured_outputs.py ( #26519 )
...
Signed-off-by: Vinay Damodaran <vrdn@hey.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-31 18:05:50 +00:00
Nick Hill
9e5bd3076e
[Cleanup] Remove no-longer-used SpeculativeConfig.enable_chunked_prefill ( #27826 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-31 10:57:45 -07:00
Shu Wang
fc16f1c477
Flashinfer_CUTLASS_MOE fuses quantization for TP ( #27223 )
...
Signed-off-by: Shu Wang. <shuw@nvidia.com >
2025-10-31 17:54:29 +00:00
ZiTian Zhao
bc306fe5e9
fix incorrect type annotation in KimiMLP ( #27885 )
...
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com >
2025-10-31 17:38:02 +00:00
Chenguang Zheng
103a468bbf
[bugfix] Missing cached item in beam search ( #27874 )
...
Signed-off-by: fake0fan <645327136@qq.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-31 17:34:27 +00:00
Rob Mulla
70bfbd7b16
Docs update tpu install instructions ( #27824 )
...
Signed-off-by: Rob Mulla <rob.mulla@gmail.com >
Signed-off-by: Rob Mulla <RobMulla@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-31 10:29:55 -07:00
GuanLuo
d6517be3cd
[Bugfix] Missing NIXL metadata for handshake initialization if instance spans multi-node ( #26338 )
...
Signed-off-by: Guan Luo <gluo@nvidia.com >
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com >
Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-10-31 10:16:00 -07:00
Isotr0py
7e06c40e63
[Bugfix] Fix broken MRoPE for GLM-4.1V/GLM-4.5V ( #27860 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-31 17:04:51 +00:00
Madeesh Kannan
675704ac01
[Bugfix] Allow 64-bit integer values for LoRA IDs to avoid overflow/truncation ( #27876 )
...
Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com >
2025-10-31 16:58:42 +00:00
Jee Jee Li
0384aa7150
[CI/Build] Add gpt-oss LoRA test ( #27870 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-31 22:17:21 +08:00
Jiangyun Zhu
3857eb8725
[Perf] Decouple torch op from GDA to leverage torch.compile ( #27871 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-31 21:35:52 +08:00
Huamin Li
933cdea440
[BugFix] Don’t compute reorder threshold when there are no attention groups ( #27861 )
2025-10-31 11:36:18 +00:00
Isotr0py
3933f18a5e
[Bugfix] Avoid too small block m/n for FlexAttention kernel option ( #27853 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-31 19:33:12 +08:00
toncao
e5ef4dfc11
[Kimi-Linear] Correct prefixes and add compatibility to AWQ quants ( #27834 )
...
Signed-off-by: toncao <cpatonn@gmail.com >
Co-authored-by: toncao <cpatonn@gmail.com >
2025-10-31 17:36:37 +08:00
Akash kaothalkar
36960501d3
[Hardware][Powerpc] Fix VLLM_CPU_OMP_THREADS_BIND="auto" low CPU utilization for Power ( #27734 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-10-31 07:45:26 +00:00
Seiji Eicher
b2e65cb4a7
[benchmark] Make request IDs unique across clients by default ( #27723 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2025-10-30 17:40:35 -07:00
Wentao Ye
2bf0bcc1fc
[CI Test] Add Scheduled Integration Test ( #27765 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 17:29:26 -07:00
Jakub Sochacki
697f507a8e
[CI/Build][Intel] Enable performance benchmarks for Intel Gaudi 3 ( #26919 )
...
Signed-off-by: jakub-sochacki <jakub.sochacki@wp.pl >
2025-10-31 07:57:22 +08:00
Matthew Bonanni
d5d2a0fe74
[Misc] Make all tool scripts executable ( #27831 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-30 23:46:02 +00:00
Nick Hill
c9791f1813
[BugFix] Fix broken import in initialize_ray_cluster() ( #27838 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-30 16:26:13 -07:00
Paul Zhang
e7acb20076
[Feature] Batch invariant torch.compile ( #27660 )
...
Signed-off-by: PaulZhang12 <paulzhan@fb.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-30 13:11:29 -07:00
Jialin Ouyang
4b68c4a55b
[Core][Perf] Only invoke save_new_computed_blocks when computed blocks are not empty ( #27799 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-30 19:47:30 +00:00
Wentao Ye
a8141fa649
[Refactor] Remove VLLM_DEEPEP_LOW_LATENCY_ALLOW_NVLINK ( #27750 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 15:32:39 -04:00
Sumanth R Hegde
4917002523
[Fix] Skip record_sleep_state logic in PrometheusStatsLogger if not in dev mode ( #27789 )
...
Signed-off-by: SumanthRH <sumanthrh99@gmail.com >
2025-10-30 19:26:27 +00:00
cong-meta
a2981c4272
[EP/DP][API Server] Enable DP-aware routing in OpenAI API requests ( #24945 )
...
Co-authored-by: Cong Chen <prowindy@gmail.com >
2025-10-30 12:10:16 -07:00
Jialin Ouyang
4574d48bab
[Core][Bookkeeping] Update cu_num_accepted_tokens for all req_index ( #27629 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-30 11:52:36 -07:00
Tyler Michael Smith
ab98f6556f
[Bugfix] Fix 2 precommit issues - (mamba_block_size, kv_cache_config) ( #27811 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-10-30 11:52:18 -07:00
Roger Meier
2918c1b49c
[Model] Use the same fused_moe configs for all H200 devices ( #23642 )
...
Signed-off-by: Roger Meier <r.meier@siemens.com >
2025-10-30 17:36:56 +00:00
Mengqing Cao
1004205795
[MTP] Refactor mtp predictor to avoid d2h operation ( #27643 )
...
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-30 17:27:39 +00:00
Huy Do
ba33e8830d
Reapply "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" ( #27768 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-10-30 10:22:30 -07:00
Kebe
33a0ea5f32
[Docs] add Shanghai Meetup - 2025/10 ( #27545 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: esmeetu <jasonailu87@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: esmeetu <jasonailu87@gmail.com >
2025-10-31 00:33:13 +08:00
Ilya Markov
60f76baa66
[Misc] Replace CUDA_VISIBLE_DEVICES in DP with torch.cuda.set_device for device selection on cuda-like devices ( #27564 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-30 11:41:44 -04:00
Varun Sundar Rabindranath
e5e076cad7
[BugFix] Stopgap - Flashinfer Autotuner + GPT-OSS + DP/TP ( #27762 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-30 08:24:31 -07:00
Li, Jiang
eebf00cb0c
[Bugfix][CPU] Fix MRoPE dispatch on the CPU backend ( #27800 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-30 15:12:05 +00:00
Fan Yin
9956aae4ea
[Model][Ouro] Support Ouro Model ( #27794 )
...
Signed-off-by: yinfan.1024 <yinfan.1024@bytedance.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-30 22:34:41 +08:00
Zhewen Li
0fe0140408
[KV offload] Enable CPU KV offload on CUDA alike Platforms ( #27770 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-30 22:10:29 +08:00
Zhiyuan Li
4e68cc9b6a
[Model] Introduce Kimi Linear to vLLM ( #27809 )
...
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn >
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com >
2025-10-30 21:02:27 +08:00
Huamin Li
1994de99ea
[CI Failure] Fix test_kv_cache_model_load_and_run ( #27717 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 12:27:53 +00:00
wang.yuqi
4464723f22
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. ( #25524 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-30 12:13:05 +00:00
Sairam Pillai
74374386e2
[Bugfix] Improve GPU validation logging in Ray fallback scenarios ( #25775 )
...
Signed-off-by: Sairam Pillai <sairam.pillai61@gmail.com >
2025-10-30 11:57:59 +00:00
Wentao Ye
c01f6e525f
[CI] Fix mypy for vllm/v1/core and vllm/v1/engine ( #27108 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 11:32:17 +00:00
Huamin Li
c7d2a554ba
[CI Failure] fix test_default_mm_loras ( #27795 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 18:13:03 +08:00
wangxiyuan
af826e0820
[V0 deprecation] Remove VLLM_USE_V1 usage in config module ( #27784 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-30 09:42:49 +00:00
Zhewen Li
e806178d2a
[BugFix][VL] Fix FA selection on Qwen2.5-VL ( #27790 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-30 07:54:44 +00:00
Huamin Li
5be1bed790
[CI/Build]Add eval config for Qwen3-235B-A22B-Instruct-2507-FP8 ( #27113 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 07:50:56 +00:00
yitingdc
31b55ffc62
use stringData in secret yaml to store huggingface token ( #25685 )
...
Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io >
2025-10-30 00:47:36 -07:00
Bram Wasti
ded8ada86a
Add more dims for batch invariant shims ( #27489 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-30 05:28:45 +00:00
Kuntai Du
8bff831f0a
[Benchmark] Cleanup deprecated nightly benchmark and adjust the docstring for performance benchmark ( #25786 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-10-30 04:43:37 +00:00
Lucas Wilkinson
b5d70751d8
[BugFix] Reordering extend logic fix ( #27739 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-29 21:39:34 -07:00
Fardin Hoque
b8c48c5d72
kernels/moe test pruning ( #27053 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-30 12:10:34 +08:00
Benjamin Bartels
17d055f527
[Feat] Adds runai distributed streamer ( #27230 )
...
Signed-off-by: bbartels <benjamin@bartels.dev >
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev >
Co-authored-by: omer-dayan <omdayan@nvidia.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-29 21:09:10 -07:00
Nick Hill
2ce5c5d3d6
[BugFix] Handle unscheduled requests properly when async scheduling ( #27756 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-29 21:04:25 -07:00
Kunshang Ji
b5bae42f91
[XPU] Update latest IPEX 2.8 release ( #27735 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-10-30 11:17:13 +08:00
Chen Zhang
d7fb10c574
[Bugfix] mamba-block-size is set for vision language model ( #27773 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-29 19:39:57 -07:00
Yan Ma
b798e39f93
[XPU][bugfix] fix rope for llama4 and deepseek ( #25145 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2025-10-30 09:43:13 +08:00
Chenheli Hua
48eb8eba58
[Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. ( #27760 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 23:17:48 +00:00
Wentao Ye
b5d90f7400
[Bug] Fix DBO IMA issue for DeepEPHT ( #27666 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 16:28:27 -04:00
Nick Hill
d4aa144343
[BugFix] Fix handling of resumed reqs in SharedStorageConnector ( #27719 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-29 20:16:52 +00:00
Wentao Ye
fcb1d570bb
[Bug] Fix DeepEP low latency assert self.batched_router_logits.size(-1) == full_router_logits.size(-1) Bug ( #27682 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 14:50:39 -04:00
Nicolò Lucchesi
accb8fab07
[KVConnector] Add metrics to Prometheus-Grafana dashboard ( #26811 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2025-10-29 18:44:49 +00:00
Wentao Ye
5b0448104f
[Bug] Raise error explicitly if using incompatible backend ( #27424 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 13:29:20 -04:00
22quinn
f7a6682872
[CI/Build] Test torchrun with 8 cards ( #27548 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-29 10:26:06 -07:00
Boyuan Feng
a9fe0793f2
use_aot_compile should respect VLLM_DISABLE_COMPILE_CACHE ( #27698 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-29 17:08:54 +00:00
JartX
7568a282b9
[FIXBUG] Qwen3VL hallucinations without Contiguous on Torch.SDPA ( #27744 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-29 16:55:35 +00:00
Braulio Dumba
1da3309ace
[Core] Exposing engine sleep & wake_up state as prometheus metrics ( #24176 )
...
Signed-off-by: Braulio Dumba <Braulio.Dumba@ibm.com >
2025-10-29 09:32:01 -07:00
Wentao Ye
5522fb274b
[Chore] Optimize P2PNCCLEngine http_address ( #27488 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 00:05:09 +08:00
Nicolò Lucchesi
0f95a1c3f2
[CI] Fix flaky test_two_responses_with_same_prev_id test ( #27745 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-29 15:10:35 +00:00
Xiake Sun
ded24e3e54
[ROCm][Platform] Add MI308X device id in _ROCM_DEVICE_ID_NAME_MAP ( #27623 )
...
Signed-off-by: Xiake Sun <xiake.sun@amd.com >
2025-10-29 14:44:03 +00:00
Roger Young
d6704dd099
Fix MiniMax-M2 rmsnorm precision and remove useless code ( #27627 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-10-29 21:01:05 +08:00
Cyrus Leung
ecca3fee76
[Frontend] Add vllm bench sweep to CLI ( #27639 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-29 05:59:48 -07:00
Zhewen Li
9a0d2f0d92
[CI/Build] Skip cpu offloading test on AMD ( #27690 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 12:55:51 +00:00
Isotr0py
ad3ec89532
[VLM] Add Qwen3-VL generation test ( #25185 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 12:19:37 +00:00
Kevin H. Luu
3481e40743
[chore] Remove models weight on S3 logic ( #27725 )
...
Signed-off-by: kevin <kevin@anyscale.com >
2025-10-29 10:29:49 +00:00
Eugene Khvedchenya
5e72216d17
Feature/video support in random mm dataset ( #25963 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Eugene Khvedchenya <ekhvedchenia@nvidia.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 18:24:52 +08:00
Isotr0py
1a33aacf82
[Misc] Raise error for missing video metadata in MultiModalDataParser ( #27664 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-29 10:06:42 +00:00
Yue Zhang
7ba6aa8f56
[Fix] import get_kv_cache_torch_dtype error in LMCacheConnector integration ( #27670 )
...
Signed-off-by: KevinCheung2259 <2651309292@qq.com >
2025-10-29 10:03:54 +00:00
Alec S
ab2eb27b74
[Frontend] [gpt-oss] Mcp type bug ( #27689 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-29 10:01:32 +00:00
Alec S
3c7fefdeba
[Frontend] [gpt-oss] Tool json call parsing error retry ( #27675 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-29 09:42:44 +00:00
bnellnm
1891cf605a
[Bugfix] Fix modular kernel tests ( #27707 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-29 16:14:33 +08:00
Jiangyun Zhu
8df98c2161
[perf] Enable concurrent execution of "shared_experts" and "selected_experts" in qwen3-next ( #27578 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-29 08:12:54 +00:00
Cyrus Leung
4fb8771cc0
[CI/Build] Move pre-commit only scripts to tools/pre_commit ( #27657 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-29 08:04:33 +00:00
Dipika Sikka
413ef7a3b4
[Speculators] Move tests + fix integration ( #27308 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Signed-off-by: rahul-tuli <rtuli@redhat.com >
Co-authored-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-10-29 00:54:21 -07:00
Zhewen Li
8b62495076
[Bugfix] Fix non-contiguous tensor error in rocm_unquantized_gemm_impl ( #27605 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 00:00:15 -07:00
Zhewen Li
83fd49b1fc
[CI/Build][Bugfix]Fix Quantized Models Test on AMD ( #27712 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 06:27:30 +00:00
Shaoting
a4a4f0f617
[KV Connector] Update lmcache connector with latest compatibility ( #27681 )
...
Signed-off-by: Samuel Shen <slshen@uchicago.edu >
Co-authored-by: Samuel Shen <slshen@uchicago.edu >
2025-10-29 05:38:37 +00:00
Lukas Geiger
0d8161b075
[Model] Fix Qwen3VL and Qwen3Omni after torch.compile changes ( #27705 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 05:28:20 +00:00
liuzhenwei
d2c33c397a
[NIXL][XPU] update name of nixl wheel ( #27631 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2025-10-29 12:43:29 +08:00
Varun Sundar Rabindranath
f6d5f5888c
[Build] Revert triton_kernels requirements ( #27659 )
2025-10-28 21:07:09 -07:00
Simon Mo
9007bf57e6
Revert "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" ( #27714 )
2025-10-28 20:58:01 -07:00
Huy Do
f257544709
Install pre-built xformers-0.0.32.post2 built with pt-2.9.0 ( #27598 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-28 19:39:15 -07:00
Jialin Ouyang
0b51c9bd8b
[Core] Early return in SlidingWindowManager.remove_skipped_blocks ( #27673 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-29 01:32:33 +00:00
Wentao Ye
d3ab240f39
[Bug] Fix deepep low latency use nvlink by default ( #27677 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-28 23:53:12 +00:00
Lucas Kabela
94666612a9
[Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model ( #23207 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
Signed-off-by: Lucas Kabela <lucasakabela@gmail.com >
2025-10-28 22:36:43 +00:00
Nick Hill
4fe5895361
[AsyncScheduling] Make async overlap work with logprobs ( #27615 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-28 22:35:54 +00:00
Or Ozeri
111faf1118
[Core] Scheduler: Publish connector events after output ( #25875 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2025-10-28 21:01:33 +00:00
Wentao Ye
6afc28a9ba
[Test] Batch Invariant: Unit test using parameterized backend ( #27478 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-28 13:51:35 -07:00
Lucas Wilkinson
141e6a0505
[Misc] Make reorder batch also separate extends ( #27367 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-28 10:55:10 -07:00
Matvei Pashkovskii
130aa8cbcf
Add load pattern configuration guide to benchmarks ( #26886 )
...
Signed-off-by: Matvei Pashkovskii <mpashkov@amd.com >
Signed-off-by: Matvei Pashkovskii <matvei.pashkovskii@amd.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-28 10:49:15 -07:00
Zhengxu Chen
e3d8186666
[compile] Add fallback path to AOT compile when serialization fails. ( #27350 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 12:54:26 -04:00
Cyrus Leung
f5710ef02a
[Misc] Make LayerBlockType a Literal instead of Enum ( #27658 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-28 16:23:35 +00:00
Mohammad Miadh Angkad
a8c02fb5bf
[Bugfix][CI] Fix v1 attention backend tests and add CI coverage ( #26597 )
...
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu >
Signed-off-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-28 11:42:05 -04:00
Kero Liang
02af36df36
[Bugfix] Fix allocation & free logic of SingleWriterShmRingBuffer ( #27117 )
...
Signed-off-by: Kero Liang <kerorek@outlook.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: donglu <donglu@cohere.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-28 15:01:24 +00:00
Zhiyuan Li
e88bdd60d9
[FLA] Introduce Kimi Delta Attention(KDA) to VLLM ( #27654 )
...
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn >
2025-10-28 22:56:28 +08:00
Samuel Shen
05e034f085
[nit]: Fix import for the lmcache integration ( #27600 )
...
Signed-off-by: Samuel Shen <slshen@uchicago.edu >
Co-authored-by: Samuel Shen <slshen@uchicago.edu >
2025-10-28 14:40:55 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
936643a868
[BugFix] Also consider RAY_EXPERIMENTAL_NOSET_* when storing compilation cache ( #27294 )
...
Signed-off-by: Hollow Man <hollowman@opensuse.org >
2025-10-28 10:22:28 -04:00
Junpu Fan
b186149e8e
[Bugfix][Frontend] validate arg priority in frontend LLM class before add request ( #27596 )
...
Signed-off-by: Junpu Fan <junpufan@gmail.com >
2025-10-28 14:02:43 +00:00
22quinn
2abbd351ef
[Core] Enable async scheduling for external_launcher mode ( #27394 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com >
2025-10-28 13:52:47 +00:00
wangln19
446912d1cb
fix: allow HuggingFace standard chat template params via **kwargs ( #27622 )
...
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-28 21:12:34 +08:00
Zhengxu Chen
a00d6254e9
[compile] Disable dynamo guards check for AOT compilation. ( #27288 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 12:58:12 +00:00
Asaf Joseph Gardin
05181cc57f
[Hybrid] Add mamba_block_size to Engine Args ( #27289 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-10-28 12:54:24 +00:00
Zhengxu Chen
259504e147
[compile] Add enable_prompt_embeds to compile hash. ( #27285 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 20:46:03 +08:00
Wentao Ye
0484b64248
[Bug] Fix shape issue for eplb expert weights ( #27589 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 20:44:05 +08:00
Cyrus Leung
f58d9b6404
[Misc] Separate out utils.counter and move utils.Device to engine ( #27588 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-28 12:20:46 +00:00
Matthew Bonanni
44b5ce956d
[Bugfix] In LongRoPE, decide short vs long based on max_model_len ( #27431 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-28 12:00:56 +00:00
Nick Hill
7a865f2325
[V0 Deprecation] Remove vestigial V0 logits_processors.py file ( #27601 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-28 19:17:45 +08:00
wangln19
2fa90bda27
Fix a robust parsing issue in KimiK2ToolParser that causes IndexError ( #27565 )
...
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
2025-10-28 11:11:50 +00:00
Zhewen Li
0291fbf65c
[CI/Build] Fix amd model executor test ( #27612 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-28 08:58:11 +00:00
Jialin Ouyang
b46e4a06f1
[Core][Bookkeeping Optimization] Update against numpy view of is_token_ids tensor ( #27618 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-28 08:13:10 +00:00
Li, Jiang
d34f5fe939
[Bugfix][CPU] Fallback oneDNN linear to torch linear to fix half gemm support on legacy platforms ( #27526 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-27 23:25:44 -07:00
Eric Yue
bdb01a38fe
[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.6 for MI300X ( #27323 )
...
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com >
2025-10-27 22:58:06 -07:00
vllmellm
5b3c35a68e
[ROCm] [Doc] Update ROCm installation docs ( #27327 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-28 13:00:50 +08:00
Chauncey
61fbfe5274
[Bugfix] fixed inconsistent finish_reason handling between V0 and V1 engines ( #27555 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-28 02:18:08 +00:00
Kuntai Du
255e34ca50
[Stability fix] turn off HMA allocator when connector is set ( #27592 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
Signed-off-by: Kuntai Du <kuntai@uchicago.edu >
2025-10-27 18:32:23 -07:00
Roger Wang
a8d2e326ec
[Bugfix][CI] Fix config resolving logic with remote models ( #27610 )
2025-10-28 00:48:32 +00:00
Andrew Xia
53a56e658b
[gpt-oss][2/N] Support input_messages in responsesRequest ( #26962 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-10-27 23:15:49 +00:00
usberkeley
69f064062b
Code quality improvements: version update, type annotation enhancement, and enum usage simplification ( #27581 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-10-27 17:50:22 +00:00
Micah Williamson
921e78f4bb
[ROCm] Update AITER branch for ROCm base docker ( #27586 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-10-27 17:22:33 +00:00
Cyrus Leung
6ebffafbb6
[Misc] Clean up more utils ( #27567 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 15:30:38 +00:00
Ben Browning
3b96f85c36
[Chore]: Stream tokens vs characters in tool call parser tests ( #26513 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2025-10-27 23:06:25 +08:00
tingtinggithub
23ad820553
fixing mm placeholder replacement issue with gemma3 ( #27538 )
...
Signed-off-by: tingtingtang1992 <streamttt@gmail.com >
2025-10-27 14:34:01 +00:00
Varun Sundar Rabindranath
5d3be3ba4c
[Bugfix][LoRA][FusedMoE] Select MxFP4 Backend based on LoRA Enablement ( #27487 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-27 07:32:50 -07:00
Yu Jiaqi
4f882be4a0
[Model] Siglip2 Model Support ( #27566 )
...
Signed-off-by: piood <2477084691@qq.com >
2025-10-27 06:57:37 -07:00
Asaf Joseph Gardin
9273754222
[Hybrid] Added supports_mamba_prefix_caching Protocol ( #27339 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-10-27 13:05:20 +00:00
Jee Jee Li
f4e8154076
[Kernel] Enable moe LoRA kernel support FP16 ( #27468 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-27 19:48:37 +08:00
Fadi Arafeh
a663f6ae64
[cpu][perf] Fix low CPU utilization with VLLM_CPU_OMP_THREADS_BIND on AArch64 ( #27415 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-10-27 11:14:55 +00:00
Chauncey
a4fc21895e
[Bugfix] Fixed when return_token_ids=False, the first event still contains prompt_token_ids. ( #27561 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-27 11:06:43 +00:00
Shanshan Shen
a3e8611da5
[Bugfix] Limit the default value of max_model_len when it is not specified by users ( #27556 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-10-27 10:16:20 +00:00
Cyrus Leung
7c2bdb83dc
[Misc] Clean up utils ( #27552 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 09:05:40 +00:00
Danielle Robinson
9932ed6a83
[Kernel] Adding split_K implementation for fused_moe_lora ( #27291 )
...
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com >
Signed-off-by: Danielle Robinson <dcmaddix@gmail.com >
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-27 02:05:24 -07:00
Jee Jee Li
2d631d28c6
[Doc] Slight improvement to M2 and beyond ( #27554 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-27 09:02:10 +00:00
Cyrus Leung
b368382964
[Model] Deprecate merge_by_field_config=False ( #27551 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 16:43:00 +08:00
gnovack
a806c14cc7
[Performance][LoRA] add context varying params to 'do_not_specialize' in fused moe lora ( #27445 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
2025-10-27 06:31:55 +00:00
yyzxw
181bf5bbde
[Docs] reemove the incorrect enable_reasoning parameter ( #27550 )
...
Signed-off-by: zxw <1020938856@qq.com >
2025-10-26 23:17:19 -07:00
Cyrus Leung
cbd5e07a51
[Model] Use merge_by_field_config for MM models (Qwen series) ( #27546 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 05:38:05 +00:00
CSWYF3634076
63b22e0dbb
[Model][Bugfix] fix ernie45 moe 300B SharedFusedMoE output tuple ( #27316 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-26 20:53:31 -07:00
Roger Young
5980604c44
Fix MiniMax-M2 copyright ( #27537 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-10-27 03:29:51 +00:00
youkaichao
361a7463d3
fix m2 test ( #27536 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-27 01:04:36 +08:00
Roger Young
720af6ab79
[Model][MiniMax-M2] Support MiniMax-M2 Model ( #27535 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-10-27 00:59:11 +08:00
Cyrus Leung
55cba4a05c
[CI/Build] Update causal-conv1d installation ( #27529 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 22:14:22 +08:00
Cyrus Leung
c7abff2990
Revert "[CI/Build] Use CPU for mm processing test on CI ( #27522 )" ( #27531 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 04:44:27 -07:00
Yeshwanth N
71b1c8b667
[Chore]:Extract math and argparse utilities to separate modules ( #27188 )
...
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com >
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com >
Signed-off-by: yeshsurya <yeshsurya@gmail.com >
2025-10-26 04:03:32 -07:00
Cyrus Leung
8fb7b2fab9
[Doc] Fix links to GH projects ( #27530 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 17:55:51 +08:00
Cyrus Leung
be7b55a83d
[Doc] Remove Molmo warning ( #27527 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 16:22:52 +08:00
Lucia Fang
315b860abe
[bugfix]fix empty prompts for async-engine mode in benchmark throughput ( #27494 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-10-26 08:16:35 +00:00
rongfu.leng
87c41c26ad
[Bugfix] Fix processor initialization for model from modelscope instead of HF ( #27461 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-26 07:44:31 +00:00
JartX
65d2cf9511
[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA ( #27190 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-10-26 15:08:52 +08:00
Isotr0py
d63cd9ff10
[CI/Build] Use CPU for mm processing test on CI ( #27522 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-26 13:09:18 +08:00
Cyrus Leung
66a168a197
[CI/Build] Refactor processing tests ( #27470 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-25 16:14:30 +00:00
Matthew Bonanni
a99564ac5b
[Attention] Add missing kv cache scale setup ( #27490 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-25 00:12:49 -07:00
Cyrus Leung
4c5f632165
[Misc] Simplify max tokens in multimodal registry ( #27500 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-24 23:56:01 -07:00
Kuntai Du
b853540388
[Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector ( #25712 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
Signed-off-by: Kuntai Du <kuntai@uchicago.edu >
2025-10-24 23:34:18 -07:00
Zhuohan Li
56ed7609a9
Revert "[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selectio… ( #27502 )
2025-10-25 05:31:43 +00:00
Jiangyun Zhu
29c9cb8007
[CI] Add tests for cudagraph ( #27391 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-25 02:37:33 +00:00
Yihua Cheng
83f478bb19
[KVConnector] Migrate the LMCache integration code to be vLLM native ( #25542 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu >
2025-10-25 00:23:53 +00:00
Varun Sundar Rabindranath
269c4db0a4
[Misc][DP] Guard mxfp4 implementation selection ( #27484 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-24 23:29:24 +00:00
Wentao Ye
52efc34ebf
[Log] Optimize Startup Log ( #26740 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-24 19:27:04 -04:00
Pengchao Wang
d95d0f4b98
[Distributed] Basic set of configuration for large EP deployment on GB200 ( #27328 )
...
Signed-off-by: Pengchao Wang <wpc@fb.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2025-10-24 14:16:44 -07:00
Lehua Ding
0402428200
[Perf][Async Scheduling] Remove CPU->GPU sync in dummy_run ( #27455 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com >
2025-10-24 20:45:36 +00:00
jinghanhu
17af6aa0da
[Document] Add ms-swift library to rlhf.md ( #27469 )
2025-10-24 20:31:50 +00:00
Zhewen Li
fc168c33f3
[CI/Build] Fix test_torch_utils in AMD CI ( #27317 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-24 12:26:00 -07:00
Isotr0py
acc78aeb88
[Bugfix] Fix interns1-vit qk norm code path ( #27480 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-24 17:43:45 +00:00
Ming Yang
0f67d4d962
[Attention] Add MLA prefill backend: trtllm_ragged_attention_deepseek ( #26397 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-10-24 10:24:08 -07:00
kourosh hakhamaneshi
7e1d697b56
[Bugfix] Fix MultiConnector stats reconstruction across process boundaries ( #27366 )
...
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-10-24 17:08:05 +00:00
Chendi.Xue
699d62e6cf
[NIXL][BUGFIX] delay done_recving queue cleanup to bottom of get_finished ( #27297 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-24 17:01:41 +00:00
Richard Zou
cd390b609d
[compile] Turn standalone_compile back on ( #27460 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-10-24 16:30:27 +00:00
Fadi Arafeh
2080b05099
[cpu][fix] Fix onednn_mm crash on consecutive matmuls with same M,K,N and different dtype ( #27472 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-10-24 15:57:48 +00:00
Lifans
6454afec90
[Doc] Fix minor issues in docs/design/metrics.md ( #27436 )
...
Signed-off-by: Lifan Shen <lifans@meta.com >
2025-10-24 05:40:54 -07:00
Chauncey
41a62564a7
Fix test named tool use ( #27458 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-24 20:27:45 +08:00
fhl2000
284cc92275
[MISC] cudagraph_capture_sizes related improvements ( #26016 )
...
Signed-off-by: fhl <2410591650@qq.com >
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-24 05:11:05 -07:00
ioana ghiban
435be10db9
Fix AArch64 CPU Docker pipeline ( #27331 )
...
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com >
2025-10-24 05:11:01 -07:00
Cyrus Leung
b7030d962b
[Benchmark] Enable benchmark to run with encoding_format="bytes" ( #27467 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-24 11:16:50 +00:00
Chauncey
3567816932
[Refactor] move tool parsing logic from protocol.py to the tool parser ( #27383 )
...
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
2025-10-24 09:53:23 +00:00
22quinn
e0ef8a2920
[BugFix] Fix torchrun DP with LLM class ( #27395 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-24 08:11:37 +00:00
Isotr0py
42efe609ba
[MM][Bugfix] Replace PatchEmbed's conv3d to linear layer ( #27418 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-24 07:32:47 +00:00
Yu Jiaqi
88d3141ec6
[Docs] remove v1 column for embedding models ( #27446 )
...
Signed-off-by: piood <2477084691@qq.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-23 23:55:03 -07:00
Rui Qiao
09a6a49eaf
[Misc] Avoid "PyTorch non-writable tensors" warning in RayPPCommunicator ( #27443 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-24 14:53:09 +08:00
strinczer
074475541a
[Bugfix] Fix Pydantic union resolution for ResponseFunctionToolCall in Responses API ( #26706 )
...
Signed-off-by: Shai Trinczer <strinczer@icloud.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-23 22:53:42 -07:00
Aaron Pham
d4c574c39f
[Chore] remove structural tags logging lines ( #27451 )
2025-10-24 05:35:45 +00:00
usberkeley
c528b9006a
Fix EventPublisherFactory logic for disabled KV cache events ( #27419 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-10-24 05:00:01 +00:00
fhl2000
85fee74b33
[Bugfix][CI] Move resolving cudagraph_mode before initializing attn_metadata_builder ( #27427 )
...
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
2025-10-23 20:31:14 -07:00
hfan
8dbe0c527f
[Misc] Add TPU usage report when using tpu_inference. ( #27423 )
...
Signed-off-by: Hongmin Fan <fanhongmin@google.com >
2025-10-23 20:29:37 -07:00
Xiangyu Li
5cc6bddb6e
[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm ( #26092 )
2025-10-23 23:26:13 -04:00
Harry Mellor
1f9460c4c1
Fix pooling adapters for Transformers backend ( #27338 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-23 20:23:55 -07:00
xiao-llm
70022ffc00
Granite 4.0 quark quantization support ( #26944 )
...
Signed-off-by: Xiao YU <Xiao.YU@xilinx.com >
Signed-off-by: Xiao Yu <xiao.yu.dc@outlook.com >
Co-authored-by: Xiao YU <Xiao.YU@xilinx.com >
2025-10-24 02:14:03 +00:00
Akash kaothalkar
f417746ad7
[Hardware][POWERPC] Disable oneDNN path in vllm/model_executor/layers/utils.py for Powerpc ( #27422 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-10-23 21:21:36 +00:00
Yu Jiaqi
0552cfb195
[Model] Siglip Embedding Support ( #27324 )
...
Signed-off-by: piood <2477084691@qq.com >
2025-10-23 20:19:48 +00:00
Kebe
51dd14ac2b
[Bugfix][DP] Fix creating too many DP Placement Groups ( #26880 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Co-authored-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-23 20:16:51 +00:00
Matthew Bonanni
dbfbf9f324
[Attention] Fix FlashMLA metadata builder arguments for q_len > 1 ( #27368 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-23 15:58:15 -04:00
Jonathan Chen
ca76486a16
[Chore] Separate out vllm.utils.platform_utils.py ( #27374 )
...
Signed-off-by: Jonathan <chenleejonathan@gmail.com >
2025-10-23 19:08:06 +00:00
Varun Sundar Rabindranath
a9f55dc588
[Misc] Add triton_kernels dependency ( #27370 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-23 12:04:14 -07:00
Isotr0py
81d5bb765a
[Bugfix] Fix AWQ marlin layer skipping ( #27416 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-23 18:30:28 +00:00
Gregory Shtrasberg
0825197bee
[Bugfix][ROCm][DeepSeek] Fix for forward_hip in rope for DeepSeek ( #27373 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-10-23 17:43:53 +00:00
Alexander Matveev
9ef3d5b875
[Bugfix] Fix dp_chunking enablement logic in FusedMoE layer ( #27220 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-10-24 00:03:14 +08:00
Alexei-V-Ivanov-AMD
295c7f0267
Mirroring the test definitions (2025-10-22) ( #27362 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-24 00:02:26 +08:00
wang.yuqi
3fa2c12185
[Frontend][4/N] Improve all pooling task | Add plugin pooling task ( #26973 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: Christian Pinto <christian.pinto@ibm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com >
2025-10-23 14:46:18 +00:00
Cyrus Leung
fe2016de2d
[CI/Build] Remove unnecessary flags from test registry ( #27353 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-23 14:42:40 +00:00
Ilya Markov
237cf6d32a
[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selection (fix DP slow startup time &c) ( #26709 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-23 20:58:39 +08:00
Navya Srivastava
faee3ccdc2
[Feature] Pydantic validation for speculative.py ( #27156 )
...
Signed-off-by: Navya Srivastava <navya.srivastava1707@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-23 12:19:33 +00:00
Bradley D
570c3e1cd4
[Bugfix] Honor --mm_encoder_attn_backend when used ( #27124 )
...
Co-authored-by: Bradley D <4551889+bradleyhd@users.noreply.github.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-23 20:09:52 +08:00
Harry Mellor
3a4255c7c4
Run mypy on the lowest supported Python version instead of system Python ( #27048 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-23 05:07:44 -07:00
tomeras91
61089465a6
[Model] Add MoE support for NemotronH ( #25863 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-10-23 10:27:23 +00:00
Tova Movshovitz
88afa11010
[Metrics] [KVConnector] Add connector prefix cache hit rate stats ( #26245 )
...
Signed-off-by: tovam <tovam@pliops.com >
2025-10-23 12:21:08 +02:00
Chauncey
d00ce29d89
[CI] Reorganize entrypoints tests ( #27403 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-23 10:10:06 +00:00
Louie Tsai
3b7bdf983b
add SLA information into comparison graph for vLLM Benchmark Suite ( #25525 )
...
Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
Signed-off-by: louie-tsai <louie.tsai@intel.com >
Signed-off-by: Louie Tsai <louie.tsai@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-23 08:04:59 +00:00
Zhewen Li
50b788a17a
[CI/Build] Fix AMD CI: test_cpu_gpu.py ( #27388 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-23 07:55:00 +00:00
Lucia Fang
fc059c7061
[Bugfix] Fix args settings for guided decoding args ( #27375 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-10-23 07:34:06 +00:00
Cyrus Leung
bfb240cc49
[CI/Build] Fix Prithvi plugin test ( #27393 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-23 07:30:44 +00:00
Jonathan Chen
e255d92990
[Chore] Remove duplicate has_ functions in vllm.utils ( #27372 )
...
Signed-off-by: Jonathan <chenleejonathan@gmail.com >
2025-10-23 06:11:59 +00:00
wang.yuqi
3729ed00ba
[Model] Add num_cached_tokens for PoolingRequestOutput ( #27378 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-23 14:03:42 +08:00
Giancarlo Delfin
6644796bf4
[V1][spec decode] return logprobs for spec decoding ( #26060 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-10-22 22:59:59 -07:00
Andrew Sansom
ff93cc8c84
[CORE] Support Prefix Caching with Prompt Embeds ( #27219 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
2025-10-22 22:18:07 -07:00
PiteXChen
243ed7d32e
[Bugfix][Core] running queue index leakage exception ( #26754 )
...
Signed-off-by: CLFutureX <chenyongqyl@163.com >
2025-10-22 21:40:12 -07:00
fangpings
7e0941055f
[Bugfix] Fix incorrect kv cache metrics in grafana.json ( #27133 )
...
Signed-off-by: Fangping Shi <fangping_shi@apple.com >
Co-authored-by: Fangping Shi <fangping_shi@apple.com >
2025-10-22 20:58:36 -07:00
Cyrus Leung
6738e4a093
[Bugfix] Fix SLA tuner initialization ( #27355 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-22 20:43:04 -07:00
Isotr0py
2566dca2a9
[Bugfix] Fix deepseek-ocr multi-image inference and add merge_by_field_config=True with tensor schema support ( #27361 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-22 17:15:38 -07:00
Matthew Bonanni
b4fda58a2d
[MLA] Bump FlashMLA ( #27354 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-22 15:48:37 -07:00
dongbo910220
a0003b56b0
[Chore] Separate out system utilities from vllm.utils ( #27201 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-22 20:25:25 +00:00
Daisy-Ma-coder
5beacce2ea
[BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 ( #27128 )
...
Signed-off-by: qqma <qqma@amazon.com >
Co-authored-by: qqma <qqma@amazon.com >
2025-10-22 19:36:39 +00:00
rongfu.leng
8669c69afa
[Feature] publisher default set zmq in kv_event config ( #26915 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-22 19:19:33 +00:00
Sage
1651003c35
[Prefix Cache] Use LoRA name for consistent KV-cache block hashing ( #27211 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2025-10-22 18:13:03 +00:00
William Song
1cb8c6c5fe
[Doc] Fix numbering sequence in prefix caching ( #27357 )
...
Signed-off-by: William Song <jinwook@umich.edu >
2025-10-22 17:35:47 +00:00
Luciano Martins
e05a6754a8
[Model] Revert PR #26715 : Restore custom PaliGemma and Gemma3-MM impl… ( #27309 )
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com >
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com >
2025-10-22 10:05:34 -07:00
Isotr0py
084a9dae80
[Bugfix] Disable FlexAttention direct block mask building for encoder-only models ( #27344 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-22 16:39:08 +00:00
RED
c9461e05a4
Support Anthropic API /v1/messages Endpoint ( #22627 )
...
Signed-off-by: liuli <ll407707@alibaba-inc.com >
Co-authored-by: liuli <ll407707@alibaba-inc.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-22 09:13:18 -07:00
Nicolò Lucchesi
4dfdb821c8
[P/D] Dynamic kv_output_aggregator collect size ( #26734 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-22 18:07:58 +02:00
Russell Bryant
58fab50d82
[Frontend] Require flag for loading text and image embeds ( #27204 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-22 15:52:02 +00:00
Isotr0py
db6f28d898
[Bugfix] Fix HF format InternVL large variants video processing ( #27330 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-22 08:39:23 -07:00
Cyrus Leung
14e2f1231e
[Bugfix] Make get_mrope_input_positions instance methods ( #27342 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-22 08:38:34 -07:00
Chendi.Xue
7c4767f1eb
[NIXL] use Host buffer to support TP_ratio > 1 for XPU ( #27140 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
Signed-off-by: Chendi.Xue <chendi.xue@intel.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2025-10-22 15:28:13 +00:00
Jee Jee Li
9771e0b432
[Bugfix] Add missing 'is_internal_router' attribute to FusedMoEWithLoRA ( #27351 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-22 08:19:12 -07:00
Reinforce-II
980de31ca0
[bugfix] remove unused parameters to reduce unnecessary vram usage ( #26789 )
...
Signed-off-by: Reinforce-II <fate@eastal.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-22 08:16:09 -07:00
Wentao Ye
1c160841ea
[Bug] Fix DeepSeek-V2.5-1210-FP8 issue ( #27267 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-22 11:00:10 -04:00
Mark McLoughlin
4ca13a8667
[NIXL] Terminate handshake listener thread in shutdown ( #26404 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-10-22 16:59:53 +02:00
Isotr0py
675aa2ec64
[Model] Upstream Deepseek-OCR model ( #27247 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-22 07:59:15 -07:00
dongbo910220
3ae082c373
[Chore] Separate out optional dependency checks from vllm.utils ( #27207 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-22 10:44:21 -04:00
Alexei-V-Ivanov-AMD
49c00fe304
Mirroring changes in test-pipeline.yaml into test-amd.yaml ( #27242 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-22 09:59:45 -04:00
Mark McLoughlin
141d3b9fc5
[docs] Update v1 metrics design doc ( #27332 )
...
Signed-off-by: Simon Mo <simon.mo@hey.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: atalhens <sneh.lata@nutanix.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: atalhens <sneh.lata@nutanix.com >
2025-10-22 06:29:15 -07:00
Jee Jee Li
abf3db40ef
[Core] Handle MoE LoRA edge cases ( #27335 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-22 13:14:33 +00:00
gnovack
8e4ca4d14e
Bugfix - pass 'max_num_tokens_padded' into 'moe_lora_align_block_size' ( #27311 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-22 12:23:57 +00:00
Wentao Ye
1a0f4defb7
[Log] Add Warning for LLM(data_parallel_size=k) single-process DP Usage ( #27282 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-22 12:12:21 +00:00
Li, Jiang
843af7f7fc
[Bugfix][CPU] Disable dual stream execution for experts on CPU ( #27320 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-22 11:02:27 +00:00
wang.yuqi
1f633b8632
[Frontend][3/N] Improve all pooling task | Support binary embedding response ( #27066 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-22 18:38:57 +08:00
ExtReMLapin
a4c29e6e82
fixed reasoning streaming with tool_choice="required" ( #24108 )
...
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr >
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com >
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-10-22 09:42:55 +00:00
Harry Mellor
8f18feb191
Remove last level references not removed in #26355 ( #27260 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-22 09:18:17 +00:00
Huy Do
ed540d6d4c
Update release pipeline for PyTorch 2.9.0 ( #27303 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-10-22 09:18:01 +00:00
wangxiyuan
f6027b2855
[1/N][Platform] Cleanup useless function ( #26982 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-22 09:04:57 +00:00
Jiangyun Zhu
ab3e80042e
[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled ( #27146 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-22 00:22:39 -04:00
Cyrus Leung
ceacedc1f9
[Benchmark] Add plot utility for parameter sweep ( #27168 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-21 20:30:03 -07:00
Nicolò Lucchesi
bfa59be8f1
[CI] Nixl integration tests DP-EP ( #27199 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-22 11:17:48 +08:00
vllmellm
265ecb05fb
[DOC] [ROCm] Add ROCm quickstart guide ( #26505 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-22 03:10:48 +00:00
Lain
09a7e6f617
[Deepseek v3.2] Remove extra logics in indexer ( #26465 )
...
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com >
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
Signed-off-by: Lain <siyuanf@nvidia.com >
Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
2025-10-21 23:34:03 +00:00
Tyler Michael Smith
6c2eef5a5d
[P/D] KVConnector for decode benchmarking ( #25986 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-10-21 16:30:47 -07:00
Benjamin Chislett
19748806f0
[Bugfix] skip cuda graph for drafter when running with eager ( #26821 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-21 15:39:09 -07:00
ExtReMLapin
4a8a567e16
Updated xgrammar backend to not deny supported string formats ( #27253 )
...
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr >
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com >
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-21 22:25:23 +00:00
Alexander Matveev
344a0017c0
[Performance] Dual stream execution of "shared_experts" and "selected_experts" inside FusedMoE ( #26440 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-10-21 21:38:29 +00:00
Huy Do
becb7de40b
Update PyTorch to 2.9.0+cu129 ( #24994 )
...
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-21 17:20:18 -04:00
Tao He
250fb1b8ea
[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. ( #27144 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-21 18:27:03 +00:00
Nick Hill
647214f3d5
[V0 Deprecation] Remove V0 executors ( #27142 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-21 11:09:37 -07:00
David Whyte-Gray
ddeec11ba9
[Bugfix][P/D] Reduce num_threads used by nixl ucx backend ( #27196 )
...
Signed-off-by: David Whyte-Gray <40244437+dagrayvid@users.noreply.github.com >
2025-10-21 13:41:52 -04:00
Wentao Ye
86ed77022d
[Feature] Batch Invariant for R1 TP 8 on Blackwell ( #27229 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-21 10:25:55 -07:00
Micah Williamson
aa1356ec53
[ROCm] Update Triton, Torch, and AITER branches for ROCm base Dockerfile ( #27206 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-10-21 12:01:23 -04:00
Pavani Majety
ecc3c0940a
Add @pavanimajety to .github/codeowners for Flashinfer, ModelOpt related code ( #27213 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-10-21 22:59:53 +08:00
JartX
ba09652de2
[ROCM] Enable CompressedTensorsWNA16 ( #27187 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
2025-10-21 10:43:23 -04:00
Harry Mellor
bd66b8529b
[CI] Install pre-release version of apache-tvm-ffi for flashinfer ( #27262 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-21 14:23:56 +00:00
dongbo910220
6c728f7771
[Chore] Separate out NCCL utilities from vllm.utils ( #27197 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-21 06:18:23 -07:00
Daniel Cámpora
80e9452984
[Deepseek v3.2] Optimize top_k_per_row ( #26763 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
2025-10-21 08:30:07 +00:00
Roger Wang
c3a2c6ac5f
[MM][Core] Decouple ViT backend from LM backend ( #27061 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-10-21 00:30:10 -07:00
Nicolò Lucchesi
72f431e709
[Nixl] Minor refactor to handshake related metadata ( #26410 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-21 09:07:47 +02:00
Zebing Lin
be4445072c
[Fix][Spec Decode] Fix llama4 draft loading with different quantization ( #27136 )
...
Signed-off-by: linzebing <linzebing1995@gmail.com >
2025-10-20 23:19:00 -07:00
Benjamin Chislett
f381cf2302
[Bugfix] Fix broken MTP weight loading for FP8 KV Scales ( #27227 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-20 22:51:44 -07:00
Varun Sundar Rabindranath
5ff5d94e77
[Bugfix] Fix gpt-oss w4a8 DP/EP on B200 ( #26729 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-21 01:51:14 -04:00
Shu Wang
f95da13c3d
[ModelOpt] Load w13/w2_input_scale for all experts, nvfp4 ( #26135 )
...
Signed-off-by: Shu Wang <shuw@nvidia.com >
Signed-off-by: Shu Wang. <shuw@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-21 01:50:31 -04:00
Po-Han Huang (NVIDIA)
aef368aa08
[BugFix] GPT-OSS Attention DP + MoE TP weight loading issue ( #24032 )
...
Signed-off-by: Po-Han Huang <pohanh@nvidia.com >
2025-10-21 04:03:47 +00:00
Chen Wu
5f6cbf60d6
[Feature][Kernel]FusedMoE LoRA ( #21229 )
...
Signed-off-by: wuchen <cntryroa@gmail.com >
Signed-off-by: banjuede <lmklhc@163.com >
Signed-off-by: Chen Wu <cntryroa@gmail.com >
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: bk-201 <joy25810@foxmail.com >
Co-authored-by: wuchen <wuchen@zetyun.com >
Co-authored-by: Nathan Van Gheem <vangheem@gmail.com >
Co-authored-by: banjuede <lmklhc@163.com >
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: bk-201 <joy25810@foxmail.com >
2025-10-21 03:01:37 +00:00
Russell Bryant
3ada34f9cb
[Frontend] Enforce tokenize=False when applying chat template ( #27205 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-21 02:57:34 +00:00
Lunwen He
0eb8f2b880
create is_in_the_same_node on cpu ( #26832 )
...
Co-authored-by: Lunwen He <lunwenh@meta.com >
2025-10-21 02:04:14 +00:00
Fadi Arafeh
163965d183
[cpu] Dispatch un-quantized linear to oneDNN/ACL by default for AArch64 ( #27183 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Co-authored-by: Michael Yang <Michael.Yang@arm.com >
2025-10-21 02:02:58 +00:00
Nick Hill
a03cf9bc70
[V0 Deprecation] Remove V0 metrics code ( #27215 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-21 02:02:10 +00:00
Isotr0py
352c0c8a28
[Quantization] Automatically infer AWQ modules_to_not_convert field ( #26909 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-21 01:49:28 +00:00
Andrew Xia
bfe0b4bd2a
[ez] add uv lock to gitignore ( #27212 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-10-21 00:37:44 +00:00
Concurrensee
58fbbcb2f5
[ROCm] enable some tests in entrypoints test groups on AMD ( #26725 )
...
Signed-off-by: Yida <yida.wu@amd.com >
2025-10-21 00:37:16 +00:00
Heng Guo
87778d5f00
[Feature][Quantization] auto_round support for mixed bits quantization ( #23812 )
...
Signed-off-by: n1ck-guo <heng.guo@intel.com >
Signed-off-by: Heng Guo <heng.guo@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-20 22:23:30 +00:00
Nicolò Lucchesi
f9e7ad5400
[Bugfix][CI] Fix Distributed Tests (4 GPUs) async_sched+ray test ( #27195 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-20 16:34:54 +00:00
shivampr
4d0f266113
[Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) ( #26268 )
...
Signed-off-by: Shivam <shivampr.dev@gmail.com >
2025-10-20 07:48:01 -07:00
Eugene Khvedchenya
e93ff6c8b9
Nemotron Nano V2 VL + EVS Video Support ( #27107 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Natan Bagrov <nbagrov@nvidia.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-20 22:19:11 +08:00
ioana ghiban
1c691f4a71
AArch64 CPU Docker pipeline ( #26931 )
2025-10-20 07:09:40 -04:00
Jiangyun Zhu
9fce7bee74
[Kernel] Accelerate solve_tril with TMA ( #26746 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-20 05:39:02 +00:00
Andy Lo
b63f2143f8
[LoRA] LoRA cuda graph specialization ( #25914 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-20 04:21:09 +00:00
Yi Zhang
f32bf7582e
[Model][VLM] Support Bee-8B Model ( #27012 )
...
Signed-off-by: uyzhang <yi.zhang.4096@gmail.com >
Signed-off-by: Yi Zhang <zhangyi970819@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-20 02:31:26 +00:00
Yongtao Huang
8a81d776ce
Fix typo in ValueError message: use kv_role instead of kv_disagg_role ( #27166 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-10-19 19:47:19 +00:00
Sergei Skvortsov
f6fdacd82c
[Bugfix] Fix error with penalties when speculative decoding and structural output are enabled ( #26586 )
...
Signed-off-by: southfreebird <yvorott@gmail.com >
2025-10-19 19:24:46 +00:00
Cyrus Leung
d31f7844f8
[Misc] Move utils to avoid conflicts with stdlib, and move tests ( #27169 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-19 05:20:55 -07:00
iAmir97
7a6c8c3fa1
[Chore] Separate out vllm.utils.network_utils ( #27164 )
...
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com >
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com >
2025-10-19 03:06:32 -07:00
Jianyu Huang
221bf72577
output type conversion fix ( #27159 )
2025-10-19 08:10:07 +00:00
Cyrus Leung
b3aba04e5a
[Benchmark] Convenience script for multiple parameter combinations ( #27085 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-18 23:57:01 -07:00
dongbo910220
8a297115e2
[Chore] Separate out hashing utilities from vllm.utils ( #27151 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-19 11:09:38 +08:00
22quinn
191eed0bb9
[BugFix] Fix lazy imports involving outlines_core ( #27158 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-19 02:35:32 +00:00
Woosuk Kwon
fb860670da
[Minor] Remove unused env variable ( #27161 )
2025-10-18 18:48:35 -07:00
Tova Movshovitz
83e760c57d
[V1][Metrics][Plugin] Add plugin support for custom StatLoggerBase implementations ( #22456 )
...
Signed-off-by: tovam <tovam@pliops.com >
2025-10-18 15:12:46 -07:00
Lucas Wilkinson
c2bba69065
[BugFix] Disable fp8 kv-cache by default for DeepSeek V3.2 ( #27121 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-18 22:05:23 +00:00
Boyuan Feng
e133d6d218
[BugFix] fix graph partition signature ( #27139 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-18 17:34:36 -04:00
dongbo910220
a1946c9f61
[Chore] Separate out profiling utilities from vllm.utils ( #27150 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-18 19:12:01 +00:00
Lucas Wilkinson
9f020f4f31
[BugFix] Fix failing gemma-3-1b-it test: test_lm_eval_accuracy_v1_engine[google/gemma-3-1b-it] ( #27111 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-18 12:44:39 -06:00
Nick Hill
3b45075206
[Minor] Add some clarifying comments to recent changes ( #27130 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-18 09:52:45 -07:00
Yongtao Huang
168e578efc
Fix incorrect string formatting in barrier timeout exceptions ( #27149 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-10-18 09:51:57 -07:00
Isotr0py
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils ( #26908 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: isotr0py <2037008807@qq.com >
2025-10-18 09:48:22 -07:00
Lukas Geiger
5c2acb270a
[Models][QwenVL] Remove unnecessary .contiguous() calls ( #27106 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-18 07:05:05 -07:00
Nicolò Lucchesi
b26b70bec4
[Misc] Refactor get_kv_cache_spec into AttentionLayerBase ( #26587 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-18 13:51:21 +00:00
Fadi Arafeh
ab4be40fc5
[fix][cpu] fix prefill attention in CPU attention backend ( #27035 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-10-18 13:30:21 +00:00
Wentao Ye
245e4f2c01
[Feature] Batch Invariant: Support DeepGEMM and Blackwell ( #27127 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-18 09:28:05 -04:00
iAmir97
1d165d6d85
[Chore] Separate out vllm.utils.mem_utils ( #27143 )
...
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com >
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com >
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-18 10:06:59 +00:00
dongbo910220
83004020fd
[Test] Add test for /health endpoint on engine failure ( #26074 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-18 09:59:05 +00:00
Chendi.Xue
12e21701e7
[DOC][FEATURES][CPU]update cpu feature for v1 ( #27135 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-18 01:10:45 -07:00
Varun Sundar Rabindranath
30a33b92ee
[Misc] Rev DeepEP ( #27122 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-18 14:54:29 +08:00
Hanchenli
7c572544e4
[GPT-OSS] Structure_Tag support for gpt-oss tool-call in cot ( #25515 )
...
Signed-off-by: Hanchenli <lihanc2002@gmail.com >
Signed-off-by: Hanchenli <61769611+Hanchenli@users.noreply.github.com >
Signed-off-by: Wei Wei <wwei6@meta.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wei Wei <wwei6@meta.com >
Co-authored-by: Wei Wei <weiweinpu@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-17 21:55:54 -07:00
Huamin Li
c312320764
[CI/Build] tests(v1): feed Triton attention the (num_blocks, 2, …) KV cache layout in backend-correctness tests ( #26663 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-17 21:11:26 -07:00
ZiTian Zhao
c981f0ea78
[Perf] Add H100 fused MoE config ( #25398 )
...
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com >
2025-10-18 02:21:27 +00:00
Lehua Ding
6367bde739
[BugFix][Core] Fix error when enable async-scheduling in multi-node env ( #25887 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com >
Signed-off-by: Lehua Ding <lehuading@qq.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-10-17 22:16:18 +00:00
Wentao Ye
f50cc221ea
[Test] Make test_failure more stable for batch invariance ( #27054 )
2025-10-17 16:59:08 -04:00
Pradyun92
acedc74b1a
[V1][Spec Decode] Fix greedy temperature detection after sampler refactor ( #27077 )
...
Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com >
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com >
2025-10-17 13:27:47 -07:00
Zhuohan Li
d29483b58a
[Minor] Remove unnecessary error message ( #27115 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-10-17 20:02:12 +00:00
Michael Goin
950cf9e58e
[Bugfix] Use PIECEWISE cudagraphs on Blackwell if max_model_len > 131072 ( #27114 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-17 19:47:18 +00:00
Isotr0py
3125d79950
[Chore] Remove unused PolyNorm layer ( #27110 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-17 19:03:43 +00:00
vllmellm
e33ee23ee3
[Bugfix] [AITER] [ROCm] Fix Quark MoE Quant Config and AITER Fused MoE quant type logic ( #27029 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-17 12:51:10 -06:00
rasmith
b10c64c834
[ROCm][Bugfix][Model] Fix illegal memory access when running qwen3_moe models with rms_norm (Qwen3-235B-A22B, Qwen3-30B-A3B, etc.) ( #26192 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
Signed-off-by: rasmith <Randall.Smith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-17 14:17:18 -04:00
Aleksandr Malyshev
0925b28a8e
[ROCM] MoE fp4 CK kernel ( #26545 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2025-10-17 14:06:33 -04:00
Nicolò Lucchesi
99722d5f0e
[CI] Remove forbidden slash ( #27112 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-17 09:38:00 -07:00
燃
4c91a28e30
[bugfix] Qwen3-VL fix video incorrect timestamp calculations while do_sample_frames=True ( #27104 )
...
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com >
2025-10-17 16:26:33 +00:00
Patrick von Platen
b038d9c40c
[Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) ( #26367 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-17 08:24:42 -07:00
Nicolò Lucchesi
2ba60ec7fe
[CI] Nixl integration tests ( #27010 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-17 07:13:31 -07:00
Luka Govedič
bd7157a071
[torch.compile] Enable attention and allreduce fusion without custom ops enabled ( #24604 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-17 08:10:23 -06:00
Yongtao Huang
be429d0cfd
Fix incorrect docstring for stop_profile() method ( #27101 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-10-17 06:30:23 -07:00
Reima Karhila (AMD)
c253745eb8
[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 ( #25586 )
...
Signed-off-by: Reima Karhila <reima.karhila@amd.com >
Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com >
Co-authored-by: xaguilar <Xavier.AguilarFruto@amd.com >
2025-10-17 04:56:12 -07:00
Jee Jee Li
daec4d2624
[Model]Improve Qwen3VLMoeForConditionalGeneration packed_modules_mapping ( #27096 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-17 04:47:00 -07:00
Harry Mellor
6c9fdbf725
[Docs] Replace rst style double-backtick with md single-backtick ( #27091 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-17 02:47:34 -07:00
Harry Mellor
483ea64611
[Docs] Replace all explicit anchors with real links ( #27087 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-17 02:22:06 -07:00
Mengqing Cao
e20eba753b
[VLM][Refactor] Remove useless func get_input_positions in MRotaryEmbedding ( #27088 )
...
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-17 02:00:30 -07:00
cong-meta
bbc1b29665
Update troubleshooting.md and remind VLLM_TRACE_FUNCTION usage ( #27069 )
...
Signed-off-by: cong-meta <prowindy@hotmail.com >
2025-10-17 01:53:06 -07:00
Chauncey
acb1bfa601
[CI] fix docs build failed ( #27082 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-17 07:53:40 +00:00
zhrrr
75c7ad9918
[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel ( #26717 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Signed-off-by: izhuhaoran <izhuhaoran@qq.com >
2025-10-17 07:30:35 +00:00
Li, Jiang
5550ff9c25
[CI/Build] Update compressed tensor test path to fix CPU CI ( #27068 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-16 22:34:56 -07:00
Said Taghadouini
3aeb19a39e
[Model] Add support for LightOnOCR ( #26916 )
...
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com >
Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-17 05:05:24 +00:00
Cyrus Leung
8c017b3490
[Model] Always use Transformers backend for PaliGemma and Gemma3-MM ( #26715 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-17 05:03:35 +00:00
Zhewen Li
9c2c2287a0
[CI/Build] Update Llama4 eval yaml ( #27070 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-17 04:59:47 +00:00
Jee Jee Li
fec2b341ad
[Kernel] Lazy import FlashInfer ( #26977 )
2025-10-17 04:48:18 +00:00
Jee Jee Li
87bc0c492f
[Bugfix] Fix ReplicatedLinearWithLoRA ( #27065 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-17 04:43:16 +00:00
Nick Hill
fe3b9372ad
[Core] Change execute_model_with_error_logging() to be a ctx manager ( #27060 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-17 11:45:32 +08:00
Tao He
bde9e2272a
[Bugfix][Qwen] fixes the weights dtype in qwen3_next: it is actually a bfloat16 ( #27030 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
2025-10-17 03:37:52 +00:00
Boyuan Feng
08405609cc
disable graph partition in custom op ( #26952 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Boyuan Feng <fby.1994@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-17 11:08:47 +08:00
Nick Hill
ab81379ea6
[Perf] Exploit out-of-band buffers in shm_broadcast ( #26961 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-16 20:08:03 -07:00
Harry Mellor
4ffd6e8942
[Docs] Reduce custom syntax used in docs ( #27009 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 20:05:34 -07:00
Tomas Ruiz
965c5f4914
vllm bench serve shows num of failed requests ( #26478 )
...
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com >
2025-10-16 19:55:09 -07:00
Lukas Geiger
4d055ef465
Remove unused imports ( #26972 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-16 19:51:17 -07:00
Boyuan Feng
17c540a993
[torch.compile] fix simple inductor graph partition test ( #27050 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-16 21:09:36 -04:00
Cyrus Leung
4d4d6bad19
[Chore] Separate out vllm.utils.importlib ( #27022 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-17 00:48:59 +00:00
Lucia Fang
11ae016bd7
[torch.compile] Passing only necessary compilation config to inductor pass config ( #27041 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
2025-10-17 00:01:52 +00:00
jiahanc
41d3071918
[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel ( #26714 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-16 16:20:25 -07:00
Harry Mellor
fb5e10d3fb
Refactor Transformers backend to use mixins ( #26906 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 21:50:39 +00:00
Bram Wasti
b2f78cbad4
[small][batch invariance] Rename the env and internal flags to simplify usage ( #26855 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
2025-10-16 21:40:25 +00:00
Wentao Ye
23583ee28c
[Bug] Add Assertion for random-input-len / random-output-len ( #26834 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-16 21:36:39 +00:00
Michael Goin
01c977e96d
[CI] Prune Quantization Tests and skip compilation ( #27038 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-16 17:26:35 -04:00
Wentao Ye
b3dda72c23
[Feature] Migrate DeepGEMM API from get_m_alignment_for_contiguous_layout to get_mk_alignment_for_contiguous_layout ( #26935 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-16 16:46:48 -04:00
Varun Sundar Rabindranath
fb0571b077
[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels ( #25997 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-16 12:53:11 -07:00
Wentao Ye
2ed8b6b3d0
[Bug] Fix batch invariant test has to is ( #27032 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-16 19:45:14 +00:00
kimbochen
013abde6ef
Adding Warmup to Benchmark Serving ( #26943 )
...
Signed-off-by: Kimbo Chen <chentenghung@gmail.com >
2025-10-16 12:44:32 -07:00
Kyle Sayers
a5464dcf92
[Compressed Tensors] Always clone output for compile robustness ( #26849 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-16 19:29:59 +00:00
Mandy Li
ac3ed5a815
Support block size of 256 used by Intel HPU ( #26883 )
...
Signed-off-by: mandy-li <mandy.j.li@intel.com >
2025-10-16 15:10:57 -04:00
Andrew Xia
e6ba2000ae
[gpt-oss][1/N] EZ: refactor serving_responses for modularity ( #26948 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
2025-10-16 18:44:06 +00:00
Harry Mellor
aa255ff55a
Support set in the CLI generation ( #27031 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 18:07:18 +00:00
ZiTian Zhao
7bb736d00e
Fix Qwen2.5 VL image grid docstring ( #27033 )
...
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com >
2025-10-16 09:57:36 -07:00
Jee Jee Li
9f4e30904b
[Model] Fix Qwen3VL mm mapping ( #27027 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-16 09:45:59 -07:00
rongfu.leng
5afd3276df
[Feature] Add process_weights_after_loading to AttentionImpl ( #26870 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-10-16 08:02:30 -07:00
Tahsin Tunan
43721bc67f
[CI] Replace large models with tiny alternatives in tests ( #24057 )
...
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 15:51:27 +01:00
Kay Yan
02d709a6f1
[docs] standardize Hugging Face env var to HF_TOKEN (deprecates HUGGING_FACE_HUB_TOKEN) ( #27020 )
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-10-16 15:31:02 +01:00
Mark McLoughlin
4a510ab487
[NIXL] Improve request_finished() debug logs ( #25665 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-10-16 15:55:17 +02:00
Matthew Bonanni
314fa8abbf
[Attention] Tune CUTLASS MLA num_splits ( #26846 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-16 06:36:09 -07:00
Cyrus Leung
334535b6fb
[Benchmark] Show E2EL by default for pooling models ( #27014 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 12:47:09 +00:00
bogdanm
dcbb3f1871
[Bugfix] Correct LayerNorm epsilon parameter in modernbert.py ( #27008 )
...
Signed-off-by: bogdanm <152898065+bogdan01m@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-16 12:27:44 +00:00
Sungjae Lee
00417f4e44
[MISC] fix import violations for re and triton modules ( #26654 )
...
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com >
Co-authored-by: Mengqing Cao <cmq0113@163.com >
2025-10-16 03:38:27 -07:00
Lukas Geiger
ed344f4116
Cleanup code after Python 3.10 upgrade ( #26520 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-16 03:38:23 -07:00
CSWYF3634076
e51928793e
[Model][Bugfix] fix ernie45 vl run failed from shared experts optimization ( #26885 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-16 03:37:35 -07:00
Cyrus Leung
d2740fafbf
[Chore] Separate out vllm.utils.collections ( #26990 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 08:35:35 +00:00
Cyrus Leung
17838e50ef
[Benchmark] Use truncation by default for pooling benchmarks ( #26992 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 16:02:39 +08:00
Zhewen Li
44c8555621
[CI/Build] Fix AMD import failures in CI ( #26841 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-16 07:28:20 +00:00
Akash kaothalkar
f7d318de2b
[Hardware][CPU][PowerPC]Disable torch.compile() in toptopk sampling ( #26987 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-10-15 22:36:59 -07:00
Cyrus Leung
76f0d05bc6
[CI/Build] Update expected beam search output for Phi3V ( #26978 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 05:12:44 +00:00
Bram Wasti
7d8975de84
Deepseek-v3 Batch Invariant on 8xH100 ( #26609 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-15 22:06:02 -07:00
Vadim Gimpelson
785d8b6410
[PERF] Qwen3-next MTP speedup (change bool mask indexing to index_select / index_copy to reduce d2h) ( #26437 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-10-16 12:18:31 +08:00
Cyrus Leung
f6cdc9a02f
[Chore] Rename utils submodules ( #26920 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 03:58:13 +00:00
Chendi.Xue
509cdc0370
[DOC][XPU]update feature parity with Intel GPU ( #26954 )
...
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-15 20:07:10 -07:00
Richard Zou
9b6504c307
[BugFix] Work around graph partition x torch.compile cache issue ( #26956 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-10-15 20:06:11 -07:00
Angela Yi
e19b16dde6
[bugfix] Fix SP + PP without specifying compile size ( #26955 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 20:05:33 -07:00
ahao-anyscale
582f2c6be7
[BUG] Allow runai_streamer_sharded in config check ( #26958 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2025-10-15 20:05:14 -07:00
Michael Goin
f8a0acbdbe
[CI] Enable Blackwell Llama4 MoE tests ( #26731 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-15 21:02:57 -06:00
kliuae
1317034379
[ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops ( #24097 )
...
Signed-off-by: chenjun <junchen2@amd.com >
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-10-16 10:41:34 +08:00
InChang Jeong
0ecc553ee6
[Bugfix] reasoning_parser parameter handling in run_batch.py ( #26225 )
...
Signed-off-by: inc-jeong <inc.jeong@navercorp.com >
Signed-off-by: InChang Jeong <inc.jeong@navercorp.com >
Co-authored-by: USER <user@AL02367916.local >
2025-10-16 10:24:05 +08:00
felixzhu555
f96bc3649c
[Qwen3-Next] Add tuned MoE config for Qwen3-Next FP8 on H100 tp2 ( #26887 )
...
Signed-off-by: Felix Zhu <felixzhu555@gmail.com >
2025-10-15 18:55:05 -07:00
Alexei-V-Ivanov-AMD
938c43ea7f
[ci] Adjusting AMD test composition 2025-10-14 ( #26852 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-15 23:52:13 +00:00
Adrian Abeyta
0a9ef0cfce
Move query quantization to attention layer for Flashinfer & Triton. ( #26534 )
...
Signed-off-by: adabeyta <aabeyta@redhat.com >
Signed-off-by: Adrian Abeyta <aabeyta@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-15 19:01:38 -04:00
Wentao Ye
e5b438a247
[Bug] Temporarily Disable VLLM_ALLREDUCE_USE_SYMM_MEM by Default ( #26925 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-15 16:18:50 -04:00
XiaobingZhang
0b99f5d302
support flashinfer_fp4 moe for 5090 gpu ( #26669 )
...
Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-15 15:06:47 -04:00
Benji Beck
1f491aa0c8
Vectorize RMS norm variance using vectorize_read_with_alignment ( #26234 )
...
Signed-off-by: Benji Beck <benjibeck@meta.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-15 11:54:41 -07:00
Kaixi Hou
de92d916fe
[NVIDIA] Add support for cudnn fp4 gemm via flashinfer ( #26107 )
...
Signed-off-by: kaixih <kaixih@nvidia.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-10-15 13:53:00 -04:00
Woosuk Kwon
a1063628a4
[Chore] Clean up CODEOWNERS ( #26923 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-10-15 10:52:54 -07:00
XiaobingZhang
d796375258
[ModelOpt] Remove NVFP4 MoE K%16==0 constraint ( #26891 )
...
Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com >
2025-10-15 13:06:17 -04:00
Sam/Samuel
14f8456344
[Feature]: Use pydantic validation in observability.py config ( #26637 )
...
Signed-off-by: Samuel Wu <cernunnos1710@gmail.com >
Signed-off-by: Sam/Samuel <57896620+cern1710@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 16:44:03 +00:00
Pradeep Dasigi
4794c2bd92
Olmo 3 tool parser and tests ( #26143 )
...
Signed-off-by: Pradeep Dasigi <pradeepd@allenai.org >
2025-10-15 16:36:12 +00:00
Harry Mellor
d3cbaa08dc
Lower severity of log when model info cache misses due to exception ( #26917 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 09:01:09 -07:00
Cyrus Leung
828523ad8e
[Chore] Separate out vllm.utils.async_utils ( #26913 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 15:33:00 +00:00
Cyrus Leung
136a17fe6e
[Chore] Separate out vllm.utils.func ( #26904 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 13:03:58 +00:00
Boyuan Feng
f57438338d
[BugFix] Patch inductor memory plan logic ( #26878 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 12:51:45 +00:00
Max Wittig
5d598680e3
chore: remove unused marker ( #26890 )
...
Signed-off-by: Max Wittig <max.wittig@siemens.com >
2025-10-15 05:40:33 -07:00
wangxiyuan
8f4b313c37
[Misc] rename torch_dtype to dtype ( #26695 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-15 12:11:48 +00:00
Cyrus Leung
f93e348010
[Misc] Remove isort and yapf ignores ( #26888 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 12:09:03 +00:00
wang.yuqi
f54f85129e
[Model][2/N] Improve all pooling task | Support multi-vector retrieval ( #25370 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-15 11:14:41 +00:00
li2haipeng
d4d1a6024f
[Lora]Load tuned multi-lora kernel configs from json files ( #26319 )
...
Signed-off-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com >
Signed-off-by: Haipeng Li <li2haipeng@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-15 09:45:14 +00:00
wangxiyuan
db1764e4e0
[Platform] allow platform to init dp group ( #22243 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-15 02:32:17 -07:00
Jialin Ouyang
7f83b4ee8e
[Easy] Get rid of unnecessary parentheses in kv_cache_manager ( #26842 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-15 09:17:43 +00:00
ant-yy
5c3bae1a6a
[Fix] Remove divisibility requirement between num_kv_heads and tp_size in bailing_moe ( #26876 )
...
Signed-off-by: vito.yy <vito.yy@antgroup.com >
2025-10-15 16:44:04 +08:00
Xudong Ma
5210dc3940
[Misc] Update TritonLanguagePlaceholder to have attributes that are used by Flash Linear Attention ops. ( #26853 )
...
Co-authored-by: Xudong Ma <mxd@meta.com >
2025-10-15 08:37:49 +00:00
youkaichao
650b51f9f9
[doc] add Context Parallel Deployment doc ( #26877 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-15 16:33:52 +08:00
Cyrus Leung
6256697997
[Doc] ruff format remaining Python examples ( #26795 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 01:25:49 -07:00
Wentao Ye
71557a5f7c
[CI] Fix mypy for vllm/executor ( #26845 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-15 01:23:33 -07:00
Zhewen Li
f3c378ffa7
[CI/Build] Add Qwen2.5-VL-7B-Instruct ChartQA Accuracy Tests in CI ( #21810 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: zhewenli <zhewenli@meta.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com >
2025-10-15 08:09:56 +00:00
Yongye Zhu
f5ed68ef63
[Deepseek-V3.2][Kernel] Integrate cuda indexer k cache gather ( #26456 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
2025-10-15 16:05:01 +08:00
Angela Yi
efdef57b1f
[bugfix] Lazy import cv2 ( #26869 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 07:47:50 +00:00
Cyrus Leung
b8a4572157
[Misc] Use helper function to generate dummy messages in OpenAI MM tests ( #26875 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 07:17:37 +00:00
Mengqing Cao
302ef403a2
[DSA][MLA] Tiny refactor on DeepSeek to make it reusable for different backends ( #26656 )
...
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-15 00:16:44 -07:00
sangho.lee
8865da157b
[Bugfix][Multi Modal] Fix incorrect Molmo token processing ( #26873 )
...
Signed-off-by: sanghol <sanghol@allenai.org >
2025-10-15 07:13:59 +00:00
Boyuan Feng
f0862eae43
[Graph Partition] pass tests for decorator ( #26831 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-15 06:39:48 +00:00
Isotr0py
8c851f6d04
[Bugfix] Fix qwen3-omni audio truncation issue ( #26815 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-15 05:38:36 +00:00
Angela Yi
7cfa420f49
[BugFix] Patch inductor partitioning logic ( #26735 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 05:04:32 +00:00
rongfu.leng
a27b288e4a
[Feature] default --extra-body param to disable thinking in vllm bench serve ( #26784 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-10-15 04:23:44 +00:00
zhrrr
e471d7ca7e
[CI/Build][Bugfix] fix qutlass cmake error when set QUTLASS_SRC_DIR ( #26773 )
...
Signed-off-by: izhuhaoran <izhuhaoran@qq.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-15 04:09:44 +00:00
Michael Yao
c43ca8259e
[Docs] Move build.inc into arm.inc ( #26862 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-10-14 20:35:08 -07:00
Tao Hui
85a65e7f51
[Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972 ) ( #25589 )
...
Signed-off-by: taohui <taohui3@gmail.com >
Signed-off-by: Tao Hui <taohui3@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-10-15 11:09:52 +08:00
kourosh hakhamaneshi
a2986b3e33
[Bugfix] Fixes prefix-repetition benchmark script ( #26828 )
...
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com >
2025-10-15 02:54:43 +00:00
Morrison Turnansky
96b9aa5aa0
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): name change compilation level to compilation mode, deprecation compilation level ( #26355 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
Signed-off-by: Morrison Turnansky <mturnans@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-15 02:51:16 +00:00
Michael Goin
e66d787bce
Disable FlashInfer sampler by default ( #26859 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-15 02:35:18 +00:00
Chendi.Xue
bfad142e25
[BUGFIX][NIXL] quick fix for 'assert self.connector_worker is not None' in get_kv_connector_stats ( #26851 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-15 02:33:25 +00:00
Zhikaiiii
9354660036
[Bugfix]fix Qwen3 xml tool parser ( #26345 )
...
Signed-off-by: Zhikaiiii <1658973216@qq.com >
2025-10-15 09:50:30 +08:00
Jialin Ouyang
07ca70af8d
[Core][Easy] Use envs.__getattr__ for all Unify to environment variable access ( #26810 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-15 01:41:18 +00:00
Luka Govedič
2dcd12d357
[torch.compile] Fix tests for torch==2.9 inductor partition ( #26116 )
...
Signed-off-by: ProExpertProg <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2025-10-14 19:55:02 -04:00
Tyler Michael Smith
579d2e5458
[WideEP][P/D] Add usage stats for DP+EP and KV Connector ( #26836 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-14 23:51:54 +00:00
Ye Hu
0512c04aee
[frontend][gptoss] Add per turn stats into Harmony Context ( #25061 )
...
Signed-off-by: lacora <hyelacora@gmail.com >
Co-authored-by: Ye Hu <yehu@fb.com >
2025-10-14 16:48:13 -07:00
Michael Goin
7e0ef4084a
[CI Failure] Fix torchao dep failure for Quantization Test ( #26824 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-14 16:41:43 -07:00
Nick Hill
4aed506b65
[Core] Streamline some structured output related code ( #26737 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-14 23:27:44 +00:00
Boyuan Feng
a86b4c58e8
remove attn output view kernel ( #26680 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Boyuan Feng <fby.1994@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 22:53:10 +00:00
Nick Hill
ff4810ba73
[Minor] Group async_scheduling related fields in model runner init ( #26736 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-14 14:46:37 -07:00
Nan Qin
9d6964926e
fix: response_format for completion ( #23212 )
...
Signed-off-by: Nan2018 <qinnanjoshua@gmail.com >
2025-10-14 21:23:22 +00:00
Dhruvil Bhatt
0e65818910
Added MoE configs for llama 4, H200 device with tp=4/8 tuning ( #26837 )
...
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com >
2025-10-14 14:21:03 -07:00
Jialin Ouyang
380f17527c
[Perf] Cache vllm.env.__getattr__ result to avoid recomputation ( #26146 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-14 17:03:21 -04:00
HDCharles
b92ab3deda
Notice for deprecation of AutoAWQ ( #26820 )
...
Signed-off-by: HDCharles <39544797+HDCharles@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 13:39:59 -07:00
Jialin Ouyang
acaa2c0a4a
[Core] Reuse empty block lists whenever possible in KVCacheBlocks to mitigate GC costs ( #24964 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-14 12:58:43 -07:00
Matthew Bonanni
82af928c41
[Attention][Spec Decode] FlashMLA spec decode support ( #26541 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-14 19:38:20 +00:00
Huamin Li
87efc681db
llama4_vision_rope: add HIP override to accept (q, k) and avoid (positions, q, k) mismatch ( #26790 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-14 11:54:12 -07:00