vllmellm
|
0af3d4f0df
|
[FEAT] [AITER] [ROCm] integrate aiter sampling ops (#26084)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-18 17:28:34 +00:00 |
|
Nick Hill
|
da8dadf68b
|
[Minor] Rename ec_producer field to is_ec_producer (#28884)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-18 17:26:07 +00:00 |
|
Nicolò Lucchesi
|
f226a3f0c1
|
[CI][NIXL] Change default block_size for tests (#28927)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-18 09:22:30 -08:00 |
|
Luciano Martins
|
c2612371ad
|
[Model] Add Gemma3 GGUF multimodal support (#27772)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-18 08:56:29 -08:00 |
|
Ido Segev
|
49a986ecd4
|
[Benchmark] multi_turn: Report warmup-inclusive runtime (#28937)
Signed-off-by: Ido Segev <idos@pliops.com>
|
2025-11-18 16:38:22 +00:00 |
|
Alex
|
f6aa122698
|
[CI Sprint] Quantization CI Cleanup (#24130)
Signed-off-by: Alex Yun <alexyun04@gmail.com>
|
2025-11-18 09:21:48 -05:00 |
|
Nicolò Lucchesi
|
184b12fdc6
|
[Bugfix][NIXL] Fix block_size_ratio when logical !=physical blocks (#28925)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-18 22:07:50 +08:00 |
|
Canlin Guo
|
b9489f51e1
|
[Model][Perf] Use cos and sin cache in QwenVL (#28798)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
|
2025-11-18 11:51:54 +00:00 |
|
Song Zhixin
|
285eaa4285
|
[Bugfix] Safeguard against missing backend in AttentionBackendEnum (#28846)
Signed-off-by: jesse <szxfml@gmail.com>
Signed-off-by: Song Zhixin <szxfml@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-18 10:53:44 +00:00 |
|
Nick Hill
|
439368496d
|
[BugFix] Fix PP/async scheduling with pooling models (#28899)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
v0.11.1
|
2025-11-18 00:20:45 -08:00 |
|
Isotr0py
|
896e41ae04
|
[CI/Build] Replace wikipedia url with local server ones (#28908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-18 08:10:55 +00:00 |
|
Kuntai Du
|
5bb1da5190
|
[MISC] Remove format.sh (#28906)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
|
2025-11-18 05:28:31 +00:00 |
|
Nick Hill
|
5bdd155277
|
[CI] Fix async scheduling + spec decoding test flake (#28902)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-18 05:26:32 +00:00 |
|
Ning Xie
|
0168f69e50
|
[Misc] Remove unnecessary parentheses from log statements (#28897)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-11-17 20:33:46 -08:00 |
|
Didier Durand
|
083cf326dc
|
[Doc]: fix typos in various files (#28863)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-11-17 20:32:14 -08:00 |
|
Cyrus Leung
|
bf9e1e8767
|
[Bugfix] Fix wrong CLI defaults for dynamic SchedulerConfig fields (#28872)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-17 20:30:29 -08:00 |
|
Wentao Ye
|
3ddcf46011
|
[Refactor] Remove Unused Func in Batch Invariant (#28881)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-17 20:29:29 -08:00 |
|
xuebwang-amd
|
d0a73620cc
|
[ROCm][Quantization] add apply_vllm_mapper in quark config for models like gpt-oss (#28638)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-18 11:16:45 +08:00 |
|
Michael Goin
|
88ab591f0b
|
Run macos smoke test workflow on main commit (#28752)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-18 11:16:03 +08:00 |
|
Benjamin Bartels
|
b6e04390d3
|
[Bugfix] Fix Kimi-K2 tool parser concatenated tool calls parsing (#28831)
Signed-off-by: Thomas Mao <yiyeguhu@gmail.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
Co-authored-by: Thomas Mao <yiyeguhu@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-11-17 19:13:25 -08:00 |
|
Zhuohan Li
|
552cac95b5
|
[Misc] Fix wrong comment in scheduler (#28880)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
|
2025-11-17 15:32:22 -08:00 |
|
Bangsheng Tang
|
61485844fc
|
[BugFix] Corner case that could cause out-of-sync with external launcher mode and dp >1 (#28774)
|
2025-11-17 15:22:11 -08:00 |
|
Pranav
|
f77bce001a
|
[Model] Add Afmoe architecture implementation (#28332)
Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
Signed-off-by: Pranav <veldurthipranav@gmail.com>
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
|
2025-11-17 15:11:20 -08:00 |
|
Wentao Ye
|
a289cc1dde
|
[Test] Batch Invariant: Rename and organize tests (#27421)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-17 18:09:47 -05:00 |
|
Shreyas Kulkarni
|
95ae50b7d1
|
[Quantization] [Eagle] Add complete quantization support to the draft model in Eagle (#28435)
Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com>
|
2025-11-17 15:01:34 -08:00 |
|
Nick Hill
|
7765e5ba75
|
[BugFix] Fix PP performance and PP kv connector output regression (#28768)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-17 14:08:50 -08:00 |
|
Ronald
|
d8874c61a5
|
[Core] Async Scheduling X Spec Decoding Compatibility (#24799)
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-11-17 12:16:20 -08:00 |
|
Zhewen Li
|
f8b19c0ffd
|
[Bugfix] Fix GPT-OSS on AMD after #28603 (#28816)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-11-17 13:15:26 -05:00 |
|
tiehexue
|
e42bd8c2e3
|
Cast return value to int64_t for cache size (#28814)
Signed-off-by: tiehexue <tiehexue@hotmail.com>
|
2025-11-17 16:02:32 +00:00 |
|
Roger Wang
|
7f064491f8
|
[Bugfix][Perf] Revert applying HF processor on text-only inputs for multimodal models (#28858)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-11-17 14:49:25 +00:00 |
|
Lucas Wilkinson
|
64e39d667c
|
[BugFix] Temporary fix for IMA with MTP = 2 and full-cg (#28315)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-17 09:41:22 -05:00 |
|
Kunshang Ji
|
1b82fb0ad3
|
[XPU] work around for sp, avoid custom op import error (#28822)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-17 13:16:44 +00:00 |
|
Jae-Won Chung
|
d4acf518d0
|
[Metrics] Fix KV cache usage percent metric multiproc (#28792)
The `vllm:kv_cache_usage_perc` Gauge metric is missing `multiprocess_mode="mostrecent"` and ends up returning one sample per process:
```
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="277"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="275"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="273"} 0.6530455880475035
...
```
The deprecated `vllm:gpu_cache_usage_perc` Gauge metric has `multiprocess_mode="mostrecent"`.
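As a rough illustration of why the mode matters, the sketch below models how a multiprocess gauge might be collected in two modes: one series per process (`all`) versus a single series taken from the latest write (`mostrecent`). It is a simplified stand-in, not the Prometheus client's or vLLM's code. The `Sample` type, the `collect` helper, and the timestamps are hypothetical, and the value is rounded from the output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    pid: int          # process that wrote the value
    value: float      # gauge value written by that process
    timestamp: float  # hypothetical write time, in seconds


def collect(samples: list[Sample], mode: str) -> dict[str, float]:
    """Toy aggregation of one gauge across processes (illustrative only)."""
    if mode == "all":
        # One series per process, distinguished by a pid label.
        return {f'pid="{s.pid}"': s.value for s in samples}
    if mode == "mostrecent":
        # A single series: the value with the latest write time wins.
        latest = max(samples, key=lambda s: s.timestamp)
        return {"": latest.value}
    raise ValueError(f"unsupported mode: {mode}")


# Assumed write order: the process that tracks KV cache usage wrote last.
samples = [
    Sample(pid=277, value=0.0, timestamp=1.0),
    Sample(pid=275, value=0.0, timestamp=2.0),
    Sample(pid=273, value=0.65, timestamp=3.0),
]
per_process = collect(samples, "all")        # three series, two of them 0.0
merged = collect(samples, "mostrecent")      # one series with the latest value
```

In this toy model the `all` mode reproduces the shape of the output above (one line per `pid`), while `mostrecent` collapses it to a single value.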
Signed-off-by: Jae-Won Chung <jwnchung@umich.edu>
|
2025-11-17 09:54:15 +00:00 |
|
wuyaoxuehun
|
ab01cd14e5
|
[BugFix] Fix glm4_moe_mtp load weights bug (#28805)
Signed-off-by: wuyaoxuehun <798143193@qq.com>
|
2025-11-17 17:13:11 +08:00 |
|
Li, Jiang
|
577bb34fff
|
[CPU][Bugfix] Fix _to_list in CPU model runner (#28824)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-11-17 07:47:24 +00:00 |
|
Jee Jee Li
|
3380ed5e11
|
[Doc] Add llama4 LoRA tag (#28825)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-17 14:08:48 +08:00 |
|
Jay Caldwell
|
6f37419244
|
[Bugfix][Model] Prevent special token leakage in KimiK2ToolParser streaming mode (#28543)
Signed-off-by: Jscaldwell55 <jay.s.caldwell@gmail.com>
|
2025-11-17 13:54:46 +08:00 |
|
Xiake Sun
|
60e089f0b9
|
[ROCm][Qwen3-32B] Fix AITER MHA accuracy issue cause by #25763 (#28670)
Signed-off-by: Xiake Sun <xiake.sun@amd.com>
|
2025-11-16 20:52:11 -08:00 |
|
liuzhenwei
|
d64429bb36
|
[NIXL][XPU] update install script of NIXL (#28778)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
|
2025-11-17 03:01:33 +00:00 |
|
jiahanc
|
561253b37f
|
[Performance][Fix] update nvfp4 code to support renorm routing (#28569)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-16 18:02:42 -08:00 |
|
Nick Hill
|
80b6080ddc
|
[BugFix] Fix async scheduling + chunked prefill + preemption (#28787)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-17 06:46:46 +08:00 |
|
amirkl94
|
03ee48111d
|
Feature: Support Relu2 in FusedMoE fp8 cutlass path (#27261)
|
2025-11-16 13:39:44 -05:00 |
|
Lukas Geiger
|
5a87076d6e
|
[Model][QwenVL] Optimize Qwen2_5_VisionAttention q,k preparation (#28769)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-16 17:37:15 +00:00 |
|
Ning Xie
|
ac1daf3233
|
fix comment typo (#28802)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-11-16 17:03:21 +00:00 |
|
Didier Durand
|
63fed55506
|
[Doc]: fix typos in various files (#28811)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-11-16 14:30:06 +00:00 |
|
Anna Shors
|
8d259fad6c
|
Fix gpt oss weight loading with EP + bf16 (#28765)
Signed-off-by: ashors1 <ashors@nvidia.com>
|
2025-11-16 13:12:45 +00:00 |
|
scottzh8
|
3bc1175798
|
[Bugfix] Fix host and port join for ipv6 in bench serve (#28679)
Signed-off-by: Scott Zhang <scottzh@fb.com>
Co-authored-by: Scott Zhang <scottzh@fb.com>
|
2025-11-16 10:20:57 +00:00 |
|
Dezhan
|
af02c40970
|
Fixed gpt-oss _load_weights_other() parameter position bug (#28715)
Co-authored-by: Dezhan Tu <dztu@meta.com>
|
2025-11-16 09:46:29 +00:00 |
|
Lucia Fang
|
b316ac6589
|
[V1] Support MP Executor for multi node distributed inference (#23691)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-11-16 09:01:21 +00:00 |
|
wang.yuqi
|
a55b64635c
|
[Model] Allow users to control skip reading cache per request. (#28194)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-11-16 00:04:50 -08:00 |
|