biondizzle/vllm - vllm

Author	SHA1	Message	Date
Cyrus Leung	80f921ba4b	[Bugfix] Fix `normalize` still being passed to `PoolerConfig` (#33794 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-04 23:56:02 +08:00
Or Ozeri	8e32690869	[KV Connector][BugFix] scheduler: Delay freeing blocks of aborted async loads (#32255 ) Fixes a not-yet-reported case where it was possible for blocks to be freed by an abort before an async transfer completed, resulting in corrupted KV data. Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-02-04 11:16:34 +00:00
zhanqiuhu	4403e3ed4c	[Metrics] Add labeled prompt token metrics for P/D disaggregation (#33290 ) Add labeled Prometheus metrics to distinguish where prompt tokens come from in P/D disaggregated deployments. In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode reports inflated prompt throughput because it counts all prompt tokens as "computed", even though most were transferred. This PR adds labeled metrics so users can understand actual compute work vs transferred work: vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>	2026-02-04 07:46:48 +00:00
Frank Wang	45f8fd6f97	[Feature] Enable `TRITON_ATTN` for Batch Invariance (#33688 ) Signed-off-by: frankwang28 <frank.wbb@hotmail.com>	2026-02-04 13:27:34 +08:00
R3hankhan	4dffc5e044	[CPU] Split attention dispatch by head_dim alignment (#32161 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-02-03 19:37:15 -08:00
Andrew Xia	e1bf04b6c2	[1/N] Initial Implementation of Parser for ResponsesAPI (#32712 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2026-02-04 10:59:03 +08:00
Isotr0py	02080179a3	[Bugfix] Fix torchrun PP broadcast deadlock with async scheduling (#33701 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-04 02:17:37 +00:00
wang.yuqi	1b8fe6f7c4	[Frontend][4/n] Make pooling entrypoints request schema consensus \| ScoreRequest (#33060 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-04 01:48:40 +00:00
Nick Hill	52ee21021a	[BugFix][Spec Decoding] Fix negative accepted tokens metric crash (#33729 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-03 23:34:41 +00:00
Patrick von Platen	3f7662d650	[Voxtral Realtime] Change name (#33716 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2026-02-03 13:03:28 -08:00
Harry Mellor	61e632aea1	Turn `@config` into a `dataclass_transform` (#31541 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 17:40:59 +00:00
Richard Zou	b1bb18de8d	[torch.compile] Significantly speed up cold start times (#33641 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-03 09:12:11 -08:00
shaharmor98	4bc913aeec	Feat/add nemotron nano v3 tests (#33345 )	2026-02-03 08:52:49 -05:00
zxy	a3acfa1071	[Models] Intern-S1-Pro (#33636 ) Signed-off-by: zxy <zhou0493@e.ntu.edu.sg> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-03 05:49:45 -08:00
Harry Mellor	f6af34626d	Fix offline test for Transformers v5 (#33682 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 12:07:24 +00:00
Cyrus Leung	83449a5ff0	[Refactor] Clean up pooling serial utils (#33665 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-03 10:29:18 +00:00
Isotr0py	32e84fa1ff	[CI/Build] Investigate torchrun distributed tests hanging issue (#33650 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-03 15:49:17 +08:00
杨朱 · Kiki	b95cc5014d	[Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable (#33535 ) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 15:01:59 +08:00
Daniel Mescheder	4c4b6f7a97	[Frontend] Add sampling parameters to Responses API (#32609 ) Signed-off-by: Daniel Mescheder <dmesch@amazon.com> Co-authored-by: Daniel Mescheder <dmesch@amazon.com>	2026-02-03 13:51:10 +08:00
Patrick von Platen	5019c59dd2	[Voxtral Realtime] Introduce global log mel max (#33574 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-02 17:01:47 -05:00
Vasiliy Kuznetsov	0130223bd9	fix memory for online fp8 quantization with streaming weight load (#31914 ) Signed-off-by: vasiliy <vasiliy@fb.com>	2026-02-02 14:17:42 -05:00
yugong333	ffe1fc7a28	Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005 ) Signed-off-by: Yu Gong <yu3.gong@gmail.com>	2026-02-02 12:30:06 -05:00
Harry Mellor	6141ebe0dd	Remove incorrect tokenizer info test (#33565 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-02 17:11:44 +00:00
Matthew Bonanni	9f8cb81b44	[CI] Add DeepSeek V3.2 nightly eval (#33566 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-02 16:10:02 +00:00
shanjiaz	d95b4be47a	move spec decode slow test to test_areas.yaml (#33365 ) Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>	2026-02-02 06:28:36 -08:00
Isotr0py	4061dcf4c5	[Bugfix] Enable Kimi k25 processor test (#33562 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-02 14:25:25 +00:00
danielafrimi	0aca8b8c62	[MoE] Enable Shared/Routed Overlap For Latent MoE (Nemotron-H) (#32790 ) Signed-off-by: dafrimi <dafrimi@nvidia.com>	2026-02-02 09:18:50 -05:00
Nicolò Lucchesi	528b3076af	[CI][Bugfix] Fix flaky `tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_example_connector_consistency` (#33555 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-02 03:01:29 -08:00
Cyrus Leung	a502831d36	[Chore] Remove redundant input parsing methods (#33542 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-02 10:50:47 +00:00
RED	808dd87b30	[Model] Support DeepSeek-OCR-2 (#33165 ) Signed-off-by: liuli <ll407707@alibaba-inc.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: liuli <ll407707@alibaba-inc.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-02 06:24:10 +00:00
jack	7c036432fc	[Bugfix] GLM-4 tool parser: incremental string streaming (#33218 ) Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com> Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>	2026-02-02 11:13:31 +08:00
Robert Shaw	318b120766	[Nightly CI] Remove CT Model (#33530 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-02-01 19:09:09 -08:00
csy0225	c3b40dc3e7	[Models] Step-3.5-Flash (#33523 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com> Co-authored-by: xiewuxun <xiewuxun@stepfun.com> Co-authored-by: zetaohong <i-hongzetao@stepfun.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-02 10:21:18 +08:00
Yifan Qiao	a01ef3fa51	[Fix] prefix cache hit rate == 0 bug with gpt-oss style models (#33524 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2026-02-02 01:59:58 +00:00
Runkai Tao	7320ca3942	Add unpermute-aware fused MoE LoRA path (#32655 ) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>	2026-02-02 09:46:09 +08:00
Roy Wang	63c0889416	[Misc] Fix flashinfer related tests (#33462 ) Signed-off-by: esmeetu <jasonailu87@gmail.com>	2026-01-31 16:10:24 -05:00
Cyrus Leung	88c3e114d8	[Refactor] Move MM data parsing outside processor (#33408 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-31 16:46:14 +00:00
jma99_2333	22d9a056d5	Support clear mm and encoder cache (#33452 ) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-01-31 15:22:25 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	13b842f271	[BugFix][Router Replay] Capture Logical Experts with EPLB (#33013 ) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2026-01-31 10:12:17 -05:00
Luka Govedič	15f40b20aa	[fix][torch.compile] Fix cold-start compilation time increase by adding kv cache update to splitting ops (#33441 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Richard Zou <zou3519@gmail.com>	2026-01-31 06:48:34 -08:00
Angela Yi	608b556507	[ez] Add structured torch.compile logs (#33213 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-31 21:00:54 +08:00
Cyrus Leung	f0a1c8453a	[Frontend] Use new Renderer for Completions and Tokenize API (#32863 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-31 04:51:15 -08:00
Yanan Cao	d5c41db35b	[Kernel] [Helion] [3/N] Helion kernel registry (#33203 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-01-31 15:38:46 +08:00
Dimitrios Bariamis	f0bca83ee4	Add support for Mistral Large 3 inference with Flashinfer MoE (#33174 ) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-01-30 22:48:27 -08:00
Yanan Cao	8ecd213c0b	[Kernel] [Helion] [2/N] Helion kernel wrapper (#32964 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-01-31 12:53:01 +08:00
Patrick von Platen	15e0bb9c42	[Streaming -> Realtime] Rename all voxtral related classes, fn, files (#33415 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2026-01-31 04:49:00 +00:00
Micah Williamson	6c64c41b4a	[ROCm][CI] Force max_num_seqs=1 on ROCm In test_sharded_state_loader to reduce flakiness (#33277 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-31 12:28:29 +08:00
Michael Goin	29fba76781	[UX] Use gguf `repo_id:quant_type` syntax for examples and docs (#33371 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-31 12:14:54 +08:00
Nick Hill	876a16f4fb	[ModelRunner V2] Fix spec decoding + logprobs (#33391 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-31 03:33:26 +00:00
Matthew Bonanni	aaa901ad55	[Attention] Move MLA `forward` from backend to layer (#33284 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-30 19:30:00 -08:00

4348 Commits