biondizzle/vllm - vllm

Author	SHA1	Message	Date
chengchengpei	965525667b	Onboard voyage-4-nano (#33720 ) Signed-off-by: Chengcheng Pei <chengchengpei@outlook.com> Signed-off-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com> Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-06 06:23:34 +00:00
Mingliang Li	a32cb49b60	feat(frontend): early-fail tokenization guard for user requests (#31366 ) Signed-off-by: limingliang <limingliang@stepfun.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: limingliang <limingliang@stepfun.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-05 19:38:02 -08:00
Xin Yang	79028d4388	[Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel (#33568 )	2026-02-05 20:34:00 -05:00
emricksini-h	325ab6b0a8	[Feature] OTEL tracing during loading (#31162 )	2026-02-05 16:59:28 -08:00
Hashem Hashemi	d5c4800112	Adds padding and perf improvements to wvSplitK_fp8 (#33527 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-02-05 22:16:02 +00:00
Cyrus Leung	116880a5a0	[Bugfix] Make MM batching more robust (#33817 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-05 20:40:58 +00:00
Harry Mellor	1887acca9e	Fix tokenizer test for renamed attr on Transformers v5 (#33902 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-05 19:16:20 +00:00
bnellnm	a57c8228ff	[Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor (#33375 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-05 18:07:18 +00:00
Benjamin Chislett	af3162d3aa	[Spec Decode] Unified Parallel Drafting (#32887 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-02-05 12:37:18 -05:00
Aaron Hao	c1858b7ec8	[Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Co-authored-by: SumanthRH <sumanthrh99@gmail.com>	2026-02-05 12:13:23 -05:00
Mario Hong	82914d2ae8	[Bugfix] Fix step3p5 parser when using mtp (#33690 ) Signed-off-by: mariohong <mariohong128@gmail.com>	2026-02-05 16:04:04 +00:00
wang.yuqi	1c3a221d3b	[Bugfix] Fix corner case of sparse embedding (#33886 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-05 02:51:22 -08:00
liranschour	8322d4e47f	Enable Cross layers KV cache layout at NIXL Connector V2 (#33339 ) Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com> Co-authored-by: Or Ozeri <or@ozery.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-05 02:17:02 -08:00
Andreas Karatzas	3e472e81f9	[ROCm][Bugfix][CI] Fix hybrid models and their tests (Mamba/Jamba/Bamba) (#32710 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-02-05 10:01:23 +00:00
Cyrus Leung	038914b7c8	[Refactor] Move `task` outside of `PoolingParams.verify` (#33796 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-05 09:33:11 +00:00
Mark McLoughlin	2abd97592f	[KV Connector][Metrics] Do not count local prefix cache hits in connector queries (#30522 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-02-05 09:57:27 +02:00
Andreas Karatzas	1f70313e59	[Bugfix] Fix ScoreMultiModalParam multi-document scoring returning single result (#33837 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-05 06:17:00 +00:00
rasmith	c1395f72cd	[CI][AMD][BugFix] Ensure VLLM_ROCM_USE_AITER is set so test_rocm_aiter_topk.py can run correctly (#33840 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-02-05 05:05:48 +00:00
Nick Hill	add9f1fbd9	[Minor] Include `StreamingInput` in inputs package (#33856 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-05 04:38:20 +00:00
Andreas Karatzas	fb1270f1f8	[CI][Bugfix]: return McpCall for built-in MCP tools in non-streaming mode (#32762 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-05 11:14:06 +08:00
Luka Govedič	4d9513537d	[CI][torch.compile] Reduce e2e fusion test time (#33293 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-04 19:09:03 -05:00
Ilya Boytsov	439afa4eea	feat: Add ColBERT late interaction model support (#33686 ) Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com> Signed-off-by: Ilya Boytsov <boytsovpanamera@mail.ru> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-05 08:05:13 +08:00
Nick Hill	fa4e0fb028	[Core] Don't schedule spec tokens with prefill chunks (#33652 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-04 23:40:22 +00:00
Richard Zou	9f14c9224d	Revert "[torch.compile] Significantly speed up cold start times" (#33820 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-04 21:59:59 +00:00
Simon Danielsson	4292c90a2a	[Bugfix] Support `RotaryEmbedding` CustomOp for gpt-oss (#33800 ) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>	2026-02-04 20:17:41 +00:00
kourosh hakhamaneshi	2f6d17cb2f	[rocm][ray] Fix: Unify Ray device visibility handling across CUDA and ROCm (#33308 ) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>	2026-02-04 10:09:14 -08:00
Isotr0py	192ad4648b	[Bugfix] Fix interns1-pro initialization and PP (#33793 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-04 17:54:45 +00:00
Cyrus Leung	80f921ba4b	[Bugfix] Fix `normalize` still being passed to `PoolerConfig` (#33794 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-04 23:56:02 +08:00
Or Ozeri	8e32690869	[KV Connector][BugFix] scheduler: Delay freeing blocks of aborted async loads (#32255 ) Fixes a not-yet-reported case where it was possible for blocks to be freed by an abort before an async transfer completed, resulting in corrupted KV data. Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-02-04 11:16:34 +00:00
zhanqiuhu	4403e3ed4c	[Metrics] Add labeled prompt token metrics for P/D disaggregation (#33290 ) Add labeled Prometheus metrics to distinguish where prompt tokens come from in P/D disaggregated deployments. In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode reports inflated prompt throughput because it counts all prompt tokens as "computed", even though most were transferred. This PR adds labeled metrics so users can understand actual compute work vs transferred work: vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally; vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer; vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache; vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>	2026-02-04 07:46:48 +00:00
Frank Wang	45f8fd6f97	[Feature] Enable `TRITON_ATTN` for Batch Invariance (#33688 ) Signed-off-by: frankwang28 <frank.wbb@hotmail.com>	2026-02-04 13:27:34 +08:00
R3hankhan	4dffc5e044	[CPU] Split attention dispatch by head_dim alignment (#32161 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-02-03 19:37:15 -08:00
Andrew Xia	e1bf04b6c2	[1/N] Initial Implementation of Parser for ResponsesAPI (#32712 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2026-02-04 10:59:03 +08:00
Isotr0py	02080179a3	[Bugfix] Fix torchrun PP broadcast deadlock with async scheduling (#33701 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-04 02:17:37 +00:00
wang.yuqi	1b8fe6f7c4	[Frontend][4/n] Make pooling entrypoints request schema consensus \| ScoreRequest (#33060 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-04 01:48:40 +00:00
Nick Hill	52ee21021a	[BugFix][Spec Decoding] Fix negative accepted tokens metric crash (#33729 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-03 23:34:41 +00:00
Patrick von Platen	3f7662d650	[Voxtral Realtime] Change name (#33716 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2026-02-03 13:03:28 -08:00
Harry Mellor	61e632aea1	Turn `@config` into a `dataclass_transform` (#31541 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 17:40:59 +00:00
Richard Zou	b1bb18de8d	[torch.compile] Significantly speed up cold start times (#33641 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-03 09:12:11 -08:00
shaharmor98	4bc913aeec	Feat/add nemotron nano v3 tests (#33345 )	2026-02-03 08:52:49 -05:00
zxy	a3acfa1071	[Models] Intern-S1-Pro (#33636 ) Signed-off-by: zxy <zhou0493@e.ntu.edu.sg> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-03 05:49:45 -08:00
Harry Mellor	f6af34626d	Fix offline test for Transformers v5 (#33682 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 12:07:24 +00:00
Cyrus Leung	83449a5ff0	[Refactor] Clean up pooling serial utils (#33665 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-03 10:29:18 +00:00
Isotr0py	32e84fa1ff	[CI/Build] Investigate torchrun distributed tests hanging issue (#33650 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-03 15:49:17 +08:00
杨朱 · Kiki	b95cc5014d	[Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable (#33535 ) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 15:01:59 +08:00
Daniel Mescheder	4c4b6f7a97	[Frontend] Add sampling parameters to Responses API (#32609 ) Signed-off-by: Daniel Mescheder <dmesch@amazon.com> Co-authored-by: Daniel Mescheder <dmesch@amazon.com>	2026-02-03 13:51:10 +08:00
Patrick von Platen	5019c59dd2	[Voxtral Realtime] Introduce global log mel max (#33574 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-02 17:01:47 -05:00
Vasiliy Kuznetsov	0130223bd9	fix memory for online fp8 quantization with streaming weight load (#31914 ) Signed-off-by: vasiliy <vasiliy@fb.com>	2026-02-02 14:17:42 -05:00
yugong333	ffe1fc7a28	Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005 ) Signed-off-by: Yu Gong <yu3.gong@gmail.com>	2026-02-02 12:30:06 -05:00
Harry Mellor	6141ebe0dd	Remove incorrect tokenizer info test (#33565 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-02 17:11:44 +00:00
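
The labeled metrics described in commit 4403e3ed4c (#33290) can be read back from a Prometheus text scrape to see how much of the prompt was actually computed locally. The following is a minimal sketch, not vLLM code: the helper names `prompt_tokens_by_source` and `source_shares` are hypothetical, the sample values are invented for illustration, and it assumes the metric is exposed in the standard Prometheus text format with a `source` label as quoted in the commit message.

```python
import re

# Hypothetical scrape output; the numbers are invented for illustration only.
SAMPLE = """\
vllm:prompt_tokens_by_source_total{source="local_compute"} 200
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} 700
vllm:prompt_tokens_by_source_total{source="local_cache_hit"} 100
"""

# Matches one sample line of the labeled counter named in the commit message.
_LINE = re.compile(
    r"^vllm:prompt_tokens_by_source_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$"
)
_SOURCE = re.compile(r'source="([^"]*)"')


def prompt_tokens_by_source(text):
    """Sum counter values per `source` label, ignoring unrelated lines."""
    totals = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if not match:
            continue
        source = _SOURCE.search(match.group("labels"))
        if not source:
            continue
        name = source.group(1)
        totals[name] = totals.get(name, 0.0) + float(match.group("value"))
    return totals


def source_shares(totals):
    """Return each source's fraction of all prompt tokens (0.0 if no tokens)."""
    total = sum(totals.values())
    if total == 0:
        return {name: 0.0 for name in totals}
    return {name: value / total for name, value in totals.items()}


if __name__ == "__main__":
    for name, share in source_shares(prompt_tokens_by_source(SAMPLE)).items():
        print(f"{name}: {share:.0%}")
```

On a decode instance in a P/D deployment, a breakdown like this would separate locally prefilled tokens from tokens received via KV transfer, which is the distinction the commit message says the unlabeled counter hides.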

4625 Commits