biondizzle/vllm - vllm

Author	SHA1	Message	Date
Or Ozeri	8e32690869	[KV Connector][BugFix] scheduler: Delay freeing blocks of aborted async loads (#32255) Fixes a not-yet-reported case where it was possible for blocks to be freed by an abort before an async transfer completed, resulting in corrupted KV data. Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-02-04 11:16:34 +00:00
Zhengxu Chen	a208439537	[compile] Remove runner type from ignored caching factor list. (#33712) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-02-04 10:56:45 +00:00
Zhengxu Chen	bcd2f74c0d	[compile] Clean up AOT compile bypass on evaluate_guards. (#33578) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-02-04 02:12:53 -08:00
Kunshang Ji	f79f777803	[XPU][2/N] add support unquantized moe support for xpu (#33659) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-04 02:12:25 -08:00
Augusto Yao	4c8d1bf361	use ORJSONResponse when available to improve the efficiency of request process (#33548) Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>	2026-02-04 10:04:11 +00:00
Kunshang Ji	061da6bcf7	[XPU] remove common path warning log (#33769) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-04 16:40:17 +08:00
zhanqiuhu	4403e3ed4c	[Metrics] Add labeled prompt token metrics for P/D disaggregation (#33290) Add labeled Prometheus metrics to distinguish where prompt tokens come from in P/D disaggregated deployments. In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode reports inflated prompt throughput because it counts all prompt tokens as "computed", even though most were transferred. This PR adds labeled metrics so users can understand actual compute work vs transferred work: vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>	2026-02-04 07:46:48 +00:00
Matt	08e094997e	[Hardware][AMD][CI] Refactor AMD tests to properly use BuildKite parallelism (#32745) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-02-04 14:51:33 +08:00
Wentao Ye	d88a1df699	[Deprecation] Deprecate profiling envs (#33722) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-04 05:58:21 +00:00
Cyrus Leung	90d74ebaa4	[Deprecation] Remove `_get_data_parser` in MM processor (#33757) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-04 05:51:52 +00:00
Frank Wang	45f8fd6f97	[Feature] Enable `TRITON_ATTN` for Batch Invariance (#33688) Signed-off-by: frankwang28 <frank.wbb@hotmail.com>	2026-02-04 13:27:34 +08:00
Wentao Ye	5e1e0a0fbd	[Refactor] Remove unused dead code (#33718) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-03 21:25:11 -08:00
Michael Goin	eb5ed20743	[Bugfix] Define router_logits_dtype for remaining MoE models (#33737) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-04 13:24:14 +08:00
Huy Do	2647163674	Save startup benchmark results as a list of values (#33629) Signed-off-by: Huy Do <huydhn@gmail.com>	2026-02-03 20:37:51 -08:00
Shanshan Shen	9fb27dd3b3	[MM] Align the prefix of MMEncoderAttention with Attention (#33750) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-02-04 04:07:30 +00:00
R3hankhan	4dffc5e044	[CPU] Split attention dispatch by head_dim alignment (#32161) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-02-03 19:37:15 -08:00
Andrew Xia	e1bf04b6c2	[1/N] Initial Implementation of Parser for ResponsesAPI (#32712) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2026-02-04 10:59:03 +08:00
Isotr0py	02080179a3	[Bugfix] Fix torchrun PP broadcast deadlock with async scheduling (#33701) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-04 02:17:37 +00:00
wang.yuqi	1b8fe6f7c4	[Frontend][4/n] Make pooling entrypoints request schema consensus | ScoreRequest (#33060) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-04 01:48:40 +00:00
Nick Hill	52ee21021a	[BugFix][Spec Decoding] Fix negative accepted tokens metric crash (#33729) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-03 23:34:41 +00:00
Wentao Ye	655efb3e69	[Dependency] Remove comments of ray in dependency files (#33351) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-03 15:30:47 -08:00
Matthew Bonanni	bd8da29a66	[Bugfix] Fix sparse MLA metadata building (#33579) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-03 15:29:48 -08:00
Michael Goin	2a99c5a6c8	[Bugfix] Disable TRTLLM FP8 MoE if router_logits_dtype==float32 and routing_method!=DeepSeekV3 (#33613) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-03 13:26:51 -08:00
Patrick von Platen	3f7662d650	[Voxtral Realtime] Change name (#33716) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2026-02-03 13:03:28 -08:00
Vadim Gimpelson	a372f3f40a	[MISC] Fix Tensor Parallelism for Quantized Mamba Models with n_groups=1 (#33257) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-02-03 15:10:31 -05:00
Harry Mellor	61e632aea1	Turn `@config` into a `dataclass_transform` (#31541) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 17:40:59 +00:00
Richard Zou	b1bb18de8d	[torch.compile] Significantly speed up cold start times (#33641) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-03 09:12:11 -08:00
Lucas Wilkinson	2267cb1cfd	[Attention][FA3] Update FA3 to include new swizzle optimization (#23465) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-03 08:08:47 -08:00
dtc	0d6ccf68fa	[P/D] rework mooncake connector and introduce its bootstrap server (#31034) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2026-02-03 08:08:25 -08:00
Cyrus Leung	18e7cbbb15	[Bugfix] Fix startup hang for Granite Speech (#33699) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-03 15:57:56 +00:00
Patrick von Platen	f0d5251715	[Voxtral models] Skip warm-up to skip confusing error message in warm-up (#33576) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-03 07:22:34 -08:00
Shanshan Shen	5c4f2dd6ef	[MM] Pass `prefix` parameter to MMEncoderAttention (#33674) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-02-03 06:47:41 -08:00
wang.yuqi	f3d8a34671	[Bugfix] Do not add extra \n for image-only cases when constructing multimodal text prompts. (#33647) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-03 06:43:47 -08:00
shaharmor98	4bc913aeec	Feat/add nemotron nano v3 tests (#33345)	2026-02-03 08:52:49 -05:00
Kuntai Du	fbb3cf6981	[Bugfix][Async][Connector] avoid vllm-side double free during async scheduling + request abort + async KV cache transfer (#33377) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2026-02-03 21:50:15 +08:00
Krish Gupta	2df2b3499d	Document NixlConnector backend selection via kv_connector_extra_config (#33552) Signed-off-by: KrxGu <krishom70@gmail.com>	2026-02-03 05:49:59 -08:00
Harry Mellor	2a8d84e66d	Fix Gemma3n audio encoder for Transformers v5 (#33673) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 05:49:49 -08:00
zxy	a3acfa1071	[Models] Intern-S1-Pro (#33636) Signed-off-by: zxy <zhou0493@e.ntu.edu.sg> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-03 05:49:45 -08:00
Harry Mellor	be8168ff88	Fix Gemma3 GGUF for Transformers v5 (#33683) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 12:36:53 +00:00
Harry Mellor	f6af34626d	Fix offline test for Transformers v5 (#33682) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 12:07:24 +00:00
Song Zhixin	ceab70c89d	[Bugfix] fix qwen3-asr response error (#33644) Signed-off-by: jesse <szxfml@gmail.com> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-03 03:33:56 -08:00
Cyrus Leung	52683ccbe1	[Misc] Update default image format of `encode_base64` (#33656) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-03 03:13:16 -08:00
Michael Goin	e346e2d056	[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-03 10:37:15 +00:00
Cyrus Leung	83449a5ff0	[Refactor] Clean up pooling serial utils (#33665) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-03 10:29:18 +00:00
Lucas Hänke de Cansino	dad2d6a590	[Bugfix][Model] Fix DeepSeek-OCR-2 chat template to include BOS token (#33642) Signed-off-by: l4b4r4b4b4 <lucas.cansino@mail.de>	2026-02-03 00:35:58 -08:00
Isotr0py	32e84fa1ff	[CI/Build] Investigate torchrun distributed tests hanging issue (#33650) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-03 15:49:17 +08:00
Richard Zou	fd9c83d0e0	[torch.compile] Document the workaround to standalone_compile failing (#33571) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-03 07:16:55 +00:00
杨朱 · Kiki	b95cc5014d	[Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable (#33535) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 15:01:59 +08:00
Nick Hill	61397891ce	[Minor] Some code simplification in `scheduler.py` (#33597) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-03 15:00:00 +08:00
杨朱 · Kiki	ef248ff740	[Misc] Remove deprecated profiler environment variables (#33536) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 14:58:44 +08:00


13592 Commits