biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Nick Hill	52ee21021a	[BugFix][Spec Decoding] Fix negative accepted tokens metric crash (#33729 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-03 23:34:41 +00:00
Wentao Ye	655efb3e69	[Dependency] Remove comments of ray in dependency files (#33351 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-03 15:30:47 -08:00
Matthew Bonanni	bd8da29a66	[Bugfix] Fix sparse MLA metadata building (#33579 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-03 15:29:48 -08:00
Michael Goin	2a99c5a6c8	[Bugfix] Disable TRTLLM FP8 MoE if router_logits_dtype==float32 and routing_method!=DeepSeekV3 (#33613 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-03 13:26:51 -08:00
Patrick von Platen	3f7662d650	[Voxtral Realtime] Change name (#33716 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2026-02-03 13:03:28 -08:00
Vadim Gimpelson	a372f3f40a	[MISC] Fix Tensor Parallelism for Quantized Mamba Models with n_groups=1 (#33257 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-02-03 15:10:31 -05:00
Harry Mellor	61e632aea1	Turn `@config` into a `dataclass_transform` (#31541 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 17:40:59 +00:00
Richard Zou	b1bb18de8d	[torch.compile] Significantly speed up cold start times (#33641 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-03 09:12:11 -08:00
Lucas Wilkinson	2267cb1cfd	[Attention][FA3] Update FA3 to include new swizzle optimization (#23465 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-03 08:08:47 -08:00
dtc	0d6ccf68fa	[P/D] rework mooncake connector and introduce its bootstrap server (#31034 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2026-02-03 08:08:25 -08:00
Cyrus Leung	18e7cbbb15	[Bugfix] Fix startup hang for Granite Speech (#33699 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-03 15:57:56 +00:00
Patrick von Platen	f0d5251715	[Voxtral models] Skip warm-up to skip confusing error message in warm-up (#33576 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-03 07:22:34 -08:00
Shanshan Shen	5c4f2dd6ef	[MM] Pass `prefix` parameter to MMEncoderAttention (#33674 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-02-03 06:47:41 -08:00
wang.yuqi	f3d8a34671	[Bugfix] Do not add extra \n for image-only cases when constructing multimodal text prompts. (#33647 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-03 06:43:47 -08:00
shaharmor98	4bc913aeec	Feat/add nemotron nano v3 tests (#33345 )	2026-02-03 08:52:49 -05:00
Kuntai Du	fbb3cf6981	[Bugfix][Async][Connector] avoid vllm-side double free during async scheduling + request abort + async KV cache transfer (#33377 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2026-02-03 21:50:15 +08:00
Krish Gupta	2df2b3499d	Document NixlConnector backend selection via kv_connector_extra_config (#33552 ) Signed-off-by: KrxGu <krishom70@gmail.com>	2026-02-03 05:49:59 -08:00
Harry Mellor	2a8d84e66d	Fix Gemma3n audio encoder for Transformers v5 (#33673 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 05:49:49 -08:00
zxy	a3acfa1071	[Models] Intern-S1-Pro (#33636 ) Signed-off-by: zxy <zhou0493@e.ntu.edu.sg> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-03 05:49:45 -08:00
Harry Mellor	be8168ff88	Fix Gemma3 GGUF for Transformers v5 (#33683 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 12:36:53 +00:00
Harry Mellor	f6af34626d	Fix offline test for Transformers v5 (#33682 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 12:07:24 +00:00
Song Zhixin	ceab70c89d	[Bugfix] fix qwen3-asr response error (#33644 ) Signed-off-by: jesse <szxfml@gmail.com> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-03 03:33:56 -08:00
Cyrus Leung	52683ccbe1	[Misc] Update default image format of `encode_base64` (#33656 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-03 03:13:16 -08:00
Michael Goin	e346e2d056	[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-03 10:37:15 +00:00
Cyrus Leung	83449a5ff0	[Refactor] Clean up pooling serial utils (#33665 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-03 10:29:18 +00:00
Lucas Hänke de Cansino	dad2d6a590	[Bugfix][Model] Fix DeepSeek-OCR-2 chat template to include BOS token (#33642 ) Signed-off-by: l4b4r4b4b4 <lucas.cansino@mail.de>	2026-02-03 00:35:58 -08:00
Isotr0py	32e84fa1ff	[CI/Build] Investigate torchrun distributed tests hanging issue (#33650 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-03 15:49:17 +08:00
Richard Zou	fd9c83d0e0	[torch.compile] Document the workaround to standalone_compile failing (#33571 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-03 07:16:55 +00:00
杨朱 · Kiki	b95cc5014d	[Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable (#33535 ) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 15:01:59 +08:00
Nick Hill	61397891ce	[Minor] Some code simplification in `scheduler.py` (#33597 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-03 15:00:00 +08:00
杨朱 · Kiki	ef248ff740	[Misc] Remove deprecated profiler environment variables (#33536 ) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 14:58:44 +08:00
Kunshang Ji	e10604480b	[XPU][1/N] Deprecate ipex and switch to vllm-xpu-kernels for xpu platform (#33379 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-02 22:46:10 -08:00
Chauncey	bf001da4bf	[Bugfix] Interleaved thinking keeps compatibility with reasoning_content (#33635 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Koushik Dutta <koushd@gmail.com>	2026-02-03 06:46:05 +00:00
杨朱 · Kiki	a0a984ac2e	[CI/Build] Remove hardcoded America/Los_Angeles timezone from Dockerfiles (#33553 ) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-02 22:32:39 -08:00
Shengliang Xu	f1cb9b5544	Fix quantized Falcon-H1 model loading issues (#32728 ) Signed-off-by: Shengliang Xu <shengliangx@nvidia.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-02 22:31:27 -08:00
Daniel Mescheder	4c4b6f7a97	[Frontend] Add sampling parameters to Responses API (#32609 ) Signed-off-by: Daniel Mescheder <dmesch@amazon.com> Co-authored-by: Daniel Mescheder <dmesch@amazon.com>	2026-02-03 13:51:10 +08:00
Roger Wang	10546f925a	[Bugfix] Fix mm budget setting for Qwen Omni models (#33634 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-03 04:56:25 +00:00
Radu Salavat	e69c990c21	[Feature][CPU Backend]: Optimize ARM vectorization backend (#30329 ) Signed-off-by: Radu Salavat <radu.salavat@arm.com>	2026-02-02 20:17:56 -08:00
Richard Zou	5eac9a1b34	[torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (#33624 ) Signed-off-by: Richard Zou <zou3519@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-03 03:38:49 +00:00
Nathan Weinberg	1b60b45d0d	[CI/Build] add directions for CPU image upload to Docker Hub (#32032 ) Signed-off-by: Nathan Weinberg <nweinber@redhat.com> Signed-off-by: Nathan Weinberg <31703736+nathan-weinberg@users.noreply.github.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com>	2026-02-03 02:48:06 +00:00
Dezhan	4b3803d180	[BugFix] DPMetadata raises assert error for dense model (#32739 ) Co-authored-by: Dezhan Tu <dztu@meta.com>	2026-02-03 00:56:44 +00:00
Patrick von Platen	5019c59dd2	[Voxtral Realtime] Introduce global log mel max (#33574 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-02 17:01:47 -05:00
Lain	089cd4f002	fix cutlass_3x_gemm_fp8_blockwise on sm103a (#32224 ) Signed-off-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Pavani Majety <pmajety@nvidia.com>	2026-02-02 11:47:46 -08:00
Vasiliy Kuznetsov	0130223bd9	fix memory for online fp8 quantization with streaming weight load (#31914 ) Signed-off-by: vasiliy <vasiliy@fb.com>	2026-02-02 14:17:42 -05:00
Matthew Bonanni	5d1aef3004	[UX] Format attention backend log line (#33570 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-02 18:57:12 +00:00
yugong333	ffe1fc7a28	Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005 ) Signed-off-by: Yu Gong <yu3.gong@gmail.com>	2026-02-02 12:30:06 -05:00
Harry Mellor	8b7346d5f1	Update huggingface-hub again (#33567 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-02 09:20:54 -08:00
Harry Mellor	6141ebe0dd	Remove incorrect tokenizer info test (#33565 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-02 17:11:44 +00:00
Yang Liu	199e3cb476	[Model] Use mm_position to compute mrope positions for GLM-4.xV (#33039 ) Signed-off-by: Yang <lymailforjob@gmail.com>	2026-02-02 16:55:48 +00:00
Matthew Bonanni	9f8cb81b44	[CI] Add DeepSeek V3.2 nightly eval (#33566 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-02 16:10:02 +00:00

13773 Commits