biondizzle/vllm - vllm

Author	SHA1	Message	Date
Alex Brooks	10f4db4dbe	[Frontend] Add Support for MM Encoder/Decoder Beam Search (Offline) (#36153 ) Signed-off-by: Alex Brooks <albrooks@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-06 01:16:56 -08:00
Nicolò Lucchesi	5b3ba94ab4	[Core][KVConnector] Support HMA+NixlConnector (#35758 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-06 08:51:21 +01:00
zhanqiuhu	90f3c01fa4	[Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding (#35158 ) Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Zhanqiu Hu <zh338@cornell.edu> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-06 08:50:44 +01:00
Andreas Karatzas	807d680337	[ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-06 15:15:12 +08:00
Walter Beller-Morales	43e77e59ab	[BugFix] avoid infinite loop with VLLM_PORT and get_open_ports_list (#36191 ) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-03-05 22:15:29 -08:00
Ajay Anubolu	43f10573c9	[Bugfix] Fix misleading context length error messages (#36197 ) Signed-off-by: AjAnubolu <anuboluajay@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 22:15:12 -08:00
Yongye Zhu	86e1060b17	[Bugfix] Fix inner_dp_world initialization order for multi-node TP (#35892 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2026-03-05 22:04:44 -08:00
Mark McLoughlin	27066d1b2b	[Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish (#34730 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-05 22:04:31 -08:00
cong-or	57c84ff129	perf: add __slots__ to KVCacheBlock (#36164 ) Signed-off-by: cong-or <conchubhar.gannon@gmail.com>	2026-03-05 22:04:09 -08:00
Andreas Karatzas	a1ffa56a1e	[CI] Fix bge-m3 similarity reference values after Defination typo fix (#36208 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-06 05:07:29 +00:00
Shiyan Deng	8e87cc57f1	[Bug] Fix a corner case in _process_simple_streaming_events (#34754 ) Signed-off-by: Shiyan Deng <dsy842974287@meta.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-03-05 20:57:32 -08:00
Cyrus Leung	6dd302653f	[Misc] Rename `group_mm_kwargs_by_modality -> group_and_batch_mm_kwargs` (#36158 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-06 12:32:48 +08:00
Zhengxu Chen	a97954b6a8	[compile] Consistent compiler config for saved/loaded vllm backends. (#35810 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-03-05 15:08:12 -05:00
Yanhong Li	a911f4dd20	[Model] Add support for OLMo Hybrid (#32550 )	2026-03-05 14:51:06 -05:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934 ) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Sage Moore	8c760b6ab6	[ROCm] Refactor ROCm attention backend selection logic (#35246 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-05 10:51:26 -06:00
Cyrus Leung	7196348157	[Bugfix] Fix Qwen-VL tokenizer implementation (#36140 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-05 08:07:19 -08:00
Ning Xie	176c799f4c	[openai api] log exception in exception handler (1/N) (#31164 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-03-05 16:00:12 +00:00
Or Ozeri	612e7729c2	[KVConnector] Scheduler: Fix num_computed_tokens after async KV load (#34616 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-05 14:25:15 +00:00
Andreas Karatzas	b03ff6a96b	[CI] Stabilize test_no_args_tool_call and add ROCm-specific server args (#36107 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-05 21:52:49 +08:00
Kunshang Ji	66a2209645	[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize` (#36085 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-05 10:36:39 +00:00
Isotr0py	21eb2c3372	[Chore] Correct MTP models test registry ordering (#36115 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-05 08:55:04 +00:00
Benjamin Chislett	57c629e9c1	[Bugfix] Fix block_size for hybrid model MTP (#36036 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-05 06:10:54 +00:00
Zhengxu Chen	dd6dbd93f8	[compile] Fix extra cache save on warm start. (#35921 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-03-05 12:56:30 +08:00
daje0601	3b23d57c96	[Model] Add LoRA support for Whisper models (#29856 ) Signed-off-by: daje0601 <englishmt4118@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-05 10:38:25 +08:00
Simon Mo	f678c3f61a	[RL] [Weight Sync] Guard IPC update-info pickle deserialization behind insecure serialization flag (#35928 ) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-03-04 17:05:32 -05:00
Harry Mellor	17dc9c7fc9	[CI] Bump `mypy` version (#34950 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 20:55:11 +00:00
Richard Zou	5569f5218d	[torch.compile] Stop lazily compiling (#35472 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-04 12:13:17 -08:00
Stefano Castagnetta	d7166e74c1	[CI] Add Blackwell AsyncTP correctness test (#35871 ) Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>	2026-03-04 19:41:21 +00:00
Hyunkyun Moon	bc6be89d16	[Frontend] Add vllm launch command for GPU-less preprocessing serving (#34551 ) Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>	2026-03-04 18:41:52 +00:00
Bhuminjay Soni	fb3e78ab09	[Feature][CI]: compare `func` & `no_func` outputs in test_functionalization.py (#35481 ) Signed-off-by: Bhuminjay <bhuminjaysoni@gmail.com> Signed-off-by: Bhuminjay Soni <Soni5Happy@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-04 18:01:16 +00:00
Christian Pinto	2f2212e6cc	Split generic IO Processor plugins tests from Terratorch specific ones (#35756 ) Signed-off-by: Christian Pinto <christian.pinto@ibm.com>	2026-03-05 00:01:03 +08:00
Nicolò Lucchesi	18e01a0a10	[Misc] Add `--attention-backend auto` option (#35738 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-04 15:12:27 +00:00
sungsoo ha	6cb901093f	[Core] Add All-to-All communication backend for DCP (#34883 ) Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com> Signed-off-by: sungsoo ha <hasungsoo@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:01:57 -05:00
Qi Wang	6aa6ad8992	[BugFix] Fix implicit and incorrect assumption on ECConnector is_producer (#34783 ) Signed-off-by: Qi Wang <qiwa@nvidia.com>	2026-03-04 15:01:30 +01:00
Raghavan	c8c3935b70	[Bugfix][Model] Fix FP8 k_scale/v_scale not loaded for Qwen3-MoE (#35656 ) Signed-off-by: raghavan <oneraghavan@gmail.com>	2026-03-04 13:15:38 +00:00
Ronen Schaffer	bb6888b8b1	[Bugfix][CPUOffloadingManager] Prevent eviction of already-stored blocks in LRU/ARC `prepare_store()` (#35846 ) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>	2026-03-04 14:25:33 +02:00
haosdent	d6e04f4c43	[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (#34094 ) (#34571 ) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-04 11:56:22 +01:00
Kunshang Ji	16d2ad1d38	[Hardware] Replace `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` (#30681 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 09:49:47 +00:00
Joe Runde	6f0dd93801	[Core] Remove busy loop from idle buffer readers (#28053 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-03-04 07:44:20 +00:00
Cyrus Leung	e379396167	[Refactor] Clean up processor kwargs extraction (#35872 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-03 19:53:53 -08:00
AllenDou	c1d963403c	[model] support FireRedASR2 (#35727 ) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-03 19:41:30 -08:00
William Zhang	70c73df69e	[Bugfix] Fix EVS implementation for Qwen3 VL (#33607 ) Signed-off-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>	2026-03-04 02:18:11 +00:00
Micah Williamson	e7213003cb	[ROCm][CI] Fix TP size issue for `test_gpt_oss` (#35887 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-03-03 20:57:34 +00:00
Robert Shaw	97995f6376	[MoE Refactor] Create MK for TRTLLM Kernels (#32564 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-03-03 10:39:50 -08:00
Robert Shaw	881a6b011b	[CI] Temporarily Disable Llama4 MoE Refactor Test (#35870 ) Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-03-03 10:36:15 -08:00
Matthew Bonanni	8e1fd5baf0	[CI] Bump `num_speculative_tokens` to 3 in nightly DeepSeek tests (#35882 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-03 09:26:44 -08:00
JasonCohere	ae88468bcc	fix: Ensure invalid audio files return 400 error (#34715 ) Signed-off-by: Jason Ozuzu <jasonozuzu@cohere.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-03 08:47:39 -08:00
ojhaanshika	e05cb3b93e	TRTLLM gen-full attn Test Coverage (#34986 ) Signed-off-by: Anshika Ojha <anshikao@nvidia.com> Co-authored-by: Anshika Ojha <anshikao@gb-nvl-059-compute09.nvidia.com>	2026-03-03 11:35:34 -05:00
Lucas Wilkinson	28ef9ba399	[BugFix] Add support for MTP num_speculative_tokens > 1 with sparse MLA (#34552 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-03 07:21:57 -08:00

4684 Commits