biondizzle/vllm - vllm

Author	SHA1	Message	Date
Isotr0py	b0906d8b02	[MM Encoder] Default to use TORCH_SDPA backend for ViT on Volta/Turing GPU (#36472 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-09 03:43:44 -07:00
Cyrus Leung	f96c3ab08c	[Deprecation][1/2] Remove items deprecated in v0.18 (#36470 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-09 03:43:23 -07:00
Xin Yang	dc6b578466	[Kernel] Add fused_sigmoid_gating_delta_rule_update kernel for Qwen3 Next (#35777 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-03-08 23:41:01 -07:00
liuzhenwei	1bc9c77f6d	[XPU] Add test script of PD disaggregation (#36434 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>	2026-03-09 05:50:27 +00:00
Alex Brooks	65a4da1504	[Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) (#36160 ) Signed-off-by: Alex Brooks <albrooks@redhat.com>	2026-03-09 05:46:23 +00:00
wang.yuqi	fff3711a24	[Frontend][2/n] Improve pooling entrypoints | embed. (#36110 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com>	2026-03-09 11:42:19 +08:00
wang.yuqi	dcf8862fd4	[Examples][1/n] Resettle basic examples. (#35579 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:22:53 -07:00
Jiangyun Zhu	e5ff140216	[cudagraph] fix cudagraph warning in deepseekv32 (#28044 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-08 20:27:41 -04:00
danisereb	0a6a3a1290	Add support for ModelOpt MXFP8 MoE models (#35986 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-03-08 13:00:05 -07:00
Andreas Karatzas	40077ea3de	[CI] fix flaky empty responses and add diagnostic assertions in vision chat tests (#36341 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-08 14:42:24 +08:00
Wei Zhao	379689d533	[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891 )	2026-03-07 13:51:54 -08:00
PatchyTIS	a6be75dbd2	[Core] NGram GPU Implementation compatible with Async Scheduler (#29184 )	2026-03-07 13:51:37 -08:00
Micah Williamson	ee54f9cdb9	[ROCm][CI] Accept Different But Valid Output for `test_olmoe_tp` (#35224 )	2026-03-07 13:50:52 -08:00
Micah Williamson	fc4657756f	[ROCm][CI] Enable AITER for failing `test_gpt_oss` test case on MI355 (#36174 )	2026-03-07 13:50:17 -08:00
qli88	eebd14651f	[CI] Enable Crosslayer KV layout tests for ROCm platforms (#35416 )	2026-03-07 13:49:56 -08:00
rahul-sarvam	85f50eb41f	Adding support to Sarvam's MoE models (#33942 ) Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com>	2026-03-08 01:16:24 +08:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216 ) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
milesial	755356b3d1	feat: expose media_io_kwargs at runtime (#34778 ) Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>	2026-03-07 04:27:04 +00:00
Andreas Karatzas	58928475e4	[ROCm][CI] Making entrypoints more deterministic on ROCm (#36293 )	2026-03-06 19:04:40 -08:00
Alexei-V-Ivanov-AMD	225d1090a0	Enabling some B200-specific tests on MI355 (#35253 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com> Signed-off-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>	2026-03-06 19:27:20 +00:00
eellison	f3c6c9c9d7	[CustomOp] CustomOp FusedRMSNormGated (#35877 ) Signed-off-by: Elias Ellison <elias.ellison@gmail.com> Signed-off-by: eellison <elias.ellison@gmail.com>	2026-03-06 10:53:37 -08:00
Isotr0py	e4ae148a78	[Refactor] Modular video loader backend refactoring (#35202 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-06 06:06:59 -08:00
Isotr0py	1d0c0d209c	[Misc] Lazy import registered processors (#36024 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-03-06 06:06:45 -08:00
Chenguang Zheng	fcb73f306c	[bugfix] add api process rank in default multimodal request (#36150 ) Signed-off-by: fake0fan <645327136@qq.com> Signed-off-by: Chenguang ZHENG <645327136@qq.com>	2026-03-06 12:00:09 +00:00
Harry Mellor	e2090bf3af	[CI] Fix startup error test (#36230 ) A change in engine startup error messages in #35478 caused this test failure. Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-06 11:50:28 +00:00
Alex Brooks	10f4db4dbe	[Frontend] Add Support for MM Encoder/Decoder Beam Search (Offline) (#36153 ) Signed-off-by: Alex Brooks <albrooks@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-06 01:16:56 -08:00
Nicolò Lucchesi	5b3ba94ab4	[Core][KVConnector] Support HMA+NixlConnector (#35758 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-06 08:51:21 +01:00
zhanqiuhu	90f3c01fa4	[Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding (#35158 ) Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Zhanqiu Hu <zh338@cornell.edu> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-06 08:50:44 +01:00
Andreas Karatzas	807d680337	[ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-06 15:15:12 +08:00
Walter Beller-Morales	43e77e59ab	[BugFix] avoid infinite loop with VLLM_PORT and get_open_ports_list (#36191 ) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-03-05 22:15:29 -08:00
Ajay Anubolu	43f10573c9	[Bugfix] Fix misleading context length error messages (#36197 ) Signed-off-by: AjAnubolu <anuboluajay@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 22:15:12 -08:00
Yongye Zhu	86e1060b17	[Bugfix] Fix inner_dp_world initialization order for multi-node TP (#35892 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2026-03-05 22:04:44 -08:00
Mark McLoughlin	27066d1b2b	[Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish (#34730 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-05 22:04:31 -08:00
cong-or	57c84ff129	perf: add __slots__ to KVCacheBlock (#36164 ) Signed-off-by: cong-or <conchubhar.gannon@gmail.com>	2026-03-05 22:04:09 -08:00
Andreas Karatzas	a1ffa56a1e	[CI] Fix bge-m3 similarity reference values after Defination typo fix (#36208 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-06 05:07:29 +00:00
Shiyan Deng	8e87cc57f1	[Bug] Fix a corner case in _process_simple_streaming_events (#34754 ) Signed-off-by: Shiyan Deng <dsy842974287@meta.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-03-05 20:57:32 -08:00
Cyrus Leung	6dd302653f	[Misc] Rename `group_mm_kwargs_by_modality -> group_and_batch_mm_kwargs` (#36158 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-06 12:32:48 +08:00
Zhengxu Chen	a97954b6a8	[compile] Consistent compiler config for saved/loaded vllm backends. (#35810 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-03-05 15:08:12 -05:00
Yanhong Li	a911f4dd20	[Model] Add support for OLMo Hybrid (#32550 )	2026-03-05 14:51:06 -05:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934 ) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Sage Moore	8c760b6ab6	[ROCm] Refactor ROCm attention backend selection logic (#35246 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-05 10:51:26 -06:00
Cyrus Leung	7196348157	[Bugfix] Fix Qwen-VL tokenizer implementation (#36140 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-05 08:07:19 -08:00
Ning Xie	176c799f4c	[openai api] log exception in exception handler (1/N) (#31164 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-03-05 16:00:12 +00:00
Or Ozeri	612e7729c2	[KVConnector] Scheduler: Fix num_computed_tokens after async KV load (#34616 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-05 14:25:15 +00:00
Andreas Karatzas	b03ff6a96b	[CI] Stabilize test_no_args_tool_call and add ROCm-specific server args (#36107 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-05 21:52:49 +08:00
Kunshang Ji	66a2209645	[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize` (#36085 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-05 10:36:39 +00:00
Isotr0py	21eb2c3372	[Chore] Correct MTP models test registry ordering (#36115 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-05 08:55:04 +00:00
Benjamin Chislett	57c629e9c1	[Bugfix] Fix block_size for hybrid model MTP (#36036 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-05 06:10:54 +00:00
Zhengxu Chen	dd6dbd93f8	[compile] Fix extra cache save on warm start. (#35921 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-03-05 12:56:30 +08:00
daje0601	3b23d57c96	[Model] Add LoRA support for Whisper models (#29856 ) Signed-off-by: daje0601 <englishmt4118@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-05 10:38:25 +08:00

4859 Commits