biondizzle/vllm - vllm

Author	SHA1	Message	Date
Wentao Ye	be292b7c14	[Bug] Fix pooling model benchmark script (#36300 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-09 11:17:45 -04:00
Matthew Bonanni	77a73458e3	Reapply [Attention] Refactor `check_and_update_config` (#35122 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-09 07:17:14 -07:00
Tianyu Guo	5578f2a4d3	Support online use_audio_in_video (#36319 ) Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-09 07:16:44 -07:00
Cyrus Leung	3ec2115015	[Frontend] Move warmup into Renderer (#36482 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-09 06:03:21 -07:00
Isotr0py	b0906d8b02	[MM Encoder] Default to use TORCH_SDPA backend for ViT on Volta/Turing GPU (#36472 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-09 03:43:44 -07:00
Kevin H. Luu	aaf5fa9abf	[ci] Bound openai dependency to 2.24.0 (#36471 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2026-03-09 03:43:26 -07:00
Cyrus Leung	f96c3ab08c	[Deprecation][1/2] Remove items deprecated in v0.18 (#36470 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-09 03:43:23 -07:00
Xin Yang	dc6b578466	[Kernel] Add fused_sigmoid_gating_delta_rule_update kernel for Qwen3 Next (#35777 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-03-08 23:41:01 -07:00
liuzhenwei	1bc9c77f6d	[XPU] Add test script of PD disaggregation (#36434 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>	2026-03-09 05:50:27 +00:00
Alex Brooks	65a4da1504	[Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) (#36160 ) Signed-off-by: Alex Brooks <albrooks@redhat.com>	2026-03-09 05:46:23 +00:00
Li, Jiang	217f27598d	[Bugfix] Avoid to replace non-tensor members in cpu model runner (#36430 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-03-09 13:06:28 +08:00
wang.yuqi	fff3711a24	[Frontend][2/n] Improve pooling entrypoints | embed. (#36110 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com>	2026-03-09 11:42:19 +08:00
Tushar Shetty	c4d859c274	[Bugfix] Skip out-of-stage layers in get_layers_from_vllm_config for pipeline parallel (#36243 ) Signed-off-by: Tushar Shetty <tushar.shetty@abbyy.com> Signed-off-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com>	2026-03-08 20:40:16 -07:00
cong-or	747431044d	feat(attention): extract KV-cache update from FlexAttention backend (#36263 ) Signed-off-by: cong-or <conchubhar.gannon@gmail.com>	2026-03-08 20:40:12 -07:00
Cyrus Leung	d62856b928	[Misc] Move processors to `transformers_utils` (#35953 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-09 11:31:39 +08:00
Alex Brooks	bd2659a566	Increase Flexibility for OOV Multimodal Token Handling (#34858 ) Signed-off-by: Alex Brooks <albrooks@redhat.com>	2026-03-08 20:30:49 -07:00
Shaun Kotek	90512b2e8b	fix: Use iterator as not to store all the file loads in memory at once (#36149 ) Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com>	2026-03-08 20:25:21 -07:00
wang.yuqi	dcf8862fd4	[Examples][1/n] Resettle basic examples. (#35579 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:22:53 -07:00
Weiguang Li	43aa389231	[Bugfix] Fix CPU OMP autobind assertion to use local_world_size (#35815 ) Signed-off-by: liweiguang <codingpunk@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2026-03-08 20:07:29 -07:00
Wentao Ye	384425f84e	[Dependency] Remove default ray dependency (#36170 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-08 20:06:22 -07:00
Harry Mellor	a0f44bb616	Allow `markdownlint` to run locally (#36398 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:05:24 -07:00
Kunshang Ji	fde4771bbd	[XPU][Doc] update xpu document about triton dependency/conflict issue. (#36301 ) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-03-09 02:09:22 +00:00
Jiangyun Zhu	e5ff140216	[cudagraph] fix cudagraph warning in deepseekv32 (#28044 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-08 20:27:41 -04:00
danisereb	0a6a3a1290	Add support for ModelOpt MXFP8 MoE models (#35986 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-03-08 13:00:05 -07:00
Sage	4497431df6	[Frontend] Add GPU-less render serving path (`vllm launch render`) (#36166 )	2026-03-08 16:35:09 +01:00
nvnbagrov	b7332b058c	[Model] Nano Nemotron VL - fast media preprocessing (#35657 ) Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>	2026-03-08 03:04:05 -07:00
Andreas Karatzas	40077ea3de	[CI] fix flaky empty responses and add diagnostic assertions in vision chat tests (#36341 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-08 14:42:24 +08:00
Samuel Shen	5d6aae4577	[LMCache MP Patch]: Race Condition + Duplicated Block Ids (#35831 )	2026-03-07 13:52:48 -08:00
Roy Huang	63298ee173	[Bugfix][LMCache][KVConnector] fix potential memory leak in LMCache multiprocess mode (#35931 )	2026-03-07 13:52:35 -08:00
Richard Zou	2dde535df1	[compile] Split compile/warmup monitoring (#36098 )	2026-03-07 13:52:11 -08:00
Wei Zhao	379689d533	[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891 )	2026-03-07 13:51:54 -08:00
PatchyTIS	a6be75dbd2	[Core] NGram GPU Implementation compatible with Async Scheduler (#29184 )	2026-03-07 13:51:37 -08:00
Micah Williamson	ee54f9cdb9	[ROCm][CI] Accept Different But Valid Output for `test_olmoe_tp` (#35224 )	2026-03-07 13:50:52 -08:00
Micah Williamson	fc4657756f	[ROCm][CI] Enable AITER for failing `test_gpt_oss` test case on MI355 (#36174 )	2026-03-07 13:50:17 -08:00
qli88	eebd14651f	[CI] Enable Crosslayer KV layout tests for ROCm platforms (#35416 )	2026-03-07 13:49:56 -08:00
Matthew Bonanni	ebb9cc5f2b	[UX][Startup] Account for CUDA graphs during memory profiling (#30515 )	2026-03-07 13:49:23 -08:00
rahul-sarvam	85f50eb41f	Adding support to Sarvam's MoE models (#33942 ) Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com>	2026-03-08 01:16:24 +08:00
Taneem Ibrahim	5261223c2d	[Misc] Remove duplicate parser registration (#36303 ) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>	2026-03-07 09:37:01 -05:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216 ) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
vllmellm	ee8a29511f	[Bugfix] Fix compressed-tensors quantization failure for DeepSeek-R1 on MI300x (#36247 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-03-07 09:26:59 +00:00
milesial	755356b3d1	feat: expose media_io_kwargs at runtime (#34778 ) Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>	2026-03-07 04:27:04 +00:00
Andreas Karatzas	58928475e4	[ROCm][CI] Making entrypoints more deterministic on ROCm (#36293 )	2026-03-06 19:04:40 -08:00
Mengtao (Martin) Yuan	1a9718085c	Fix CUDA graph decode capture crash in AITER FlashAttention (#36042 ) Signed-off-by: Martin Yuan <myuan@meta.com> Co-authored-by: Martin Yuan <myuan@meta.com>	2026-03-06 18:12:07 -08:00
Kunshang Ji	7eb524e64c	refine `vllm bench throughput --backend hf` (#35971 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-07 02:10:33 +00:00
Nick Hill	c7f32e08c2	[BugFix] Avoid ignored trust_remote_code warnings (#36290 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-07 01:24:18 +00:00
Nick Hill	b354686524	[Model Runner V2] Fix warmup for pipeline parallel (#36280 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-06 16:58:51 -08:00
Nick Hill	6a18d8789b	[Core] Fix benign error log during normal shutdown (#36270 ) Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2026-03-07 00:39:21 +00:00
Itay Alroy	24a03915f5	mla: don't update kv cache on dummy forwards (#36282 ) Signed-off-by: Itay Alroy <ialroy@nvidia.com>	2026-03-07 00:36:00 +00:00
Andreas Karatzas	b5e34e1fca	[ROCm][CI] Fixing yaml file for external amd-ci signal (#36284 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-06 18:30:39 -06:00
Copilot	ce8546a12b	[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538 ) Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: ProExpertProg <luka.govedic@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-06 23:55:06 +00:00

15117 Commits