biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Dazhi Jiang	bcb6f5947f	[Perf] Remove sync point in vit torch sdpa attn backend (#30232 ) Signed-off-by: Dazhi Jiang <dazhi_jiang@163.com>	2025-12-08 07:12:42 +00:00
Zhiyu	cd00c443d2	[Misc] Rename TensorRT Model Optimizer to Model Optimizer (#30091 ) Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>	2025-12-08 07:05:27 +00:00
Jiangyun Zhu	d143271234	[Bugfix] fix fuse_allreduce_rms when tp =1 (#30178 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-12-08 06:43:47 +00:00
Zhiwei	c6df05ebb4	[ROCm] [Fused Moe EP] Use binary expert mask for aiter fused moe kernel (#29773 ) Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>	2025-12-08 05:23:46 +00:00
Nick Hill	d726a7b0ed	[BugFix] Unblock use of LoRA with data parallel mode (#30220 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-08 12:21:05 +08:00
Zhijian Jiang	344b50d525	Address comment to mergify.yml in #30117 (#30219 ) Signed-off-by: Zhijian Jiang <Zhijian.Jiang@outlook.com>	2025-12-08 11:26:25 +08:00
Andrew Xia	735284ed86	[responsesAPI][7] Browser, Container MCP tools for non harmony models (#29989 ) Signed-off-by: Andrew Xia <axia@meta.com> Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-08 10:04:03 +08:00
daniel-salib	444f0e3f33	[Frontend] Add MCP type support infrastructure to Responses API (#30054 ) Signed-off-by: Daniel Salib <danielsalib@meta.com>	2025-12-08 10:02:52 +08:00
ElizaWszola	af0444bf40	[Performance] Fused blockwise quant RMS norm (#27883 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 16:38:04 +00:00
Lucas Wilkinson	0044c4038c	[BugFix][DeepSeek-V3.2] Fix backend selection logic for Blackwell (#30195 )	2025-12-07 10:53:51 -05:00
Isotr0py	b952f4d3c3	[v1] Add PrefixLM support to FlexAttention backend (#27938 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-07 15:51:36 +00:00
Wentao Ye	541a2ef892	[Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement. (#29546 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 20:31:14 +08:00
Jee Jee Li	b0f4866a77	[CI/Build]Temporary workaround for test_default_mm_loras timeout (#30202 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-07 20:27:11 +08:00
Jinzhen Lin	879ddb09c3	[Kernel][MoE] optimize `moe_align_block_size` (#29642 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-07 01:58:47 -08:00
Yifan Qiao	1b0482b9d1	[Misc][Core] Remove unused `req_index` increment in scheduler (#30176 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2025-12-07 08:39:21 +00:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Luke	a49d813fa8	Lazy loading to avoid importing all files (#29716 ) Signed-off-by: Luke <yq0536@gmail.com>	2025-12-07 07:13:14 +00:00
Wentao Ye	17eb25e327	[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 04:44:50 +00:00
jeremyteboul	dce6d229f7	Support multiple image/audio embeddings per requests (#29988 ) Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com> Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>	2025-12-07 04:34:24 +00:00
Yanan Cao	cbedb703cc	[Frontend] Remove confusing -O.xx flag error (#30169 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-12-07 02:53:42 +00:00
AuruTus	8d3da4c79d	[MISC]: change NIXL compatibility hash logging level to debug (#30182 )	2025-12-07 00:21:03 +00:00
Andrew Xia	421125d03a	[ez] move harmony utils to parser folder (#30117 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-06 17:34:34 -05:00
Cyrus Leung	671427efbf	[Model] Move `multimodal_cpu_fields` definition to field config (#30181 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 13:40:02 +00:00
Viacheslav	21bb323542	Gigachat 3 tool parser and tests (#29905 ) Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>	2025-12-06 12:04:14 +00:00
Chukwuma Nwaugha	17a9abec2b	simplify requires_files list creation (#29656 ) Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com>	2025-12-06 09:42:41 +00:00
Ye (Charlotte) Qi	92c35abb24	[Misc] Fix circular import in vllm.transformers_utils.config (#30179 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-12-06 09:24:03 +00:00
Yu Jiaqi	43e7593031	Support tokenization_kwargs override (#29794 ) Signed-off-by: piood <2477084691@qq.com>	2025-12-06 09:12:53 +00:00
Cyrus Leung	c46b932df2	[Chore] Deprecate `SupportsMultiModal.merge_by_field_config` (#30170 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 07:57:28 +00:00
redwrasse	6476382384	prefix caching design doc sha256 now default (#29261 ) Signed-off-by: redwrasse <mail@redwrasse.io>	2025-12-06 07:39:56 +00:00
kx	d6aeaddf4a	[bugfix] fix type[AttentionBackend] bug in kv_connector_base_v1 (#30051 ) Signed-off-by: 01267596 <xiongkai123@cmbchina.com> Co-authored-by: 01267596 <xiongkai123@cmbchina.com>	2025-12-06 07:11:31 +00:00
Woosuk Kwon	a238cbd89d	[Model Runner V2] Support min-p sampling (#30171 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-12-05 21:42:47 -08:00
Nick Hill	4026ae31e9	[Misc] Move `disable_nccl_for_dp_synchronization` init logic into `VllmConfig` (#30161 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 20:59:04 -08:00
rasmith	b12f4a9830	[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-12-05 20:57:38 -08:00
Rohan Potdar	40a046cd82	[Bugfix]: Fix `TokenizerLike` interface (#30009 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2025-12-05 20:56:40 -08:00
Peter Salas	e858bc4d14	[Model] Add support for transformer-based Ultravox v0.7 projector (#30089 ) Signed-off-by: Peter Salas <peter@fixie.ai>	2025-12-05 20:55:43 -08:00
Dongjie Zou	e3fbb6f152	fix#30092 Kimi-Linear model loading failure with missing indexer_rotary_emb (#30093 ) Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>	2025-12-05 20:55:09 -08:00
yuttian1	c4d62618ca	Fix AWQ MoE marlin check issue in marlin_utils.py for AMD backend (#30102 ) Signed-off-by: yuttian1 <yuttian@amd.com>	2025-12-05 20:54:38 -08:00
rasmith	62079d8600	[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-06 12:54:17 +08:00
Harry Mellor	bf4a901af9	Better error when world size is larger than node and `distributed_executor_backend` is not set (#30140 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-05 20:53:52 -08:00
Samuel Shen	7e31c3a3f6	[CI]: Remove unnecessary imports from test_lmache_integration (#30157 ) Signed-off-by: Samuel Shen <slshen@uchicago.edu> Co-authored-by: Samuel Shen <slshen@uchicago.edu>	2025-12-06 12:53:34 +08:00
rasmith	dc839ad03d	[CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils to use hip.libdevice.round (#30151 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-05 20:52:11 -08:00
Deboleina	02a4169193	[Tests] Tool call tests for openai/gpt-oss-20b (#26237 ) Signed-off-by: Debolina Roy <debroy@redhat.com>	2025-12-05 19:03:29 -08:00
Wentao Ye	7b5575fa7d	[Bug] Fix vLLM config is not set error (#29999 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-05 16:42:12 -05:00
Bangsheng Tang	77e4472809	let draft model follow target model's config_format (#30152 )	2025-12-05 13:33:42 -08:00
Divakar Verma	962d703818	[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-05 19:57:26 +00:00
Nicolò Lucchesi	e23ca3a0e8	[CI] Re-use whisper_client for all tests (#30148 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-05 19:47:37 +00:00
Russell Bryant	3633035a3f	[Misc] Rename CohereForAI references to CohereLabs (#30147 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-12-05 19:41:40 +00:00
Nicolò Lucchesi	bff78310d9	[Enc-Dec] Fix OOT tokenizer issue (#30144 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-05 19:23:33 +00:00
Tova Movshovitz	adb315060c	[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170 ) Signed-off-by: tovam <tovam@pliops.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-05 18:33:26 +00:00

14386 Commits