biondizzle/vllm - vllm

Author	SHA1	Message	Date
Lumosis	66652e8082	[BugFix] Assign page_size_padded when unifying kv cache spec. (#32283 ) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>	2026-01-14 20:10:01 +00:00
vllmellm	e27078ea80	[Bugfix][ROCm][performance] Resolve the performance regression issue of the Qwen3-Next-80B-A3B-Thinking under rocm_atten (#32336 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-14 19:32:48 +00:00
Aleksandr Samarin	d084e9fca7	[MODEL] Fix handling of multiple channels for gpt-oss with speculative decoding (#26291 ) Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com> Signed-off-by: southfreebird <yvorott@gmail.com> Co-authored-by: southfreebird <yvorott@gmail.com>	2026-01-14 13:20:52 -05:00
qli88	3a612322eb	[CI] Move rixl/ucx from Dockerfile.rocm_base to Dockerfile.rocm (#32295 ) Signed-off-by: Qiang Li <qiang.li2@amd.com>	2026-01-14 16:53:36 +00:00
Cyrus Leung	9ea07b41da	[1/N] Reorganize multimodal processing code (#32327 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-14 15:25:31 +00:00
Ning Xie	552b262936	rename tokenize serving api request id prefix to tokenize (#32328 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-01-14 14:52:20 +00:00
Chauncey	00e6402d56	[Frontend] track responsesAPI server_load (#32323 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-14 12:00:37 +00:00
Shanshan Shen	ce0946249d	[Misc] Make mem utils can be reused by other platforms (#32322 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-01-14 03:46:01 -08:00
Cyrus Leung	3f28174c6a	[Frontend] Standardize use of `create_error_response` (#32319 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-14 11:22:26 +00:00
Chauncey	769d0629e1	[Refactor] [9/N] to simplify the vLLM openai translations serving architecture (#32313 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-14 10:20:58 +00:00
Cyrus Leung	90db5b31e4	[Refactor] Move top-level dummy data generation to registry (#32310 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-14 02:17:46 -08:00
Roger Wang	b8199f6049	[Model] Re-implement Qwen3Omni Audio Encoder (#32167 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-01-14 15:40:30 +08:00
sangho.lee	7e6f123810	Add Molmo2 multimodal model support (#30997 ) Signed-off-by: sanghol <sanghol@allenai.org> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-14 15:33:09 +08:00
Chauncey	9312a6c03a	[Refactor] [8/N] to simplify the vLLM openai responsesapi_serving architecture (#32260 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-14 07:26:24 +00:00
Michael Goin	6388b50058	[Docs] Add docs about OOT Quantization Plugins (#32035 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-14 15:25:45 +08:00
Hongxia Yang	048bb59728	AMD CI Test - unskip moe_sum test and moe_align_block_size tests (#32039 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>	2026-01-13 23:25:10 -08:00
Angela Yi	7933638051	[misc] Remove is_torch_equal_or_newer(2.4) cases (#32296 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-13 23:22:07 -08:00
David	6b176095e3	[Build] Relax anthropic version pin from ==0.71.0 to >=0.71.0 (#32289 ) Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-13 23:21:39 -08:00
Andreas Karatzas	9d0d7f48d5	[ROCm][CI] Handle missing vision_config in Isaac model attention patch (#32281 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-14 07:21:26 +00:00
Yi Liu	50632adc58	Consolidate Intel Quantization Toolkit Integration in vLLM (#31716 ) Signed-off-by: yiliu30 <yi4.liu@intel.com>	2026-01-14 07:11:30 +00:00
Micah Williamson	6fa6e7ef0c	[ROCm][CI] Disable Async Scheduling For Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test (#32275 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-14 13:29:42 +08:00
Woosuk Kwon	90c0836902	[Model Runner V2] Refactor Sampler (#32245 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-13 17:58:12 -08:00
Roberto L. Castro	8ef50d9a6b	[Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885 ) Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>	2026-01-13 15:22:53 -08:00
emricksini-h	2a60ac91d0	[Improvement] Persist CUDA compat libraries paths to prevent reset on `apt-get` (#30784 ) Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai>	2026-01-13 14:35:05 -08:00
Michael Goin	9e65bb4ef4	Add mergify label job for "bug" in PR titles (#31980 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-13 14:28:19 -08:00
Simon Mo	0db574b185	[Build] Add scripts for cherry-picking and trigger build (#32282 ) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-01-13 13:21:05 -08:00
HappyAmazonian	2f4a71daf2	[Misc] Add In-Container restart capability through supervisord for sagemaker entrypoint (#28502 ) Signed-off-by: Shen Teng <sheteng@amazon.com> Signed-off-by: HappyAmazonian <91216626+HappyAmazonian@users.noreply.github.com>	2026-01-13 13:06:10 -08:00
Rabi Mishra	69f8a0ea37	fix(rocm): Use refresh_env_variables() for rocm_aiter_ops in test_moe (#31711 ) Signed-off-by: rabi <ramishra@redhat.com>	2026-01-13 19:11:54 +00:00
Wentao Ye	f28125d87b	[Perf] Optimize grouped topk kernel, 1.2%~2% E2E Throughput improvement (#32058 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-13 10:58:18 -08:00
Dmitry Tokarev	46f8c6b725	Fix CUDA 13 wheel installation doc (#32276 ) Signed-off-by: Dmitry Tokarev <dtokarev@nvidia.com>	2026-01-13 10:48:37 -08:00
Andrew Xia	af54d2e2d0	[responseAPI] support partial message generation (#32100 ) Signed-off-by: Andrew Xia <axia@fb.com> Signed-off-by: Andrew Xia <mitandrewxia@gmail.com> Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Co-authored-by: Andrew Xia <axia@fb.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-01-13 10:41:26 -08:00
Sage Moore	6beef12b9b	[EPLB][Cleanup] Remove `is_async_enabled` from `EplbModelState` (#32050 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-01-13 18:19:03 +00:00
Mark McLoughlin	ab74b2a27a	[Trivial] Remove duplicate enable_mfu_metrics (#32246 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-01-14 01:09:23 +08:00
Matthew Bonanni	2263d44b68	[4/N][Attention] Move MLA common to model_executor (#32060 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-13 09:08:45 -08:00
Mathis Felardos	4f3676e726	nixl_connector: export UCX_MEM_MMAP_HOOK_MODE=none to avoid a UCX memory leak (#32181 ) Signed-off-by: Mathis Felardos <mathis@mistral.ai>	2026-01-13 16:21:10 +00:00
Martin Hickey	510265472c	[BugFix] [KVConnector] Fix KV events for LMCache connector (#32169 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-13 15:50:34 +00:00
Chauncey	4f02cb2eac	[Refactor] [7/N] to simplify the vLLM lora serving architecture (#32251 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-13 15:37:34 +00:00
Cyrus Leung	252c011012	[Refactor] Remove `MultiModalProfiler` (#32254 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-13 15:10:20 +00:00
Matthew Bonanni	98f60e5acb	[6/N][Attention] Move utils to more appropriate locations (#32215 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-13 05:38:52 -08:00
Chauncey	fefce49807	[Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture (#32240 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-13 13:01:39 +00:00
Mickaël Seznec	a5bbbd2f24	[Quantization] fix: overflow with static per-tensor scaling (#29867 ) Signed-off-by: Mickael Seznec <mickael@mistral.ai> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-13 12:56:01 +00:00
Nicolò Lucchesi	8c8653b672	[Docs] Nixl Usage recommend `fail` kv_load_failure_policy (#32198 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-13 12:51:57 +00:00
Cyrus Leung	232214b2ae	[Bugfix] Replace `PoolingParams.normalize` with `use_activation` (#32243 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-13 10:45:42 +00:00
Cyrus Leung	eb28e8068d	[Refactor] Remove `get_encoder_dummy_data` (#32241 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-13 09:21:23 +00:00
YunzhuLu	542a4059b2	[Model] Use mm_position to compute mrope positions for Qwen2-VL/2.5-VL (#32126 ) Signed-off-by: YunzhuLu <lucia.yunzhu@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-13 09:04:29 +00:00
Andreas Karatzas	df7e12715f	[ROCm][CI] Fix engine core client tests for ROCm spawn multiprocessing (#32061 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-13 15:14:30 +08:00
Roy Wang	44c34f22d9	[Doc] Update installation from source command (#32239 ) Signed-off-by: esmeetu <jasonailu87@gmail.com>	2026-01-12 23:10:27 -08:00
Xingyu Liu	80221e1884	[BugFix]Fix eagle draft_model_config and add tests (#31753 ) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>	2026-01-12 23:09:36 -08:00
Andreas Karatzas	5e714f7ff4	[ROCm][CI] Fix HuggingFace flash_attention_2 accuracy issue in Isaac vision encoder (#32233 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-12 22:33:59 -08:00
Andreas Karatzas	11b6af5280	[ROCm][Bugfix] Fix Mamba batched decode producing incorrect output (#32099 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> v0.14.0rc1	2026-01-13 05:46:53 +00:00

12989 Commits