biondizzle/vllm

Author	SHA1	Message	Date
ltd0924	709502558c	[Model] Add Step3vl 10b (#32329 ) Signed-off-by: luotingdan <luotingdan@stepfun.com> Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com> Co-authored-by: luotingdan <luotingdan@stepfun.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-01-15 19:04:16 -08:00
Micah Williamson	46f8a982b1	[ROCm][CI] Enable AITER Unified Attention On ROCm For gpt-oss Test (#32431 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-16 00:55:57 +00:00
TomerBN-Nvidia	c277fbdf31	[Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257 ) Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com>	2026-01-15 16:15:05 -08:00
Yongye Zhu	31c29257c8	[MoE Refactor][17/N] Apply Refactor to Bf16 (#31827 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-01-15 12:53:40 -08:00
Michael Goin	1be5a73571	[UX] Use kv_offloading_backend=native by default (#32421 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-15 18:55:11 +00:00
Wentao Ye	b34474bf2c	[Feature] Support async scheduling + PP (#32359 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-15 12:06:23 -05:00
Dipika Sikka	361dfdc9d8	[Quant] Support MXFP4 W4A16 for compressed-tensors MoE models (#32285 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-15 07:25:55 -08:00
Cyrus Leung	28459785ff	[3/N] Group together media-related code (#32406 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-15 11:52:12 +00:00
rasmith	8853a50af2	[CI][BugFix][AMD][FP8] Fix test_rms_norm so it runs correctly on ROCm (#32372 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2026-01-15 19:05:54 +08:00
Chauncey	707b44cc28	[Refactor] [11/N] to simplify the mcp architecture (#32396 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-15 18:49:31 +08:00
Cyrus Leung	cbbae38f93	[2/N] Move cache factories to MM registry (#32382 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-15 01:02:30 -08:00
dtc	1e584823f8	[Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode (#32314 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>	2026-01-15 16:31:16 +08:00
Chauncey	4c1c501a7e	[Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-15 07:41:34 +00:00
rasmith	3c2685645e	[CI][AMD][Quantization][BugFix] Fix fp8 max in quant_utils.py and update test_fp8_quant.::test_static_fp8_quant_group_2d to use correct fp8 dtype and adjust atol/rtol (#32201 ) Signed-off-by: Randall Smith <ransmith@amd.com>	2026-01-15 05:04:34 +00:00
Micah Williamson	773d7073ae	[ROCm][CI] Disable async scheduling on ROCm for test_structured_output[meta-llama/Meta-Llama-3.1-8B-Instruct-xgrammar-auto-speculative_config9] (#32355 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-15 04:53:43 +00:00
Ryan Rock	15422ed3f7	[CI/Build][Hardware][AMD] Fix v1/shutdown (#31997 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-01-15 04:01:42 +00:00
dolpm	8471b27df9	[compile] raise on compile_size implicit padding (#32343 ) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-14 20:46:56 +00:00
Lumosis	66652e8082	[BugFix] Assign page_size_padded when unifying kv cache spec. (#32283 ) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>	2026-01-14 20:10:01 +00:00
Aleksandr Samarin	d084e9fca7	[MODEL] Fix handling of multiple channels for gpt-oss with speculative decoding (#26291 ) Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com> Signed-off-by: southfreebird <yvorott@gmail.com> Co-authored-by: southfreebird <yvorott@gmail.com>	2026-01-14 13:20:52 -05:00
Cyrus Leung	9ea07b41da	[1/N] Reorganize multimodal processing code (#32327 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-14 15:25:31 +00:00
Cyrus Leung	90db5b31e4	[Refactor] Move top-level dummy data generation to registry (#32310 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-14 02:17:46 -08:00
sangho.lee	7e6f123810	Add Molmo2 multimodal model support (#30997 ) Signed-off-by: sanghol <sanghol@allenai.org> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-14 15:33:09 +08:00
Chauncey	9312a6c03a	[Refactor] [8/N] to simplify the vLLM openai responsesapi_serving architecture (#32260 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-14 07:26:24 +00:00
Hongxia Yang	048bb59728	AMD CI Test - unskip moe_sum test and moe_align_block_size tests (#32039 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>	2026-01-13 23:25:10 -08:00
Andreas Karatzas	9d0d7f48d5	[ROCm][CI] Handle missing vision_config in Isaac model attention patch (#32281 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-14 07:21:26 +00:00
Yi Liu	50632adc58	Consolidate Intel Quantization Toolkit Integration in vLLM (#31716 ) Signed-off-by: yiliu30 <yi4.liu@intel.com>	2026-01-14 07:11:30 +00:00
Roberto L. Castro	8ef50d9a6b	[Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885 ) Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>	2026-01-13 15:22:53 -08:00
Rabi Mishra	69f8a0ea37	fix(rocm): Use refresh_env_variables() for rocm_aiter_ops in test_moe (#31711 ) Signed-off-by: rabi <ramishra@redhat.com>	2026-01-13 19:11:54 +00:00
Wentao Ye	f28125d87b	[Perf] Optimize grouped topk kernel, 1.2%~2% E2E Throughput improvement (#32058 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-13 10:58:18 -08:00
Andrew Xia	af54d2e2d0	[responseAPI] support partial message generation (#32100 ) Signed-off-by: Andrew Xia <axia@fb.com> Signed-off-by: Andrew Xia <mitandrewxia@gmail.com> Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Co-authored-by: Andrew Xia <axia@fb.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-01-13 10:41:26 -08:00
Matthew Bonanni	2263d44b68	[4/N][Attention] Move MLA common to model_executor (#32060 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-13 09:08:45 -08:00
Chauncey	4f02cb2eac	[Refactor] [7/N] to simplify the vLLM lora serving architecture (#32251 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-13 15:37:34 +00:00
Cyrus Leung	252c011012	[Refactor] Remove `MultiModalProfiler` (#32254 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-13 15:10:20 +00:00
Matthew Bonanni	98f60e5acb	[6/N][Attention] Move utils to more appropriate locations (#32215 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-13 05:38:52 -08:00
Chauncey	fefce49807	[Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture (#32240 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-13 13:01:39 +00:00
Cyrus Leung	232214b2ae	[Bugfix] Replace `PoolingParams.normalize` with `use_activation` (#32243 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-13 10:45:42 +00:00
Andreas Karatzas	df7e12715f	[ROCm][CI] Fix engine core client tests for ROCm spawn multiprocessing (#32061 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-13 15:14:30 +08:00
Xingyu Liu	80221e1884	[BugFix]Fix eagle draft_model_config and add tests (#31753 ) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>	2026-01-12 23:09:36 -08:00
Andreas Karatzas	5e714f7ff4	[ROCm][CI] Fix HuggingFace flash_attention_2 accuracy issue in Isaac vision encoder (#32233 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-12 22:33:59 -08:00
Nick Hill	c6bb5b5603	[BugFix] Fix engine crash caused by chat tools + response_format (#32127 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-13 10:33:14 +08:00
Andrew Xia	a307ac0734	[responsesAPI] add unit test for optional function tool call id (#32036 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2026-01-12 16:14:54 -08:00
Nicolò Lucchesi	f8bd8394e3	[NIXL][Bugfix] Failure logging overhaul + early metadata free on failure (#32031 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-12 20:38:49 +00:00
Or Ozeri	2be765b68a	[BugFix] scheduler: Fix ordering preserving of skipped requests (#32173 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-12 18:39:38 +00:00
Ilya Markov	1eb61ab34b	[Refactor] EPLB rebalance algo to NumPy (#30697 ) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2026-01-12 18:13:23 +00:00
Matthew Bonanni	20228cb851	[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-12 09:13:56 -08:00
Cyrus Leung	7c0d3c5152	[Benchmark] Share data between SLA runs (#32184 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-13 01:12:22 +08:00
Nicolò Lucchesi	5b68107411	[Misc][PD] Fix `get_attn_backend` usage in transfer connectors (#31988 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-12 18:10:05 +01:00
Asaf Joseph Gardin	8fb2c135be	[Bugfix] Fix stale SSM state for new Mamba requests scheduled as decode (#32118 ) Signed-off-by: Josephasafg <ajgard7@gmail.com>	2026-01-12 17:02:38 +00:00
danielafrimi	3f72639d36	[FIX] Add NO_MUL activation support for modular kernel path (#31528 ) Signed-off-by: dafrimi <dafrimi@nvidia.com> Signed-off-by: <> Co-authored-by: root <root@gpu-267.slurm-workers-slurm.slurm.svc.cluster.local> Co-authored-by: root <root@gpu-537.slurm-workers-slurm.slurm.svc.cluster.local> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: root <root@pool0-01777.cm.cluster>	2026-01-12 11:55:49 -05:00
Jaehyun An	6bc9c8473e	[MODEL] New model support for kakaocorp/kanana-1.5-v-3b-instruct (#29384 ) Signed-off-by: Jaehyun An <steve.ai@kakaocorp.com>	2026-01-12 16:39:02 +00:00

4252 Commits