biondizzle/vllm - vllm

Author	SHA1	Message	Date
Robert Shaw	c78ee240b3	Revert "[PluggableLayer][1/N] Define PluggableLayer" (#32725 )	2026-01-21 00:21:06 +00:00
Vasiliy Kuznetsov	d2389c1262	fp8 online quant: split out Fp8OnlineLinearMethod (#32189 )	2026-01-20 18:13:22 -05:00
Lucas Wilkinson	2261340806	[Misc] Remove pad_for_cudagraphs from config (#30143 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-20 15:05:48 -05:00
dolpm	7c5dedc247	[AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205 ) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-20 19:45:59 +00:00
Rahul Tuli	f0feb1cf81	Test: added acceptance length tests (#32030 ) Signed-off-by: rahul-tuli <rtuli@redhat.com>	2026-01-20 18:55:15 +00:00
whx	4ca62a0dbd	[PluggableLayer][1/N] Define PluggableLayer (#32331 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-01-20 16:19:21 +00:00
linhaifeng	7901109ea5	[Bugfix] Fix Off-by-one error in _num_tokens_to_min_blocks calculation (#32603 ) Signed-off-by: linhaifeng <1371675203@qq.com>	2026-01-20 11:13:39 -05:00
杨朱 · Kiki	bb9172030e	[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric (#32661 ) This PR completes the removal of the deprecated vllm:time_per_output_token_seconds metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13, but delayed until v0.15. Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>	2026-01-20 12:28:41 +00:00
Walter Beller-Morales	8be263c3fb	[Core] Cleanup shm based object store on engine shutdown (#32429 ) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-01-20 08:53:37 +00:00
vllmellm	148117ea2e	[Refactor] Make FP8 Linear Ops use kernel abstraction (#27814 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-20 14:48:20 +08:00
Jackmin801	12dab78f49	[Feat] allow inplace loading lora (#31326 ) Signed-off-by: Jackmin801 <ongjackm@gmail.com> Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-20 10:15:20 +08:00
Tomas Ruiz	4a5299c93f	feat: spec decode with draft models (#24322 ) Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>	2026-01-19 16:05:46 -05:00
jiahanc	7350331718	[BugFix] Fix TRT-LLM NVFP4 DP/EP (#32349 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-19 14:32:24 -05:00
Yanan Cao	9d1e611f0e	[CI] Add Helion as an optional dependency (#32482 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-01-19 19:09:56 +00:00
Vadim Gimpelson	0727cc9ecf	[BUGFIX] Fix `test_mla_backends.py`. Scale MLA projection weights to prevent numerical instability (#32529 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-01-19 13:49:29 -05:00
danisereb	aa7f37ccfa	Add support for LoRA adapters in Nemotron-H models (#30802 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-01-19 22:30:44 +08:00
wang.yuqi	c88860d759	[Frontend] Score entrypoint support data_1 & data_2 and queries & documents as inputs (#32577 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-19 14:07:46 +00:00
Nicolò Lucchesi	74c583bc50	[Core] Whisper support `torch.compile` (#30385 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-19 10:02:31 +00:00
Andreas Karatzas	c0a350ca73	[ROCm][CI] Add ROCm attention backend support for EAGLE DP tests (#32363 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-19 09:57:54 +00:00
Yuxuan Zhang	71832ba71e	[GLM-4.7] GLM Model support for GLM-Lite (#31386 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: Yuxuan Zhang <2448370773@qq.com>	2026-01-19 01:18:38 -08:00
Matt	11bbf86f6a	[CI][Hardware][AMD] Fix test_rotary_embedding_mla_cache_fused (#32408 ) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-19 08:25:47 +00:00
Hyunkyun Moon	3c8740aacb	[Frontend] Add render endpoints for prompt preprocessing (#32473 ) Signed-off-by: HyunKyun Moon <mhg5303@gmail.com> Signed-off-by: Hyunkyun Moon <mhg5303@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-19 12:21:46 +08:00
Alex Brooks	7518a3dc65	[CI/Build] Use Common Event Map Fixture in Harmony / MCP Server Tests (#32531 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2026-01-19 04:05:51 +00:00
bnellnm	327a02d8db	[MoE Refactor] Separate Router into OO Classes (#30623 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-01-18 11:40:49 -05:00
Li Xie	c826c72a96	[Model] Support Step1 Model (#32511 ) Signed-off-by: xieli <xieli@stepfun.com>	2026-01-18 10:20:46 +00:00
Isotr0py	8cc26acd8b	[Performance] Improve Triton prefill attention kernel's performance (#32403 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-17 20:19:59 -08:00
Robert Shaw	4a6af8813f	[MoE Refactor] Move Test Impl into Test Dirs (#32129 ) Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-01-18 12:16:59 +08:00
vanshil shah	037a6487af	apply _validate_input to MistralTokenizer token-id chat prompts (#32448 ) Signed-off-by: Vanshil Shah <vanshilshah@gmail.com>	2026-01-17 03:23:45 +00:00
Hashem Hashemi	7a1030431a	Atomics Reduce Counting Optimization for SplitK Skinny GEMMs. (#29843 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-01-16 11:45:04 -06:00
rasmith	6ca4f400d8	[CI][AMD] Skip test_permute_cols since the kernel is not used and not built for ROCm (#32444 ) Signed-off-by: Randall Smith <ransmith@amd.com>	2026-01-16 16:22:53 +08:00
Micah Williamson	b84c426a8c	[ROCm][CI] Skip Qwen3-30B-A3B-MXFP4A16 Eval Test On Non-CUDA Platforms (#32460 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-16 00:17:44 -08:00
Lucas Wilkinson	14ce524249	[CI] Breakup h200 tests (#30499 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-16 06:23:22 +00:00
wang.yuqi	4ae77dfd42	[Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-16 06:17:04 +00:00
cjackal	35bf5d08e8	[bugfix] Fix online serving crash when text type response_format is received (#26822 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>	2026-01-16 12:23:54 +08:00
ltd0924	709502558c	[Model] Add Step3vl 10b (#32329 ) Signed-off-by: luotingdan <luotingdan@stepfun.com> Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com> Co-authored-by: luotingdan <luotingdan@stepfun.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-01-15 19:04:16 -08:00
Micah Williamson	46f8a982b1	[ROCm][CI] Enable AITER Unified Attention On ROCm For gpt-oss Test (#32431 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-16 00:55:57 +00:00
TomerBN-Nvidia	c277fbdf31	[Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257 ) Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com>	2026-01-15 16:15:05 -08:00
Yongye Zhu	31c29257c8	[MoE Refactor][17/N] Apply Refactor to Bf16 (#31827 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-01-15 12:53:40 -08:00
Michael Goin	1be5a73571	[UX] Use kv_offloading_backend=native by default (#32421 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-15 18:55:11 +00:00
Wentao Ye	b34474bf2c	[Feature] Support async scheduling + PP (#32359 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-15 12:06:23 -05:00
Dipika Sikka	361dfdc9d8	[Quant] Support MXFP4 W4A16 for compressed-tensors MoE models (#32285 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-15 07:25:55 -08:00
Cyrus Leung	28459785ff	[3/N] Group together media-related code (#32406 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-15 11:52:12 +00:00
rasmith	8853a50af2	[CI][BugFix][AMD][FP8] Fix test_rms_norm so it runs correctly on ROCm (#32372 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2026-01-15 19:05:54 +08:00
Chauncey	707b44cc28	[Refactor] [11/N] to simplify the mcp architecture (#32396 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-15 18:49:31 +08:00
Cyrus Leung	cbbae38f93	[2/N] Move cache factories to MM registry (#32382 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-15 01:02:30 -08:00
dtc	1e584823f8	[Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode (#32314 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>	2026-01-15 16:31:16 +08:00
Chauncey	4c1c501a7e	[Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-15 07:41:34 +00:00
rasmith	3c2685645e	[CI][AMD][Quantization][BugFix] Fix fp8 max in quant_utils.py and update test_fp8_quant.::test_static_fp8_quant_group_2d to use correct fp8 dtype and adjust atol/rtol (#32201 ) Signed-off-by: Randall Smith <ransmith@amd.com>	2026-01-15 05:04:34 +00:00
Micah Williamson	773d7073ae	[ROCm][CI] Disable async scheduling on ROCm for test_structured_output[meta-llama/Meta-Llama-3.1-8B-Instruct-xgrammar-auto-speculative_config9] (#32355 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-15 04:53:43 +00:00
Ryan Rock	15422ed3f7	[CI/Build][Hardware][AMD] Fix v1/shutdown (#31997 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-01-15 04:01:42 +00:00

4336 Commits