biondizzle/vllm - vllm

Author	SHA1	Message	Date
Netanel Haber	27ca95b3c9	[Bugfix] Fix Nemotron-Nano-v2-vlm static resolution (#32682 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-01-21 06:28:21 +00:00
Lucas Wilkinson	b4f64e5b02	Update FlashMLA (#32491 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-21 13:03:37 +08:00
shanjiaz	7ab80a8e37	Added qwen3 vision language moe support for speculative decoding (#32048 ) Signed-off-by: shanjiaz <zsjwpianpian@gmail.com> Signed-off-by: shanjiaz <43143795+shanjiaz@users.noreply.github.com>	2026-01-21 03:24:05 +00:00
gopalsarda	0900cedb3f	Enable Eagle3 speculative decoding for Pixtral (LlavaForConditionalGeneration) (#32542 ) Signed-off-by: gopalsarda <gopal.sarda@servicenow.com>	2026-01-21 11:18:05 +08:00
Nick Hill	6f067b1fb7	[Cleanup] Remove unused `KVConnectorModelRunnerMixin` methods (#32077 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-21 11:16:37 +08:00
Alex Brooks	27b81e010d	[Bugfix] Fix Granite Vision / Don't use Siglip Pooling Head Nested Models by Default (#32299 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2026-01-21 11:11:52 +08:00
Or Ozeri	7013e9ac8f	OffloadingConnector: Prevent redundant loads (#29087 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-21 01:15:42 +00:00
Robert Shaw	c78ee240b3	Revert "[PluggableLayer][1/N] Define PluggableLayer" (#32725 )	2026-01-21 00:21:06 +00:00
Vasiliy Kuznetsov	d2389c1262	fp8 online quant: split out Fp8OnlineLinearMethod (#32189 )	2026-01-20 18:13:22 -05:00
Micah Williamson	22375f8d13	[ROCm][CI] Remove DS async eplb accuracy test from AMD CI (#32717 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-20 13:40:48 -08:00
TJian	9b67338b78	[Bugfix] Suppress log on non-ROCm platform (#32703 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-01-20 13:38:20 -08:00
Lucas Wilkinson	2261340806	[Misc] Remove pad_for_cudagraphs from config (#30143 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-20 15:05:48 -05:00
Shinichi Hemmi	86c69dc54c	[Bugfix] Fix byte fallback handling when using outlines (#31391 ) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com> Co-authored-by: Kenichi Maehashi <maehashi@preferred.jp>	2026-01-20 19:48:08 +00:00
dolpm	7c5dedc247	[AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205 ) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-20 19:45:59 +00:00
Cyrus Leung	193069d129	[5/N] Initialize MM components in context managers (Q-Z) (#32695 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 19:10:23 +00:00
Rahul Tuli	f0feb1cf81	Test: added acceptance length tests (#32030 ) Signed-off-by: rahul-tuli <rtuli@redhat.com>	2026-01-20 18:55:15 +00:00
Cyrus Leung	09194b90a5	[Doc] Update docs for MM model development with context usage (#32691 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 10:37:35 -08:00
Woosuk Kwon	9ab4388cd3	[Model Runner V2] Support FLASHINFER_MLA backend (#32709 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-20 10:26:17 -08:00
JJJYmmm	04a9e064db	[Bugfix] fix the ima issue of qwen-vit (#32687 ) Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>	2026-01-20 17:21:25 +00:00
TJian	c025263ddd	[Doc] [ROCm] Update ROCm getting started doc (#32580 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Hongxia Yang <hongxia.yang@amd.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 09:20:08 -08:00
Wentao Ye	6c97b9b9b6	[Perf] Only clone when needed for `moe_permute` (#32273 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-20 11:34:39 -05:00
whx	4ca62a0dbd	[PluggableLayer][1/N] Define PluggableLayer (#32331 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-01-20 16:19:21 +00:00
linhaifeng	7901109ea5	[Bugfix] Fix Off-by-one error in _num_tokens_to_min_blocks calculation (#32603 ) Signed-off-by: linhaifeng <1371675203@qq.com>	2026-01-20 11:13:39 -05:00
YiSheng5	13f6630a9e	[XPU]Support AgRsAll2AllManager on XPU device (#32654 ) Signed-off-by: yisheng <yi.sheng@intel.com>	2026-01-20 14:27:24 +00:00
Cyrus Leung	fda3f03eb2	[4/N] Initialize MM components in context managers (M-P) (#32663 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 14:06:32 +00:00
杨朱 · Kiki	bb9172030e	[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric (#32661 ) This PR completes the removal of the deprecated vllm:time_per_output_token_seconds metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13, but delayed until v0.15. Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>	2026-01-20 12:28:41 +00:00
Chauncey	c4e5bdf61b	[Bugfix] Fix the fp8_mqa_logits dim mismatch (#32652 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-20 18:48:07 +08:00
Cyrus Leung	7f1bcd18ff	[3/N] Initialize MM components in context managers (I-L) (#32650 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 10:21:56 +00:00
Walter Beller-Morales	8be263c3fb	[Core] Cleanup shm based object store on engine shutdown (#32429 ) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-01-20 08:53:37 +00:00
Cyrus Leung	e1a34c3a5d	[2/N] Initialize MM components in context managers (E-H) (#32641 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 08:12:56 +00:00
vllmellm	148117ea2e	[Refactor] Make FP8 Linear Ops use kernel abstraction (#27814 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-20 14:48:20 +08:00
Woosuk Kwon	e9c83cdc51	[Model Runner V2] Skip kernel launch for penalties & logit_bias (#32634 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-19 22:20:19 -08:00
Cyrus Leung	b75e85dede	[1/N] Initialize MM components in context managers (A-D) (#32632 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 14:12:42 +08:00
Cyrus Leung	4753f3bf69	[Model] Use context managers for encoder- and LM-only mode (#32605 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 11:43:38 +08:00
Woosuk Kwon	6c01ffb897	[Model Runner V2] Decouple temperature from penalties (#32629 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-19 19:13:24 -08:00
Woosuk Kwon	7b7cdce968	[Model Runner V2] Refactor get_cudagraph_and_dp_padding (#32625 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-19 18:25:02 -08:00
Jackmin801	12dab78f49	[Feat] allow inplace loading lora (#31326 ) Signed-off-by: Jackmin801 <ongjackm@gmail.com> Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-20 10:15:20 +08:00
Woosuk Kwon	05dc4bfab6	[Model Runner V2] Initialized communication buffer for DP (#32624 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-19 17:27:06 -08:00
Matthew Bonanni	1a1fc3bbc0	[Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32615 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-19 18:41:34 -05:00
Woosuk Kwon	43fada5360	[Model Runner V2] Refactor `dummy_run` (#32533 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-19 14:50:59 -08:00
Tomas Ruiz	4a5299c93f	feat: spec decode with draft models (#24322 ) Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>	2026-01-19 16:05:46 -05:00
lon	73f2a81c75	docs: prefix caching seems quite outdated (#28784 ) Signed-off-by: lon <114724657+longregen@users.noreply.github.com> Signed-off-by: Russell Bryant <russell.bryant@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Russell Bryant <russell.bryant@gmail.com>	2026-01-19 11:49:52 -08:00
jiahanc	7350331718	[BugFix] Fix TRT-LLM NVFP4 DP/EP (#32349 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-19 14:32:24 -05:00
Yanan Cao	9d1e611f0e	[CI] Add Helion as an optional dependency (#32482 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-01-19 19:09:56 +00:00
Vadim Gimpelson	0727cc9ecf	[BUGFIX] Fix `test_mla_backends.py`. Scale MLA projection weights to prevent numerical instability (#32529 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-01-19 13:49:29 -05:00
qli88	a0490be8f1	[CI][amd] Revert NIXL connector change to avoid crash (#32570 ) Signed-off-by: Qiang Li <qiang.li2@amd.com> Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-19 18:39:16 +00:00
Netanel Haber	cd3ac5b797	support dynamic resolution image encoding for Nemotron Nano VL (#32121 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-01-19 18:15:58 +00:00
Jee Jee Li	2636d76257	[Misc] Remove unused ModelKeys (#32608 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-19 17:34:59 +00:00
danisereb	aa7f37ccfa	Add support for LoRA adapters in Nemotron-H models (#30802 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-01-19 22:30:44 +08:00
wang.yuqi	c88860d759	[Frontend] Score entrypoint support data_1 & data_2 and queries & documents as inputs (#32577 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-19 14:07:46 +00:00

14386 Commits