biondizzle/vllm - vllm

Author	SHA1	Message	Date
Luka Govedič	5e4e0e51f4	[torch.compile] Compile `CustomOp.forward_native` for `SiluAndMul` and `QuantFP8` to avoid raw torch ops inside opaque custom ops (#32806 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-22 19:52:26 -08:00
bnellnm	dc917cceb8	[MoE Refactor] Move `select_experts` from `FusedMoEQuantMethod` -> `FusedMoE` (#31996 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-01-22 18:21:35 -05:00
Xin Yang	d08b356ee0	[Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper (#32619 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-22 15:47:04 -05:00
Eldar Kurtić	44f08af3a7	Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141 ) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com> Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>	2026-01-22 13:29:57 -07:00
David Ramon Prados	3a63be0faa	Support custom URI schemes and trace handlers for profiler (#32393 )	2026-01-22 09:45:40 -08:00
Matt	c517d8c934	[Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars (#32837 ) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-23 00:59:15 +08:00
Maximilien de Bayser	ff365eea94	Support bge-m3 sparse embeddings and colbert embeddings (#14526 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Max de Bayser <maxdebayser@gmail.com>	2026-01-22 23:52:57 +08:00
Isotr0py	444e2e7e1f	[Misc] Bump opencv-python dependecy version to 4.13 (#32668 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-22 15:51:15 +00:00
Richard Zou	654a71fc3c	[torch.compile] Improve Cold Start for MoEs (#32805 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-22 10:44:40 -05:00
Lucas Kabela	15e302dfce	[Misc][BE] Turn on strict type coverage for vllm/compilation (#31756 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-22 15:12:26 +00:00
Cyrus Leung	d117a4d1a9	[Frontend] Introduce Renderer for processing chat messages (using `ModelConfig`) (#30200 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-22 12:44:22 +00:00
Or Ozeri	421012b63a	OffloadingConnector: Support kernel_block_size != block_size (#30692 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-22 12:30:04 +00:00
Nicolò Lucchesi	ea6102b85d	[Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-22 10:50:37 +00:00
wang.yuqi	328cbb2773	[Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-22 10:32:44 +00:00
liranschour	64e3d67ac0	Enable Cross layers KV cache layout at NIXL Connector (#30207 ) Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com> Co-authored-by: Or Ozeri <or@ozery.com>	2026-01-22 10:12:58 +00:00
Alex Sun	49a1262267	[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664 ) Signed-off-by: Alex Sun <alex.s@amd.com>	2026-01-22 16:33:18 +08:00
Cyrus Leung	2b8a38b6d6	[Model] Extend `collect_children` and `no_init_weights` contexts (#32757 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-22 08:20:27 +00:00
Andreas Karatzas	a810299838	[ROCm][CI][Docs] Add comment explaining TRITON_ATTN fallback for ROCm (#32835 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-21 22:11:09 -08:00
Andreas Karatzas	eb1629da24	[ROCm][CI] Fix AITER test flakiness by using explicit attention backend (#32346 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-22 13:55:25 +08:00
Micah Williamson	019e2c3b7c	[ROCm][CI] Lower Acceptance Len Threshold For test_draft_model_quantization (#32731 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-22 05:47:33 +00:00
Huy Do	f5fdec8ce2	Upgrade transformers-4.57.5 (#32287 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2026-01-22 05:19:19 +00:00
Lucas Wilkinson	889722f3bf	[FlashMLA] Update FlashMLA to expose new arguments (#32810 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-21 22:02:39 -07:00
Divakar Verma	49d9653852	[ROCm][CI] fix get_valid_backends (#32787 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2026-01-22 04:27:47 +00:00
knlnguyen1802	378385b90c	[EC Connector] Optimize remote cache check in scheduler (#32585 ) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>	2026-01-22 03:30:59 +00:00
Wentao Ye	6437ff1fb9	[Deprecation] Remove deprecated environment variables (#32812 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-22 02:25:16 +00:00
Xin Yang	63227accf5	[Kernel] Add topk_sigmoid kernel (#31246 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-21 22:49:51 +00:00
elvischenv	808d6fd7b9	Bump Flashinfer to v0.6.1 (#30993 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2026-01-21 08:49:50 -08:00
whx	1861ae8aae	[PluggableLayer][1/N] Define PluggableLayer (Fix ci) (#32744 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-01-21 11:38:04 -05:00
Robert Shaw	4e31b7f228	[Quantization][Deprecation] Remove RTN (#32697 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-21 16:34:42 +00:00
Robert Shaw	85f55c943c	[Quantization][Deprecation] Deprecate HQQ (#32681 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-21 09:32:40 -05:00
Robert Shaw	42135d6898	[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414 )	2026-01-21 08:22:33 -05:00
Kim Hee Su	7727ce35c2	[Model] Add Eagle2.5-8B Vision-Language Model support (#32456 ) Signed-off-by: kimheesu <wlskaka4@gmail.com>	2026-01-21 09:39:53 +00:00
Lucas Wilkinson	b4f64e5b02	Update FlashMLA (#32491 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-21 13:03:37 +08:00
Nick Hill	6f067b1fb7	[Cleanup] Remove unused `KVConnectorModelRunnerMixin` methods (#32077 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-21 11:16:37 +08:00
Alex Brooks	27b81e010d	[Bugfix] Fix Granite Vision / Don't use Siglip Pooling Head Nested Models by Default (#32299 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2026-01-21 11:11:52 +08:00
Or Ozeri	7013e9ac8f	OffloadingConnector: Prevent redundant loads (#29087 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-21 01:15:42 +00:00
Robert Shaw	c78ee240b3	Revert "[PluggableLayer][1/N] Define PluggableLayer" (#32725 )	2026-01-21 00:21:06 +00:00
Vasiliy Kuznetsov	d2389c1262	fp8 online quant: split out Fp8OnlineLinearMethod (#32189 )	2026-01-20 18:13:22 -05:00
Lucas Wilkinson	2261340806	[Misc] Remove pad_for_cudagraphs from config (#30143 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-20 15:05:48 -05:00
dolpm	7c5dedc247	[AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205 ) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-20 19:45:59 +00:00
Rahul Tuli	f0feb1cf81	Test: added acceptance length tests (#32030 ) Signed-off-by: rahul-tuli <rtuli@redhat.com>	2026-01-20 18:55:15 +00:00
whx	4ca62a0dbd	[PluggableLayer][1/N] Define PluggableLayer (#32331 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-01-20 16:19:21 +00:00
linhaifeng	7901109ea5	[Bugfix] Fix Off-by-one error in _num_tokens_to_min_blocks calculation (#32603 ) Signed-off-by: linhaifeng <1371675203@qq.com>	2026-01-20 11:13:39 -05:00
杨朱 · Kiki	bb9172030e	[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric (#32661 ) This PR completes the removal of the deprecated vllm:time_per_output_token_seconds metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13, but delayed until v0.15. Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>	2026-01-20 12:28:41 +00:00
Walter Beller-Morales	8be263c3fb	[Core] Cleanup shm based object store on engine shutdown (#32429 ) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-01-20 08:53:37 +00:00
vllmellm	148117ea2e	[Refactor] Make FP8 Linear Ops use kernel abstraction (#27814 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-20 14:48:20 +08:00
Jackmin801	12dab78f49	[Feat] allow inplace loading lora (#31326 ) Signed-off-by: Jackmin801 <ongjackm@gmail.com> Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-20 10:15:20 +08:00
Tomas Ruiz	4a5299c93f	feat: spec decode with draft models (#24322 ) Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>	2026-01-19 16:05:46 -05:00
jiahanc	7350331718	[BugFix] Fix TRT-LLM NVFP4 DP/EP (#32349 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-19 14:32:24 -05:00
Yanan Cao	9d1e611f0e	[CI] Add Helion as an optional dependency (#32482 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-01-19 19:09:56 +00:00

4222 Commits