biondizzle/vllm - vllm

Author	SHA1	Message	Date
Isotr0py	444e2e7e1f	[Misc] Bump opencv-python dependecy version to 4.13 (#32668 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-22 15:51:15 +00:00
Nick Hill	bc14663e6a	[Cleanup] Move scheduler `get_routed_experts` logic to separate method (#32706 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-22 10:46:00 -05:00
Richard Zou	654a71fc3c	[torch.compile] Improve Cold Start for MoEs (#32805 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-22 10:44:40 -05:00
Lucas Kabela	15e302dfce	[Misc][BE] Turn on strict type coverage for vllm/compilation (#31756 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-22 15:12:26 +00:00
Cyrus Leung	d117a4d1a9	[Frontend] Introduce Renderer for processing chat messages (using `ModelConfig`) (#30200 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-22 12:44:22 +00:00
Or Ozeri	421012b63a	OffloadingConnector: Support kernel_block_size != block_size (#30692 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-22 12:30:04 +00:00
Chauncey	841d53aaa8	[Frontend] add prompt_cache_key for openresponses (#32824 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-22 11:34:14 +00:00
Shengqi Chen	1752262e96	[CI] refactor release pipeline config into groups (#32833 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2026-01-22 11:27:21 +00:00
Nicolò Lucchesi	ea6102b85d	[Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-22 10:50:37 +00:00
wang.yuqi	328cbb2773	[Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-22 10:32:44 +00:00
liranschour	64e3d67ac0	Enable Cross layers KV cache layout at NIXL Connector (#30207 ) Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com> Co-authored-by: Or Ozeri <or@ozery.com>	2026-01-22 10:12:58 +00:00
Nick Hill	098b2d66fe	[Benchmark] Don't default to `temperature==0` in `vllm bench serve` (#32723 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-22 10:03:15 +00:00
Isotr0py	8ebf271bb6	[Misc] Replace urllib's `urlparse` with urllib3's `parse_url` (#32746 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-22 16:37:15 +08:00
Alex Sun	49a1262267	[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664 ) Signed-off-by: Alex Sun <alex.s@amd.com>	2026-01-22 16:33:18 +08:00
Cyrus Leung	2b8a38b6d6	[Model] Extend `collect_children` and `no_init_weights` contexts (#32757 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-22 08:20:27 +00:00
Kebe	1bf1a34b19	[bench] add start_times field to vllm bench serve json result (#32667 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2026-01-22 07:10:14 +00:00
Andreas Karatzas	a810299838	[ROCm][CI][Docs] Add comment explaining TRITON_ATTN fallback for ROCm (#32835 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-21 22:11:09 -08:00
Andreas Karatzas	eb1629da24	[ROCm][CI] Fix AITER test flakiness by using explicit attention backend (#32346 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-22 13:55:25 +08:00
Micah Williamson	019e2c3b7c	[ROCm][CI] Lower Acceptance Len Threshold For test_draft_model_quantization (#32731 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-22 05:47:33 +00:00
Huy Do	f5fdec8ce2	Upgrade transformers-4.57.5 (#32287 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2026-01-22 05:19:19 +00:00
Patrick von Platen	1579c9b5fd	[Llama.py -> mistral.py] Extract mistral-only relevant code into separate file (#32780 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2026-01-22 05:14:57 +00:00
Lucas Wilkinson	889722f3bf	[FlashMLA] Update FlashMLA to expose new arguments (#32810 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-21 22:02:39 -07:00
Divakar Verma	49d9653852	[ROCm][CI] fix get_valid_backends (#32787 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2026-01-22 04:27:47 +00:00
Ifta khairul Alam Adil	a1d82466ea	[Docs] Remove outdated async_scheduling limitation with speculative decoding (#32775 ) Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>	2026-01-21 20:19:25 -08:00
Lucain	24a163ed77	Cleanup some huggingface_hub-related stuff (#32788 )	2026-01-22 03:38:17 +00:00
knlnguyen1802	378385b90c	[EC Connector] Optimize remote cache check in scheduler (#32585 ) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>	2026-01-22 03:30:59 +00:00
Matt	c5487e2b96	[Bugfix] Fix potential EAGLE spec decode segfault during graph capture (#32818 ) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-22 03:11:55 +00:00
Wentao Ye	6437ff1fb9	[Deprecation] Remove deprecated environment variables (#32812 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-22 02:25:16 +00:00
Woosuk Kwon	5e00b561cd	[Model Runner V2] Do not error on attention backends (#32820 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-21 17:02:48 -08:00
Woosuk Kwon	408195ec59	[Model Runner V2] Refactor Prompt Logprobs (#32811 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-21 15:12:20 -08:00
Xin Yang	63227accf5	[Kernel] Add topk_sigmoid kernel (#31246 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-21 22:49:51 +00:00
Yanan Cao	e675dda67b	[Misc] Add Helion version check to collect_env (#32797 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-01-21 21:54:46 +00:00
Nick Hill	24dc30f7ff	[ModelRunner V2] Don't pin reused flashinfer tensors (#32799 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-21 13:17:43 -08:00
Divakar Verma	180fba653e	[ROCm] fix import for on_gfx9 (#32783 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2026-01-21 18:41:11 +00:00
danisereb	f999539869	Add missing import of fused_topk to benchmark_moe (#32784 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-01-21 18:30:10 +00:00
Woosuk Kwon	e1da249c93	[Model Runner V2] Minor refactor for `compute_slot_mappings` (#32794 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-21 10:24:35 -08:00
Nick Hill	9b693d023c	[Misc] Omit "disable NCCL for DP sync" startup log when not applicable (#32707 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-21 17:03:39 +00:00
elvischenv	808d6fd7b9	Bump Flashinfer to v0.6.1 (#30993 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2026-01-21 08:49:50 -08:00
whx	1861ae8aae	[PluggableLayer][1/N] Define PluggableLayer (Fix ci) (#32744 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-01-21 11:38:04 -05:00
Robert Shaw	4e31b7f228	[Quantization][Deprecation] Remove RTN (#32697 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-21 16:34:42 +00:00
Pleaplusone	6c20e89c02	[ROCm][Deepseekv3.2] Refactor Sparse Indexer as CustomOp (#29287 ) Signed-off-by: ganyi <ygan@amd.com>	2026-01-21 23:16:30 +08:00
Robert Shaw	85f55c943c	[Quantization][Deprecation] Deprecate HQQ (#32681 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-21 09:32:40 -05:00
Robert Shaw	cea3c754c4	[Quantization][Deprecation] Remove `DeepSpeedFp8` (#32679 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-21 09:32:12 -05:00
Robert Shaw	42135d6898	[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414 )	2026-01-21 08:22:33 -05:00
Divakar Verma	e14467be43	[bugfix] Aria model (#32727 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2026-01-21 05:11:31 -08:00
Kim Hee Su	7727ce35c2	[Model] Add Eagle2.5-8B Vision-Language Model support (#32456 ) Signed-off-by: kimheesu <wlskaka4@gmail.com>	2026-01-21 09:39:53 +00:00
Yanwen Lin	6bb2bc71e2	[Bugfix] Force using spawn multiprocess method when it's the WSL platform (#32749 ) Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>	2026-01-21 09:35:55 +00:00
Lucas Kabela	c80f92c14d	[Documentation] Fix typo in `docs/design/torch_compile_multimodal.md` (#32741 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-20 23:54:20 -08:00
RickyChen / 陳昭儒	f23fb5a7c1	[Bugfix] Support HF sharded weights for Mistral3/Pixtral models (#32673 ) Signed-off-by: ricky-chaoju <ricky.chen@infinirc.com> Signed-off-by: vllm-dev <ricky.chen@infinirc.com>	2026-01-20 23:27:30 -08:00
Paco Xu	360aa93f8f	[Docs] Fix GitHub handle in governance process (#32582 ) Signed-off-by: Paco Xu <paco.xu@daocloud.io>	2026-01-21 07:07:50 +00:00

14386 Commits