biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Nicolò Lucchesi	160c6fa387	[Misc] Add `get_name` to missing AttentionBackends (#32698 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-23 10:35:44 +00:00
Andreas Karatzas	a8eb1182f1	[CI][Models] Add VLM Support for Sequence Classification Conversion (#32885 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-23 16:22:51 +08:00
Karan Bansal	fa6e599a61	[Bugfix] Fix _CPU_MOE_ACT AssertionError when vLLM config not set (#32777 ) Signed-off-by: Karan Bansal <karanb192@gmail.com>	2026-01-23 08:22:37 +00:00
Wentao Ye	7ef5873752	[CI] Fix mypy for `vllm/v1/structured_output` (#32722 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-23 11:55:51 +08:00
Luka Govedič	5e4e0e51f4	[torch.compile] Compile `CustomOp.forward_native` for `SiluAndMul` and `QuantFP8` to avoid raw torch ops inside opaque custom ops (#32806 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-22 19:52:26 -08:00
Rishabh Saini	f61c9da711	[BugFix] deepseek_v32_encoding: Replace asserts with proper exceptions (#32884 ) Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>	2026-01-23 03:44:11 +00:00
Nick Hill	7fe255889e	[Misc] Log vLLM logo when starting server (#32796 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-23 11:15:12 +08:00
bnellnm	dc917cceb8	[MoE Refactor] Move `select_experts` from `FusedMoEQuantMethod` -> `FusedMoE` (#31996 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-01-22 18:21:35 -05:00
Fadi Arafeh	fc56f4a071	[BugFix] Fix invalid flashinfer_fused_moe_blockscale_fp8 op registration (#32855 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-01-22 22:27:40 +00:00
Xin Yang	d08b356ee0	[Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper (#32619 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-22 15:47:04 -05:00
Wentao Ye	f744810184	[Refactor] Remove unused tpu files (#32610 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-22 15:35:18 -05:00
Eldar Kurtić	44f08af3a7	Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141 ) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com> Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>	2026-01-22 13:29:57 -07:00
Matthew Bonanni	955b43a5a5	[Bugfix][Attention] Explicitly report support for kv_cache_dtype bfloat16 (#32795 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-22 19:05:18 +00:00
Fadi Arafeh	744ef30484	[CPU Backend] [Perf] Accelerate tensor-parallel/data-parallel inference across NUMA domains on Arm (#32792 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-01-22 18:55:23 +00:00
Matthew Bonanni	300622e609	[CI][Attention] Add more CI dependencies for attention tests (#32487 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-22 18:44:56 +00:00
RickyChen / 陳昭儒	69d09fdd6c	[Feature] Add --ssl-ciphers CLI argument for TLS cipher control (#30937 ) Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>	2026-01-22 09:53:24 -08:00
David Ramon Prados	3a63be0faa	Support custom URI schemes and trace handlers for profiler (#32393 )	2026-01-22 09:45:40 -08:00
Tyler Michael Smith	803e3f3f68	[UX] Default api_server_count to dp_size if not specified (#32525 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-01-22 17:35:35 +00:00
Vadim Gimpelson	70917b1c55	[MISC] Add .cursor to .gitignore (#32868 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-01-22 17:27:13 +00:00
Matt	c517d8c934	[Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars (#32837 ) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-23 00:59:15 +08:00
Xu Jinyang	fc37187a51	[Bugfix] ModelScope is supported when downloading LORA models. (#32844 ) Signed-off-by: AuYang <459461160@qq.com>	2026-01-22 16:33:21 +00:00
Maximilien de Bayser	ff365eea94	Support bge-m3 sparse embeddings and colbert embeddings (#14526 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Max de Bayser <maxdebayser@gmail.com>	2026-01-22 23:52:57 +08:00
Isotr0py	444e2e7e1f	[Misc] Bump opencv-python dependecy version to 4.13 (#32668 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-22 15:51:15 +00:00
Nick Hill	bc14663e6a	[Cleanup] Move scheduler `get_routed_experts` logic to separate method (#32706 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-22 10:46:00 -05:00
Richard Zou	654a71fc3c	[torch.compile] Improve Cold Start for MoEs (#32805 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-22 10:44:40 -05:00
Lucas Kabela	15e302dfce	[Misc][BE] Turn on strict type coverage for vllm/compilation (#31756 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-22 15:12:26 +00:00
Cyrus Leung	d117a4d1a9	[Frontend] Introduce Renderer for processing chat messages (using `ModelConfig`) (#30200 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-22 12:44:22 +00:00
Or Ozeri	421012b63a	OffloadingConnector: Support kernel_block_size != block_size (#30692 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-22 12:30:04 +00:00
Chauncey	841d53aaa8	[Frontend] add prompt_cache_key for openresponses (#32824 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-22 11:34:14 +00:00
Shengqi Chen	1752262e96	[CI] refactor release pipeline config into groups (#32833 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2026-01-22 11:27:21 +00:00
Nicolò Lucchesi	ea6102b85d	[Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-22 10:50:37 +00:00
wang.yuqi	328cbb2773	[Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-22 10:32:44 +00:00
liranschour	64e3d67ac0	Enable Cross layers KV cache layout at NIXL Connector (#30207 ) Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com> Co-authored-by: Or Ozeri <or@ozery.com>	2026-01-22 10:12:58 +00:00
Nick Hill	098b2d66fe	[Benchmark] Don't default to `temperature==0` in `vllm bench serve` (#32723 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-22 10:03:15 +00:00
Isotr0py	8ebf271bb6	[Misc] Replace urllib's `urlparse` with urllib3's `parse_url` (#32746 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-22 16:37:15 +08:00
Alex Sun	49a1262267	[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664 ) Signed-off-by: Alex Sun <alex.s@amd.com>	2026-01-22 16:33:18 +08:00
Cyrus Leung	2b8a38b6d6	[Model] Extend `collect_children` and `no_init_weights` contexts (#32757 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-22 08:20:27 +00:00
Kebe	1bf1a34b19	[bench] add start_times field to vllm bench serve json result (#32667 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2026-01-22 07:10:14 +00:00
Andreas Karatzas	a810299838	[ROCm][CI][Docs] Add comment explaining TRITON_ATTN fallback for ROCm (#32835 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-21 22:11:09 -08:00
Andreas Karatzas	eb1629da24	[ROCm][CI] Fix AITER test flakiness by using explicit attention backend (#32346 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-22 13:55:25 +08:00
Micah Williamson	019e2c3b7c	[ROCm][CI] Lower Acceptance Len Threshold For test_draft_model_quantization (#32731 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-22 05:47:33 +00:00
Huy Do	f5fdec8ce2	Upgrade transformers-4.57.5 (#32287 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2026-01-22 05:19:19 +00:00
Patrick von Platen	1579c9b5fd	[Llama.py -> mistral.py] Extract mistral-only relevant code into separate file (#32780 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2026-01-22 05:14:57 +00:00
Lucas Wilkinson	889722f3bf	[FlashMLA] Update FlashMLA to expose new arguments (#32810 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-21 22:02:39 -07:00
Divakar Verma	49d9653852	[ROCm][CI] fix get_valid_backends (#32787 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2026-01-22 04:27:47 +00:00
Ifta khairul Alam Adil	a1d82466ea	[Docs] Remove outdated async_scheduling limitation with speculative decoding (#32775 ) Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>	2026-01-21 20:19:25 -08:00
Lucain	24a163ed77	Cleanup some huggingface_hub-related stuff (#32788 )	2026-01-22 03:38:17 +00:00
knlnguyen1802	378385b90c	[EC Connector] Optimize remote cache check in scheduler (#32585 ) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>	2026-01-22 03:30:59 +00:00
Matt	c5487e2b96	[Bugfix] Fix potential EAGLE spec decode segfault during graph capture (#32818 ) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-22 03:11:55 +00:00
Wentao Ye	6437ff1fb9	[Deprecation] Remove deprecated environment variables (#32812 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-22 02:25:16 +00:00

13208 Commits