Fadi Arafeh
10e94c84f6
[CPU][Feat] Update PyTorch to v2.10 for CPU Backend (#32869)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-01-23 21:13:06 +08:00
Isotr0py
243e78c20f
[Benchmark][Bugfix] Fix race condition when starting server for sweep benchmark (#32927)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-23 12:11:18 +00:00
Fadi Arafeh
aac0b817fa
[CPU Backend][BugFix] Fix failing CPU MoE test (#32876)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-01-23 12:06:51 +00:00
wang.yuqi
05f3d714db
[Frontend][3/n] Make pooling entrypoints request schema consensus | EmbedRequest & ClassifyRequest (#32905)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 12:03:44 +00:00
Patrick von Platen
3f3f89529d
[Voxtral] Add new streaming arch (#32861)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 12:41:52 +01:00
Li, Jiang
5da4c7d789
[CI/Build][CPU] Fix failed pooling tests and macos smoke test (#32907)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 10:48:20 +00:00
Nicolò Lucchesi
160c6fa387
[Misc] Add get_name to missing AttentionBackends (#32698)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-23 10:35:44 +00:00
Andreas Karatzas
a8eb1182f1
[CI][Models] Add VLM Support for Sequence Classification Conversion (#32885)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-23 16:22:51 +08:00
Karan Bansal
fa6e599a61
[Bugfix] Fix _CPU_MOE_ACT AssertionError when vLLM config not set (#32777)
Signed-off-by: Karan Bansal <karanb192@gmail.com>
2026-01-23 08:22:37 +00:00
Wentao Ye
7ef5873752
[CI] Fix mypy for vllm/v1/structured_output (#32722)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-23 11:55:51 +08:00
Luka Govedič
5e4e0e51f4
[torch.compile] Compile CustomOp.forward_native for SiluAndMul and QuantFP8 to avoid raw torch ops inside opaque custom ops (#32806)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-22 19:52:26 -08:00
Rishabh Saini
f61c9da711
[BugFix] deepseek_v32_encoding: Replace asserts with proper exceptions (#32884)
Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
2026-01-23 03:44:11 +00:00
Nick Hill
7fe255889e
[Misc] Log vLLM logo when starting server (#32796)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-23 11:15:12 +08:00
bnellnm
dc917cceb8
[MoE Refactor] Move select_experts from FusedMoEQuantMethod -> FusedMoE (#31996)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-01-22 18:21:35 -05:00
Fadi Arafeh
fc56f4a071
[BugFix] Fix invalid flashinfer_fused_moe_blockscale_fp8 op registration (#32855)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-01-22 22:27:40 +00:00
Xin Yang
d08b356ee0
[Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper (#32619)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-22 15:47:04 -05:00
Wentao Ye
f744810184
[Refactor] Remove unused tpu files (#32610)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-22 15:35:18 -05:00
Eldar Kurtić
44f08af3a7
Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>
2026-01-22 13:29:57 -07:00
Matthew Bonanni
955b43a5a5
[Bugfix][Attention] Explicitly report support for kv_cache_dtype bfloat16 (#32795)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-22 19:05:18 +00:00
Fadi Arafeh
744ef30484
[CPU Backend] [Perf] Accelerate tensor-parallel/data-parallel inference across NUMA domains on Arm (#32792)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-01-22 18:55:23 +00:00
Matthew Bonanni
300622e609
[CI][Attention] Add more CI dependencies for attention tests (#32487)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-22 18:44:56 +00:00
RickyChen / 陳昭儒
69d09fdd6c
[Feature] Add --ssl-ciphers CLI argument for TLS cipher control (#30937)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
2026-01-22 09:53:24 -08:00
David Ramon Prados
3a63be0faa
Support custom URI schemes and trace handlers for profiler (#32393)
2026-01-22 09:45:40 -08:00
Tyler Michael Smith
803e3f3f68
[UX] Default api_server_count to dp_size if not specified (#32525)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2026-01-22 17:35:35 +00:00
Vadim Gimpelson
70917b1c55
[MISC] Add .cursor to .gitignore (#32868)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-01-22 17:27:13 +00:00
Matt
c517d8c934
[Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars (#32837)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-23 00:59:15 +08:00
Xu Jinyang
fc37187a51
[Bugfix] ModelScope is supported when downloading LORA models. (#32844)
Signed-off-by: AuYang <459461160@qq.com>
2026-01-22 16:33:21 +00:00
Maximilien de Bayser
ff365eea94
Support bge-m3 sparse embeddings and colbert embeddings (#14526)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
2026-01-22 23:52:57 +08:00
Isotr0py
444e2e7e1f
[Misc] Bump opencv-python dependency version to 4.13 (#32668)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-22 15:51:15 +00:00
Nick Hill
bc14663e6a
[Cleanup] Move scheduler get_routed_experts logic to separate method (#32706)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-22 10:46:00 -05:00
Richard Zou
654a71fc3c
[torch.compile] Improve Cold Start for MoEs (#32805)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-01-22 10:44:40 -05:00
Lucas Kabela
15e302dfce
[Misc][BE] Turn on strict type coverage for vllm/compilation (#31756)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-22 15:12:26 +00:00
Cyrus Leung
d117a4d1a9
[Frontend] Introduce Renderer for processing chat messages (using ModelConfig) (#30200)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-22 12:44:22 +00:00
Or Ozeri
421012b63a
OffloadingConnector: Support kernel_block_size != block_size (#30692)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-22 12:30:04 +00:00
Chauncey
841d53aaa8
[Frontend] add prompt_cache_key for openresponses (#32824)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-22 11:34:14 +00:00
Shengqi Chen
1752262e96
[CI] refactor release pipeline config into groups (#32833)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2026-01-22 11:27:21 +00:00
Nicolò Lucchesi
ea6102b85d
[Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-22 10:50:37 +00:00
wang.yuqi
328cbb2773
[Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-22 10:32:44 +00:00
liranschour
64e3d67ac0
Enable Cross layers KV cache layout at NIXL Connector (#30207)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
2026-01-22 10:12:58 +00:00
Nick Hill
098b2d66fe
[Benchmark] Don't default to temperature==0 in vllm bench serve (#32723)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-22 10:03:15 +00:00
Isotr0py
8ebf271bb6
[Misc] Replace urllib's urlparse with urllib3's parse_url (#32746)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-22 16:37:15 +08:00
Alex Sun
49a1262267
[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664)
Signed-off-by: Alex Sun <alex.s@amd.com>
2026-01-22 16:33:18 +08:00
Cyrus Leung
2b8a38b6d6
[Model] Extend collect_children and no_init_weights contexts (#32757)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-22 08:20:27 +00:00
Kebe
1bf1a34b19
[bench] add start_times field to vllm bench serve json result (#32667)
Signed-off-by: Kebe <mail@kebe7jun.com>
2026-01-22 07:10:14 +00:00
Andreas Karatzas
a810299838
[ROCm][CI][Docs] Add comment explaining TRITON_ATTN fallback for ROCm (#32835)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-21 22:11:09 -08:00
Andreas Karatzas
eb1629da24
[ROCm][CI] Fix AITER test flakiness by using explicit attention backend (#32346)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-22 13:55:25 +08:00
Micah Williamson
019e2c3b7c
[ROCm][CI] Lower Acceptance Len Threshold For test_draft_model_quantization (#32731)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-01-22 05:47:33 +00:00
Huy Do
f5fdec8ce2
Upgrade transformers-4.57.5 (#32287)
Signed-off-by: Huy Do <huydhn@gmail.com>
2026-01-22 05:19:19 +00:00
Patrick von Platen
1579c9b5fd
[Llama.py -> mistral.py] Extract mistral-only relevant code into separate file (#32780)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2026-01-22 05:14:57 +00:00
Lucas Wilkinson
889722f3bf
[FlashMLA] Update FlashMLA to expose new arguments (#32810)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-21 22:02:39 -07:00