Nicolò Lucchesi
|
160c6fa387
|
[Misc] Add get_name to missing AttentionBackends (#32698)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-23 10:35:44 +00:00 |
|
Andreas Karatzas
|
a8eb1182f1
|
[CI][Models] Add VLM Support for Sequence Classification Conversion (#32885)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-23 16:22:51 +08:00 |
|
Karan Bansal
|
fa6e599a61
|
[Bugfix] Fix _CPU_MOE_ACT AssertionError when vLLM config not set (#32777)
Signed-off-by: Karan Bansal <karanb192@gmail.com>
|
2026-01-23 08:22:37 +00:00 |
|
Wentao Ye
|
7ef5873752
|
[CI] Fix mypy for vllm/v1/structured_output (#32722)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-23 11:55:51 +08:00 |
|
Luka Govedič
|
5e4e0e51f4
|
[torch.compile] Compile CustomOp.forward_native for SiluAndMul and QuantFP8 to avoid raw torch ops inside opaque custom ops (#32806)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-22 19:52:26 -08:00 |
|
Rishabh Saini
|
f61c9da711
|
[BugFix] deepseek_v32_encoding: Replace asserts with proper exceptions (#32884)
Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
|
2026-01-23 03:44:11 +00:00 |
|
Nick Hill
|
7fe255889e
|
[Misc] Log vLLM logo when starting server (#32796)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-23 11:15:12 +08:00 |
|
bnellnm
|
dc917cceb8
|
[MoE Refactor] Move select_experts from FusedMoEQuantMethod -> FusedMoE (#31996)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-01-22 18:21:35 -05:00 |
|
Fadi Arafeh
|
fc56f4a071
|
[BugFix] Fix invalid flashinfer_fused_moe_blockscale_fp8 op registration (#32855)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-01-22 22:27:40 +00:00 |
|
Xin Yang
|
d08b356ee0
|
[Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper (#32619)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-01-22 15:47:04 -05:00 |
|
Wentao Ye
|
f744810184
|
[Refactor] Remove unused tpu files (#32610)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-22 15:35:18 -05:00 |
|
Eldar Kurtić
|
44f08af3a7
|
Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>
|
2026-01-22 13:29:57 -07:00 |
|
Matthew Bonanni
|
955b43a5a5
|
[Bugfix][Attention] Explicitly report support for kv_cache_dtype bfloat16 (#32795)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-22 19:05:18 +00:00 |
|
Fadi Arafeh
|
744ef30484
|
[CPU Backend] [Perf] Accelerate tensor-parallel/data-parallel inference across NUMA domains on Arm (#32792)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-01-22 18:55:23 +00:00 |
|
Matthew Bonanni
|
300622e609
|
[CI][Attention] Add more CI dependencies for attention tests (#32487)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-22 18:44:56 +00:00 |
|
RickyChen / 陳昭儒
|
69d09fdd6c
|
[Feature] Add --ssl-ciphers CLI argument for TLS cipher control (#30937)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
|
2026-01-22 09:53:24 -08:00 |
|
David Ramon Prados
|
3a63be0faa
|
Support custom URI schemes and trace handlers for profiler (#32393)
|
2026-01-22 09:45:40 -08:00 |
|
Tyler Michael Smith
|
803e3f3f68
|
[UX] Default api_server_count to dp_size if not specified (#32525)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-01-22 17:35:35 +00:00 |
|
Vadim Gimpelson
|
70917b1c55
|
[MISC] Add .cursor to .gitignore (#32868)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-01-22 17:27:13 +00:00 |
|
Matt
|
c517d8c934
|
[Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars (#32837)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-01-23 00:59:15 +08:00 |
|
Xu Jinyang
|
fc37187a51
|
[Bugfix] ModelScope is supported when downloading LORA models. (#32844)
Signed-off-by: AuYang <459461160@qq.com>
|
2026-01-22 16:33:21 +00:00 |
|
Maximilien de Bayser
|
ff365eea94
|
Support bge-m3 sparse embeddings and colbert embeddings (#14526)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
|
2026-01-22 23:52:57 +08:00 |
|
Isotr0py
|
444e2e7e1f
|
[Misc] Bump opencv-python dependency version to 4.13 (#32668)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-22 15:51:15 +00:00 |
|
Nick Hill
|
bc14663e6a
|
[Cleanup] Move scheduler get_routed_experts logic to separate method (#32706)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-22 10:46:00 -05:00 |
|
Richard Zou
|
654a71fc3c
|
[torch.compile] Improve Cold Start for MoEs (#32805)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-01-22 10:44:40 -05:00 |
|
Lucas Kabela
|
15e302dfce
|
[Misc][BE] Turn on strict type coverage for vllm/compilation (#31756)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-01-22 15:12:26 +00:00 |
|
Cyrus Leung
|
d117a4d1a9
|
[Frontend] Introduce Renderer for processing chat messages (using ModelConfig) (#30200)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-22 12:44:22 +00:00 |
|
Or Ozeri
|
421012b63a
|
OffloadingConnector: Support kernel_block_size != block_size (#30692)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-22 12:30:04 +00:00 |
|
Chauncey
|
841d53aaa8
|
[Frontend] add prompt_cache_key for openresponses (#32824)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-22 11:34:14 +00:00 |
|
Shengqi Chen
|
1752262e96
|
[CI] refactor release pipeline config into groups (#32833)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2026-01-22 11:27:21 +00:00 |
|
Nicolò Lucchesi
|
ea6102b85d
|
[Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-22 10:50:37 +00:00 |
|
wang.yuqi
|
328cbb2773
|
[Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-01-22 10:32:44 +00:00 |
|
liranschour
|
64e3d67ac0
|
Enable Cross layers KV cache layout at NIXL Connector (#30207)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
|
2026-01-22 10:12:58 +00:00 |
|
Nick Hill
|
098b2d66fe
|
[Benchmark] Don't default to temperature==0 in vllm bench serve (#32723)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-22 10:03:15 +00:00 |
|
Isotr0py
|
8ebf271bb6
|
[Misc] Replace urllib's urlparse with urllib3's parse_url (#32746)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-22 16:37:15 +08:00 |
|
Alex Sun
|
49a1262267
|
[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664)
Signed-off-by: Alex Sun <alex.s@amd.com>
|
2026-01-22 16:33:18 +08:00 |
|
Cyrus Leung
|
2b8a38b6d6
|
[Model] Extend collect_children and no_init_weights contexts (#32757)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-22 08:20:27 +00:00 |
|
Kebe
|
1bf1a34b19
|
[bench] add start_times field to vllm bench serve json result (#32667)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2026-01-22 07:10:14 +00:00 |
|
Andreas Karatzas
|
a810299838
|
[ROCm][CI][Docs] Add comment explaining TRITON_ATTN fallback for ROCm (#32835)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-21 22:11:09 -08:00 |
|
Andreas Karatzas
|
eb1629da24
|
[ROCm][CI] Fix AITER test flakiness by using explicit attention backend (#32346)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-01-22 13:55:25 +08:00 |
|
Micah Williamson
|
019e2c3b7c
|
[ROCm][CI] Lower Acceptance Len Threshold For test_draft_model_quantization (#32731)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-22 05:47:33 +00:00 |
|
Huy Do
|
f5fdec8ce2
|
Upgrade transformers-4.57.5 (#32287)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2026-01-22 05:19:19 +00:00 |
|
Patrick von Platen
|
1579c9b5fd
|
[Llama.py -> mistral.py] Extract mistral-only relevant code into separate file (#32780)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2026-01-22 05:14:57 +00:00 |
|
Lucas Wilkinson
|
889722f3bf
|
[FlashMLA] Update FlashMLA to expose new arguments (#32810)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-21 22:02:39 -07:00 |
|
Divakar Verma
|
49d9653852
|
[ROCm][CI] fix get_valid_backends (#32787)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2026-01-22 04:27:47 +00:00 |
|
Ifta khairul Alam Adil
|
a1d82466ea
|
[Docs] Remove outdated async_scheduling limitation with speculative decoding (#32775)
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>
|
2026-01-21 20:19:25 -08:00 |
|
Lucain
|
24a163ed77
|
Cleanup some huggingface_hub-related stuff (#32788)
|
2026-01-22 03:38:17 +00:00 |
|
knlnguyen1802
|
378385b90c
|
[EC Connector] Optimize remote cache check in scheduler (#32585)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
|
2026-01-22 03:30:59 +00:00 |
|
Matt
|
c5487e2b96
|
[Bugfix] Fix potential EAGLE spec decode segfault during graph capture (#32818)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-01-22 03:11:55 +00:00 |
|
Wentao Ye
|
6437ff1fb9
|
[Deprecation] Remove deprecated environment variables (#32812)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-22 02:25:16 +00:00 |
|