Andreas Karatzas
|
37a83007fe
|
[ROCm][CI] Fix wvSplitKrc mock argument order in test_rocm_unquantized_gemm (#38167)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 19:54:59 +08:00 |
|
Wentao Ye
|
bf5eec638d
|
[Refactor] Remove unused utils (#38153)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-26 17:08:19 +08:00 |
|
Mateusz Sokół
|
b1cb1d3d2c
|
DOC: Documentation pages fixes (#38125)
Signed-off-by: Mateusz Sokół <mat646@gmail.com>
|
2026-03-26 16:55:42 +08:00 |
|
Kunshang Ji
|
6ae8bbd0c2
|
[XPU] Disable xpu graph by default (#38193)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-26 01:53:45 -07:00 |
|
Cyrus Leung
|
a9213c0ffe
|
[Doc] Fix outdated reference to CUDAGraphManager (#38209)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-26 01:52:38 -07:00 |
|
Cyrus Leung
|
502c41a8f6
|
[Model] Use helper function to run MM processors with token inputs (where applicable) (#38018)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-26 16:44:04 +08:00 |
|
Vadim Gimpelson
|
52069012fe
|
[Bugfix] Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-26 01:21:47 -07:00 |
|
Fadi Arafeh
|
71161e8b63
|
[cpu][ci] remove soft-fail for Arm CI and add quant model tests (#37691)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-03-26 07:03:31 +00:00 |
|
Terry Gao
|
38de822310
|
[Model] Add torch.compile support for InternVL vision encoder (#38049)
Signed-off-by: tianrengao <terrygao87@gmail.com>
|
2026-03-25 23:52:29 -07:00 |
|
Jee Jee Li
|
2bfbdca23c
|
[Bugfix] Fix benchmark_fused_collective.py (#38082)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-25 23:51:00 -07:00 |
|
Matej Rojec
|
2908094567
|
Add /v1/chat/completions/batch endpoint for batched chat completions (#38011)
Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>
|
2026-03-26 12:13:33 +08:00 |
|
BadrBasowid
|
e6bf9f15ec
|
[Bugfix][CI] Fix Marlin FP8 Linear Kernel for Compressed Tensors Format (#38092)
Signed-off-by: BadrBasowid <Badr.Basowid@gmail.com>
Signed-off-by: BadrBasowid <61441185+BadrBasowid@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-25 21:11:43 -07:00 |
|
Woosuk Kwon
|
144030c84e
|
Relocate Encoder CUDA graph manager (#38116)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-25 20:52:12 -07:00 |
|
Flora Feng
|
e2db2b4234
|
[Tool Parser][1/3] Pass tools to ToolParser constructor (#38029)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-26 10:29:06 +08:00 |
|
Chauncey
|
87f05d6880
|
[Revert] Remove DeepGEMM availability check in DeepseekV32IndexerMetadataBuilder (#38076)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-26 01:43:51 +00:00 |
|
Andreas Karatzas
|
36f6aede23
|
[Misc] Optimized check to encapsulate both CUDA and ROCm platforms (#34549)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 09:43:07 +08:00 |
|
Xin Yang
|
9704a5c310
|
Disable dual stream execution of input projection for Qwen3 (#38152)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-26 01:20:39 +00:00 |
|
Wei Zhao
|
74056039b7
|
Fix minimax m2.5 nvfp4 kv scales weight loading (#37214)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-03-26 00:48:06 +00:00 |
|
Jacob Platin
|
d7d51a7ee5
|
[Bugfix] Fix Qwen3.5-FP8 Weight Loading Error on TPU (#37348)
Signed-off-by: Jacob Platin <jacobplatin@google.com>
|
2026-03-26 00:46:01 +00:00 |
|
Harry Mellor
|
3c3c084240
|
Various Transformers v5 fixes (#38127)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-26 00:10:08 +00:00 |
|
Ekagra Ranjan
|
7b54f60db0
|
[Cohere] Enable Cohere-Transcribe (#38120)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2026-03-25 16:13:51 -07:00 |
|
Rohan Potdar
|
a0e8c74005
|
[ROCm]: Update rope+kvcache fusion conditions and disable custom op by default (#36716)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-03-25 20:58:44 +00:00 |
|
Guillaume Guy
|
70a2152830
|
[MultiModal] add support for numpy array embeddings (#38119)
Signed-off-by: guillaume_guy <guillaume.guy@airbnb.com>
Signed-off-by: Guillaume Guy <guillaume.c.guy@gmail.com>
Co-authored-by: guillaume_guy <guillaume.guy@airbnb.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-03-25 20:13:04 +00:00 |
|
Sathish Sanjeevi
|
978fc18bf0
|
[ROCm] Utilize persistent MLA kernel from AITER (#36574)
Signed-off-by: Sathish Sanjeevi <sathish.krishnan.p.s@gmail.com>
|
2026-03-26 03:00:42 +08:00 |
|
Andreas Karatzas
|
7d6917bef5
|
[ROCm] Fix MoE kernel test failures on gfx950 (#37833)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-03-25 13:46:40 -05:00 |
|
Mark McLoughlin
|
e38817fadb
|
[Core][KV Connector] Remove use of num_cached_tokens in error handling (#38096)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2026-03-25 18:20:48 +00:00 |
|
Nick Hill
|
72cad44d3c
|
[Frontend] Move APIServerProcessManager target server fn (#38115)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-25 18:14:41 +00:00 |
|
Cyrus Leung
|
ba2f0acc2d
|
[Misc] Reorganize inputs (#35182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-25 10:22:54 -07:00 |
|
Yongye Zhu
|
678b3c99e8
|
[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration (#38050)
|
2026-03-25 10:16:40 -07:00 |
|
mikaylagawarecki
|
bf4cc9ed2d
|
[2/n] Migrate per_token_group_quant to torch stable ABI (#36058)
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
|
2026-03-25 10:15:13 -07:00 |
|
Ben Browning
|
1ac2ef2e53
|
[CI/Docs] Improve aarch64/DGX Spark support for dev setup (#38057)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-25 09:24:42 -07:00 |
|
Richard Zou
|
6e37c46b35
|
[compile] Add some more startup tests for top models (#38046)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-25 12:02:22 -04:00 |
|
Wentao Ye
|
1bf2ddd0ee
|
[Refactor] Rename WAITING_FOR_FSM to WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR (#38048)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-25 11:41:44 -04:00 |
|
Necofish
|
e7221180e1
|
[Kernel] Optimize SM120 CUTLASS blockwise FP8 GEMM (#37970)
Signed-off-by: Necofish <liuxiangyang@mail.ustc.edu.cn>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-03-25 08:20:04 -07:00 |
|
RobTand
|
4a76ad12e0
|
[Bugfix] Preserve CUDA arch suffix (a/f) for SM12x — fixes NVFP4 NaN on desktop Blackwell (#37725)
Signed-off-by: Rob Tand <robert.tand@icloud.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2026-03-25 08:18:25 -07:00 |
|
Wentao Ye
|
d7e93e13fb
|
[Feature] EPLB Support for GPU Model Runner v2 (#37488)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-25 08:16:39 -07:00 |
|
Andrii Skliar
|
cd7643015e
|
[Feature] Support per-draft-model MoE backend via --speculative-config (#37880)
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Signed-off-by: [Andrii Skliar] <askliar@nvidia.com>
Co-authored-by: Andrii Skliar <askliar@nvidia.com>
|
2026-03-25 14:31:52 +00:00 |
|
Ben Browning
|
a1a2566447
|
[Docs] Add guide for editing agent instruction files (#37819)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
|
2026-03-25 13:54:09 +00:00 |
|
yjz
|
b745e8b5d3
|
[KVTransfer][Mooncake] Add heterogeneous TP support for disaggregated P/D in MooncakeConnector (#36869)
Signed-off-by: JianDan0212 <zhangyj0212@gmail.com>
|
2026-03-25 14:24:07 +01:00 |
|
Harry Mellor
|
d215d1efca
|
[Mypy] Better fixes for the mypy issues in vllm/config (#37902)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-25 06:14:43 -07:00 |
|
Fadi Arafeh
|
34d317dcec
|
[CPU][UX][Perf] Enable tcmalloc by default (#37607)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-03-25 20:39:57 +08:00 |
|
grYe99
|
7ac48fd357
|
[Model] Add AutoWeightsLoader support for jais (#38074)
Signed-off-by: grYe99 <guorongye99@gmail.com>
Co-authored-by: grYe99 <guorongye99@gmail.com>
|
2026-03-25 12:38:40 +00:00 |
|
Harry Mellor
|
d6bb2a9d9a
|
Fix Plamo 2/3 & LFM2 for Transformers v5 (#38090)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-25 12:29:49 +00:00 |
|
Harry Mellor
|
1e673a43ce
|
Better weight tying check for multimodal models (#38035)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-25 12:07:23 +00:00 |
|
Andreas Karatzas
|
04417ecd5f
|
[ROCm][CI] Rename filepath test to point to correct file (#38102)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 20:05:46 +08:00 |
|
R0CKSTAR
|
242c93f744
|
[Docs] Adds vllm-musa to custom_op.md (#37840)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
|
2026-03-25 11:54:36 +00:00 |
|
Matthias Gehre
|
a889b7f584
|
[Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
|
2026-03-25 11:42:58 +00:00 |
|
Harry Mellor
|
ba2910f73a
|
Fix offline mode test for Transformers v5 (#38095)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-25 11:39:48 +00:00 |
|
Andreas Karatzas
|
f262a62aa1
|
[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (#37616)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 10:55:51 +00:00 |
|
Andreas Karatzas
|
9ac2fcafbb
|
[CI] Fix realtime WebSocket timeout deadlock and unhandled model validation errors (#37483)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 11:24:33 +01:00 |
|