biondizzle/vllm - vllm

Author	SHA1	Message	Date
Andreas Karatzas	37a83007fe	[ROCm][CI] Fix wvSplitKrc mock argument order in test_rocm_unquantized_gemm (#38167 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-26 19:54:59 +08:00
Wentao Ye	bf5eec638d	[Refactor] Remove unused utils (#38153 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-26 17:08:19 +08:00
Mateusz Sokół	b1cb1d3d2c	DOC: Documentation pages fixes (#38125 ) Signed-off-by: Mateusz Sokół <mat646@gmail.com>	2026-03-26 16:55:42 +08:00
Kunshang Ji	6ae8bbd0c2	[XPU] Disable xpu graph by default (#38193 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-26 01:53:45 -07:00
Cyrus Leung	a9213c0ffe	[Doc] Fix outdated reference to CUDAGraphManager (#38209 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-26 01:52:38 -07:00
Cyrus Leung	502c41a8f6	[Model] Use helper function to run MM processors with token inputs (where applicable) (#38018 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-26 16:44:04 +08:00
Vadim Gimpelson	52069012fe	[Bugfix] Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-03-26 01:21:47 -07:00
Fadi Arafeh	71161e8b63	[cpu][ci] remove soft-fail for Arm CI and add quant model tests (#37691 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-03-26 07:03:31 +00:00
Terry Gao	38de822310	[Model] Add torch.compile support for InternVL vision encoder (#38049 ) Signed-off-by: tianrengao <terrygao87@gmail.com>	2026-03-25 23:52:29 -07:00
Jee Jee Li	2bfbdca23c	[Bugfix] Fix benchmark_fused_collective.py (#38082 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-03-25 23:51:00 -07:00
Matej Rojec	2908094567	Add `/v1/chat/completions/batch` endpoint for batched chat completions (#38011 ) Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>	2026-03-26 12:13:33 +08:00
BadrBasowid	e6bf9f15ec	[Bugfix][CI] Fix Marlin FP8 Linear Kernel for Compressed Tensors Format (#38092 ) Signed-off-by: BadrBasowid <Badr.Basowid@gmail.com> Signed-off-by: BadrBasowid <61441185+BadrBasowid@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-25 21:11:43 -07:00
Woosuk Kwon	144030c84e	Relocate Encoder CUDA graph manager (#38116 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-03-25 20:52:12 -07:00
Flora Feng	e2db2b4234	[Tool Parser][1/3] Pass tools to ToolParser constructor (#38029 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-26 10:29:06 +08:00
Chauncey	87f05d6880	[Revert] Remove DeepGEMM availability check in DeepseekV32IndexerMetadataBuilder (#38076 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-26 01:43:51 +00:00
Andreas Karatzas	36f6aede23	[Misc] Optimized check to encapsulate both CUDA and ROCm platforms (#34549 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-26 09:43:07 +08:00
Xin Yang	9704a5c310	Disable dual stream execution of input projection for Qwen3 (#38152 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-03-26 01:20:39 +00:00
Wei Zhao	74056039b7	Fix minimax m2.5 nvfp4 kv scales weight loading (#37214 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-03-26 00:48:06 +00:00
Jacob Platin	d7d51a7ee5	[Bugfix] Fix Qwen3.5-FP8 Weight Loading Error on TPU (#37348 ) Signed-off-by: Jacob Platin <jacobplatin@google.com>	2026-03-26 00:46:01 +00:00
Harry Mellor	3c3c084240	Various Transformers v5 fixes (#38127 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-26 00:10:08 +00:00
Ekagra Ranjan	7b54f60db0	[Cohere] Enable Cohere-Transcribe (#38120 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2026-03-25 16:13:51 -07:00
Rohan Potdar	a0e8c74005	[ROCm]: Update rope+kvcache fusion conditions and disable custom op by default (#36716 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-03-25 20:58:44 +00:00
Guillaume Guy	70a2152830	[MultiModal] add support for numpy array embeddings (#38119 ) Signed-off-by: guillaume_guy <guillaume.guy@airbnb.com> Signed-off-by: Guillaume Guy <guillaume.c.guy@gmail.com> Co-authored-by: guillaume_guy <guillaume.guy@airbnb.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-03-25 20:13:04 +00:00
Sathish Sanjeevi	978fc18bf0	[ROCm] Utilize persistent MLA kernel from AITER (#36574 ) Signed-off-by: Sathish Sanjeevi <sathish.krishnan.p.s@gmail.com>	2026-03-26 03:00:42 +08:00
Andreas Karatzas	7d6917bef5	[ROCm] Fix MoE kernel test failures on gfx950 (#37833 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-03-25 13:46:40 -05:00
Mark McLoughlin	e38817fadb	[Core][KV Connector] Remove use of num_cached_tokens in error handling (#38096 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-03-25 18:20:48 +00:00
Nick Hill	72cad44d3c	[Frontend] Move APIServerProcessManager target server fn (#38115 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-25 18:14:41 +00:00
Cyrus Leung	ba2f0acc2d	[Misc] Reorganize inputs (#35182 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-25 10:22:54 -07:00
Yongye Zhu	678b3c99e8	[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration (#38050 )	2026-03-25 10:16:40 -07:00
mikaylagawarecki	bf4cc9ed2d	[2/n] Migrate per_token_group_quant to torch stable ABI (#36058 ) Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>	2026-03-25 10:15:13 -07:00
Ben Browning	1ac2ef2e53	[CI/Docs] Improve aarch64/DGX Spark support for dev setup (#38057 ) Signed-off-by: Ben Browning <bbrownin@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 09:24:42 -07:00
Richard Zou	6e37c46b35	[compile] Add some more startup tests for top models (#38046 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-25 12:02:22 -04:00
Wentao Ye	1bf2ddd0ee	[Refactor] Rename `WAITING_FOR_FSM` to `WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR` (#38048 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-25 11:41:44 -04:00
Necofish	e7221180e1	[Kernel] Optimize SM120 CUTLASS blockwise FP8 GEMM (#37970 ) Signed-off-by: Necofish <liuxiangyang@mail.ustc.edu.cn> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-03-25 08:20:04 -07:00
RobTand	4a76ad12e0	[Bugfix] Preserve CUDA arch suffix (a/f) for SM12x — fixes NVFP4 NaN on desktop Blackwell (#37725 ) Signed-off-by: Rob Tand <robert.tand@icloud.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2026-03-25 08:18:25 -07:00
Wentao Ye	d7e93e13fb	[Feature] EPLB Support for GPU Model Runner v2 (#37488 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-25 08:16:39 -07:00
Andrii Skliar	cd7643015e	[Feature] Support per-draft-model MoE backend via `--speculative-config` (#37880 ) Signed-off-by: Andrii Skliar <askliar@nvidia.com> Signed-off-by: [Andrii Skliar] <askliar@nvidia.com> Co-authored-by: Andrii Skliar <askliar@nvidia.com>	2026-03-25 14:31:52 +00:00
Ben Browning	a1a2566447	[Docs] Add guide for editing agent instruction files (#37819 ) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2026-03-25 13:54:09 +00:00
yjz	b745e8b5d3	[KVTransfer][Mooncake] Add heterogeneous TP support for disaggregated P/D in MooncakeConnector (#36869 ) Signed-off-by: JianDan0212 <zhangyj0212@gmail.com>	2026-03-25 14:24:07 +01:00
Harry Mellor	d215d1efca	[Mypy] Better fixes for the `mypy` issues in `vllm/config` (#37902 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 06:14:43 -07:00
Fadi Arafeh	34d317dcec	[CPU][UX][Perf] Enable tcmalloc by default (#37607 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-03-25 20:39:57 +08:00
grYe99	7ac48fd357	[Model] Add AutoWeightsLoader support for jais (#38074 ) Signed-off-by: grYe99 <guorongye99@gmail.com> Co-authored-by: grYe99 <guorongye99@gmail.com>	2026-03-25 12:38:40 +00:00
Harry Mellor	d6bb2a9d9a	Fix Plamo 2/3 & LFM2 for Transformers v5 (#38090 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 12:29:49 +00:00
Harry Mellor	1e673a43ce	Better weight tying check for multimodal models (#38035 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 12:07:23 +00:00
Andreas Karatzas	04417ecd5f	[ROCm][CI] Rename filepath test to point to correct file (#38102 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 20:05:46 +08:00
R0CKSTAR	242c93f744	[Docs] Adds vllm-musa to custom_op.md (#37840 ) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>	2026-03-25 11:54:36 +00:00
Matthias Gehre	a889b7f584	[Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280 ) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>	2026-03-25 11:42:58 +00:00
Harry Mellor	ba2910f73a	Fix offline mode test for Transformers v5 (#38095 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 11:39:48 +00:00
Andreas Karatzas	f262a62aa1	[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (#37616 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 10:55:51 +00:00
Andreas Karatzas	9ac2fcafbb	[CI] Fix realtime WebSocket timeout deadlock and unhandled model validation errors (#37483 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 11:24:33 +01:00

15269 Commits