rasmith
0024f39a32
[ROCm][P/D][MORI][BugFix] Add transfer_id for moriio_connector so moriio_connector to restore P/D functionality (#34907)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-03-16 10:36:51 +08:00
Andrew Xia
e9163b536e
[responsesAPI][ez] add a unit test for SimpleContext logprobs (#37126)
Signed-off-by: Andrew Xia <axia@meta.com>
2026-03-15 17:12:26 -07:00
Lalithnarayan C
7acaea634c
In-Tree AMD Zen CPU Backend via zentorch [1/N] (#35970)
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Chinmay-Kulkarni-AMD <Chinmay.Kulkarni@amd.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:35:35 +00:00
Jiangyun Zhu
697e4ff352
[GDN] add a config for gdn kernel selection (#36647)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-03-16 00:40:17 +08:00
Hari
a3e2e250f0
[Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614)
Signed-off-by: hasethuraman <hsethuraman@microsoft.com>
2026-03-15 19:38:21 +08:00
Isotr0py
143e4dccdf
[Misc] Add online audio_in_video test (#36775)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-15 00:14:11 -07:00
Isotr0py
6590a3ecda
[Frontend] Remove torchcodec from audio dependency (#37061)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-15 05:15:59 +00:00
Russell Bryant
b3debb7e77
[Build] Upgrade xgrammar to get a security fix (#36168)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2026-03-15 03:13:48 +00:00
Nick Hill
458c1a4b2d
[Frontend] Reduce chat template warmup logging levels (#37062)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-03-14 13:48:59 -07:00
Karan Bansal
821fde2df4
[Bugfix] Fix xgrammar dtype mismatch on macOS CPU inference (#32384)
Signed-off-by: Karan Bansal <karanb192@gmail.com>
Co-authored-by: Inokinoki <inoki@inoki.cc>
2026-03-14 17:29:06 +00:00
arlo
8c29042bb9
[Feature] Add InstantTensor weight loader (#36139)
2026-03-14 18:05:23 +01:00
Cyrus Leung
5467d137b3
[Frontend] Avoid startup error log for models without chat template (#37040)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-14 09:36:11 -07:00
Santino Ramos
3ed46f374b
[Model Runner V2] Add Support for XD-RoPE (#36817)
Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>
2026-03-14 09:27:55 -07:00
seanmamasde
84868e4793
[Bugfix][Frontend] Fix audio transcription for MP4, M4A, and WebM formats (#35109)
Signed-off-by: seanmamasde <seanmamasde@gmail.com>
2026-03-14 08:44:03 -07:00
Isotr0py
a8e8d62dd8
[Misc] Clean up Kimi-audio whisper encoder loading (#36903)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-14 23:37:52 +08:00
Julien Denize
e42b49bd69
Mistral common v10 (#36971)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: root <root@h200-bar-196-227.slurm-bar-compute.tenant-slurm.svc.cluster.local>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-03-14 07:26:43 -07:00
Sergey Zinchenko
4a718e770d
[Bug] Fix Failure in /v1/chat/completions/render for Multimodal Requests (https://github.com/vllm-project/vllm/issues/35665) (#35684)
2026-03-14 14:10:11 +00:00
Kevin H. Luu
600a039f57
[CI] Shard Multi-Modal Models (Standard) into 4 parallel jobs (#37014)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 08:26:54 +00:00
Harry Mellor
ffa5d74f15
Enable loading of fused expert weights in the Transformers modelling backend (#36997)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-14 07:01:06 +00:00
Kevin H. Luu
74fe80ee95
[CI] Split Distributed Tests (4 GPUs) into 3 parallel jobs (#37015)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 12:21:13 +08:00
Flora Feng
bcfdadb1bc
[Refactor] Relocate chat completion and anthropic tests (#36919)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-14 12:16:16 +08:00
Yanan Cao
236de72e49
[CI] Pin helion version (#37012)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 23:25:29 -04:00
sbeurnier
a116f96930
[V1] Remove pin_memory() in async_copy_to_gpu to fix sporadic stalls (#37006)
Signed-off-by: Sebastien Beurnier <sbeurnier@together.ai>
2026-03-14 01:37:32 +00:00
Li, Jiang
092ace9e3a
[UX] Improve UX of CPU backend (#36968)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-14 09:27:29 +08:00
Andrew Xia
f680dc1b39
[responsesAPI] prioritize content over summary in reasoning item input (#36516)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <mitandrewxia@gmail.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2026-03-14 09:20:30 +08:00
Giulio Leone
b41aa264f9
fix: resolve chat template names before kwargs detection (#36937)
Co-authored-by: giulio-leone <giulio.leone@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-14 00:20:16 +00:00
Dimitrios Bariamis
367cf5cd3e
[Feat][Bugfix] Enable additional dimension for Flashinfer MLA and fix routing dtype (#36931)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
2026-03-13 16:41:16 -07:00
haosdent
6d53efd2a5
[Bugfix] Fix MLA attention crash with AWQ/GPTQ quantized models (#34695)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-03-13 23:25:41 +00:00
Benjamin Chislett
8b346309a5
[Refactor] Consolidate SupportsEagle (#36063)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-03-13 23:22:40 +00:00
Nick Hill
54a6db827f
[BugFix] Fix "DP Coordinator receives unexpected..." messages (#37008)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-03-13 23:18:05 +00:00
Matthew Bonanni
9efc4db965
[Bugfix] Fix DeepSeek-V3.2 tokenizer stripping spaces (#37004)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-13 22:55:36 +00:00
Kevin H. Luu
f1816fb192
[CI] Split V1 e2e + engine (1 GPU) into separate jobs (#36945)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 14:16:02 -07:00
Harry Mellor
0005d2a3c9
Use Transformers v5 WeightRenaming for Transformers modeling backend (#31545)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-13 20:49:08 +00:00
Ekagra Ranjan
d0b402974f
[Bugfix][Spec Decode] Avoid double call of Ngram CPU (#36952)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
2026-03-13 20:33:19 +00:00
Divakar Verma
6341d43043
[ROCm][Quantization] add quark w4a8 mxfp4_fp8 for LinearLayer (#35316)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-03-13 19:44:24 +00:00
Mark McLoughlin
7afe0faab1
[Frontend][Core] Re-add shutdown timeout - allowing in-flight requests to finish (#36666)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-03-13 12:10:06 -07:00
Harry Mellor
5a3f1eb62f
[Misc] Set default kv_buffer_device in a better way (#36862)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-13 19:07:33 +00:00
yugong333
b3ce711b93
Fp8 lora dense kernel (#35242)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
2026-03-13 19:05:08 +00:00
Isotr0py
abf61aaa8e
[Bugfix] Fix Qwen2.5-omni/Qwen3-omni mm_processor cache for audio_in_video request (#36800)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-13 18:16:05 +00:00
bigmoyan
4508532fbd
[Bugfix] fix paddleocr crash on some image shape (#36959)
Signed-off-by: wangzhengtao <wangzhengtao@msh.team>
Signed-off-by: bigmoyan <moyan_work@foxmail.com>
Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-13 13:46:55 +00:00
Itay Alroy
d5af196c18
[2/N] Elastic EP Milestone 2: Integrating NIXL-EP (#35627)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
2026-03-13 09:25:33 -04:00
Chaojun Zhang
82f836d976
[XPU] Support LoRA via torch.compile on XPU platform (#36962)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
2026-03-13 10:34:59 +00:00
Andreas Karatzas
4fccd30f19
[ROCm][CI] Upgrading orchestrator to handle python pipeline markers and options (#36181)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-13 02:04:22 -07:00
Or Ozeri
cfaf4668f7
[kv_offload+HMA][1/N]: Support multiple KV groups in OffloadingSpec (#36610)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-03-13 08:04:21 +00:00
Andreas Karatzas
99a57bdf74
[ROCm][CI] Corrected the GPT-OSS test root path (#36711)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-13 15:53:43 +08:00
Sage
a2268617cf
[Frontend] Delegate preprocessing to OpenAIServingRender (#36483)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
2026-03-13 00:39:43 -07:00
Rohan Potdar
a4ad9db541
Enable RoPE+KV cache fusion for ROCm AITER FA (non-shuffle layout) (#35786)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2026-03-13 07:33:22 +00:00
Nick Hill
b373b5102a
[Tests] Shutdown test RemoteVLLMServer cleanly (#36950)
Recent PR #33949 changed the teardown logic of the RemoteVLLMServer test utility class to
send SIGTERM to all vllm (sub)processes at once, which breaks the clean/coordinated
shutdown logic that assumes only the top-level process will receive a signal (for example
when running in a container that's shut down).
This caused a bunch of errors and stacktraces in some test logs, even though those tests
still pass. We should still attempt a normal shutdown and only kill other procs if they are
still running after a few seconds.
Example: tests/v1/distributed/test_external_lb_dp.py::test_external_lb_completion_streaming
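The shutdown order described above (signal only the top-level process, wait, then kill whatever is left) can be sketched in Python as follows. This is a minimal illustration, not the actual RemoteVLLMServer teardown code: `stop_server` and `_group_alive` are hypothetical helpers, the 5-second default stands in for "a few seconds", and the sketch assumes a POSIX host and a server launched with `subprocess.Popen(..., start_new_session=True)` so that it leads its own process group.

```python
import os
import signal
import subprocess
import time


def _group_alive(pgid: int) -> bool:
    """Return True if any process (including unreaped zombies) is still in the group."""
    try:
        os.killpg(pgid, 0)  # signal 0 only probes for existence, nothing is delivered
    except ProcessLookupError:
        return False
    return True


def stop_server(proc: subprocess.Popen, grace_s: float = 5.0) -> bool:
    """Hypothetical teardown for a server started with start_new_session=True.

    Returns True if the whole process group exited after SIGTERM was sent to
    the top-level process only, False if stragglers had to be SIGKILLed.
    The 5 s default grace period is an assumption for "a few seconds".
    """
    pgid = proc.pid  # with start_new_session=True the server is the group leader
    # Signal only the top-level process, as a container runtime would, so the
    # server can run its own coordinated shutdown of its subprocesses.
    proc.send_signal(signal.SIGTERM)
    deadline = time.monotonic() + grace_s
    while time.monotonic() < deadline:
        proc.poll()  # reap the top-level process once it has exited
        if proc.returncode is not None and not _group_alive(pgid):
            return True
        time.sleep(0.05)
    # Grace period is over: force-kill anything still in the process group.
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass
    proc.wait()
    return False
```

Sending SIGTERM to the group leader alone preserves the "only the top-level process receives a signal" assumption; the group-wide SIGKILL is reserved for processes that are still running once the grace period expires.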
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-03-13 07:32:55 +00:00
Thomas Parnell
f296a1966d
[Bugfix] Fix FlashInfer GDN warmup ValueError on SM90 GPUs (#36876)
2026-03-13 07:09:39 +01:00
Csrayz
bc2c0c86ef
[Frontend] Fix usage incorrectly returned with empty `stream_options` (#36379)
Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com>
2026-03-13 03:33:04 +00:00