Benjamin Bartels
|
0e5a9382af
|
[Bugfix] accept redacted thinking blocks in Anthropic messages (#36992)
Signed-off-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local>
Signed-off-by: bbartels <benjamin@bartels.dev>
Co-authored-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local>
|
2026-03-16 22:01:57 +08:00 |
|
Fynn Schmitt-Ulms
|
04bf5a35fa
|
[Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013)
|
2026-03-16 14:53:45 +01:00 |
|
Tianyu Guo
|
43a73f853b
|
Remove unused EVS functions in qwen3_vl.py (#37183)
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
|
2026-03-16 13:09:09 +00:00 |
|
Julien Denize
|
ffbc2e5bdb
|
Patch Mistral config (#37104)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
2026-03-16 12:22:18 +00:00 |
|
Lukas Geiger
|
f9e6db3034
|
[Models][Qwen3 ViT] Keep max_seqlen on CPU to prevent D2H sync (#37139)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-16 12:11:59 +00:00 |
|
elvischenv
|
d61d2b08e9
|
[Build] Fix API rate limit exceeded when using VLLM_USE_PRECOMPILED=1 (#36229)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-16 12:09:27 +00:00 |
|
Artem Perevedentsev
|
f5e59ee7a6
|
[Performance] Add prefetch for checkpoints to OS page cache (#36012)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
|
2026-03-16 11:32:02 +00:00 |
|
Harry Mellor
|
9b005edc48
|
[Docs] Make the link to hardware plugins clearer (#37174)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-16 04:12:58 -07:00 |
|
Robin Nabel
|
bf9a185395
|
GLM4 tool parser: fix streaming mode (#35208)
Signed-off-by: Robin Nabel <opensource@nabel.co>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-03-16 18:48:52 +08:00 |
|
Harry Mellor
|
ad041c79db
|
Fix text only inputs for MRoPE models with the Transformers modelling backend (#37055)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-16 10:31:16 +00:00 |
|
Kunshang Ji
|
747b068136
|
[Hardware] Replace memory related torch.cuda APIs (#37031)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
|
2026-03-16 10:24:48 +00:00 |
|
Harry Mellor
|
122f75d939
|
Fix pipeline parallel with multimodal models with the Transformers modelling backend (#37057)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-16 10:20:37 +00:00 |
|
SoluMilken
|
d8f8a7aad2
|
[Misc] Sync pre-commit to 4.5.1 in workflows and docs (#36675)
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-16 10:03:21 +00:00 |
|
Roy Wang
|
0115e957d4
|
[Frontend][Misc] Remove unused log in /is_sleeping (#37093)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2026-03-16 17:46:28 +08:00 |
|
haosdent
|
116ed130f4
|
[Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-16 10:30:23 +01:00 |
|
Vadim Gimpelson
|
8374387bd8
|
[FlashInfer] Revert block_size 16 + head_size 256 workaround on Blackwell (#36987)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-16 09:04:29 +00:00 |
|
Isotr0py
|
912fbe9555
|
[Bugfix] Fix Qwen2.5-Omni/Qwen3-Omni use_audio_in_video with multi-video inputs (#37147)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-16 08:56:06 +00:00 |
|
Laith Sakka
|
52131f88d9
|
use skip_all_guards_unsafe to drop global_state and torch_function_mode_stack guards instead of previous hacks (#36204)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2026-03-16 08:52:31 +00:00 |
|
Roy Wang
|
821eb80c0d
|
[Performance][Model Loader] Skip non-local expert weights during EP model loading (#37136)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2026-03-16 01:33:36 -07:00 |
|
Andreas Karatzas
|
a2956a0f8e
|
[ROCm][CI] Retrying in case of batch variance effects and reducing flakiness (#36442)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 16:08:51 +08:00 |
|
Andreas Karatzas
|
911355e216
|
[ROCm] Fix KV copy methods and auto-select attention backend for ROCm (#36845)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 16:07:27 +08:00 |
|
Chauncey
|
8d3f8f485e
|
[Bugfix] fix Qwen3.5 tool calling bug (#36774)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-16 15:38:42 +08:00 |
|
Woosuk Kwon
|
96efb91480
|
[Model Runner V2] Fix processed logits in sample() (#37144)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-16 00:35:49 -07:00 |
|
leo-cf-tian
|
2754231ba3
|
[Kernel] Add FlashInfer MoE A2A Kernel (#36022)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Leo Tian <lctian@nvidia.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com>
|
2026-03-15 23:45:32 -07:00 |
|
bigshanedogg
|
2390d44209
|
[Model] Add HyperCLOVAX-SEED-Think-14B language model support (#37107)
Signed-off-by: bigshanedogg <bigshane319@gmail.com>
|
2026-03-16 06:40:05 +00:00 |
|
Li, Jiang
|
7362b4450a
|
[Bugfix] Avoid LD_PRELOAD check on MacOS (#37145)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-15 23:31:44 -07:00 |
|
Andreas Karatzas
|
57a314d155
|
[CI][Bugfix] Fix 500 errors from priority overflow and TemplateError subclasses in schema fuzz tests (#37127)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 05:27:21 +00:00 |
|
Andreas Karatzas
|
d4c57863f7
|
[ROCm][CI] Fix engine teardown and text normalization to stabilize voxtral test (#37138)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 04:49:31 +00:00 |
|
Wang, Yiting
|
68e1b711f1
|
[XPU] Add deepseek_scaling_rope fused kernel (#36612)
Signed-off-by: yitingw1 <yiting.wang@intel.com>
|
2026-03-16 12:35:08 +08:00 |
|
rasmith
|
0024f39a32
|
[ROCm][P/D][MORI][BugFix] Add transfer_id for moriio_connector so moriio_connector to restore P/D functionality (#34907)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-03-16 10:36:51 +08:00 |
|
Andrew Xia
|
e9163b536e
|
[responsesAPI][ez] add a unit test for SimpleContext logprobs (#37126)
Signed-off-by: Andrew Xia <axia@meta.com>
|
2026-03-15 17:12:26 -07:00 |
|
Lalithnarayan C
|
7acaea634c
|
In-Tree AMD Zen CPU Backend via zentorch [1/N] (#35970)
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Chinmay-Kulkarni-AMD <Chinmay.Kulkarni@amd.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-03-15 23:35:35 +00:00 |
|
Jiangyun Zhu
|
697e4ff352
|
[GDN] add a config for gdn kernel selection (#36647)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-03-16 00:40:17 +08:00 |
|
Hari
|
a3e2e250f0
|
[Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614)
Signed-off-by: hasethuraman <hsethuraman@microsoft.com>
|
2026-03-15 19:38:21 +08:00 |
|
Isotr0py
|
143e4dccdf
|
[Misc] Add online audio_in_video test (#36775)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-15 00:14:11 -07:00 |
|
Isotr0py
|
6590a3ecda
|
[Frontend] Remove torchcodec from audio dependency (#37061)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-15 05:15:59 +00:00 |
|
Russell Bryant
|
b3debb7e77
|
[Build] Upgrade xgrammar to get a security fix (#36168)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-15 03:13:48 +00:00 |
|
Nick Hill
|
458c1a4b2d
|
[Frontend] Reduce chat template warmup logging levels (#37062)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-14 13:48:59 -07:00 |
|
Karan Bansal
|
821fde2df4
|
[Bugfix] Fix xgrammar dtype mismatch on macOS CPU inference (#32384)
Signed-off-by: Karan Bansal <karanb192@gmail.com>
Co-authored-by: Inokinoki <inoki@inoki.cc>
|
2026-03-14 17:29:06 +00:00 |
|
arlo
|
8c29042bb9
|
[Feature] Add InstantTensor weight loader (#36139)
|
2026-03-14 18:05:23 +01:00 |
|
Cyrus Leung
|
5467d137b3
|
[Frontend] Avoid startup error log for models without chat template (#37040)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-14 09:36:11 -07:00 |
|
Santino Ramos
|
3ed46f374b
|
[Model Runner V2] Add Support for XD-RoPE (#36817)
Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>
|
2026-03-14 09:27:55 -07:00 |
|
seanmamasde
|
84868e4793
|
[Bugfix][Frontend] Fix audio transcription for MP4, M4A, and WebM formats (#35109)
Signed-off-by: seanmamasde <seanmamasde@gmail.com>
|
2026-03-14 08:44:03 -07:00 |
|
Isotr0py
|
a8e8d62dd8
|
[Misc] Clean up Kimi-audio whisper encoder loading (#36903)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-14 23:37:52 +08:00 |
|
Julien Denize
|
e42b49bd69
|
Mistral common v10 (#36971)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: root <root@h200-bar-196-227.slurm-bar-compute.tenant-slurm.svc.cluster.local>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-14 07:26:43 -07:00 |
|
Sergey Zinchenko
|
4a718e770d
|
[Bug] Fix Failure in /v1/chat/completions/render for Multimodal Requests (https://github.com/vllm-project/vllm/issues/35665) (#35684)
|
2026-03-14 14:10:11 +00:00 |
|
Kevin H. Luu
|
600a039f57
|
[CI] Shard Multi-Modal Models (Standard) into 4 parallel jobs (#37014)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-14 08:26:54 +00:00 |
|
Harry Mellor
|
ffa5d74f15
|
Enable loading of fused expert weights in the Transformers modelling backend (#36997)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-14 07:01:06 +00:00 |
|
Kevin H. Luu
|
74fe80ee95
|
[CI] Split Distributed Tests (4 GPUs) into 3 parallel jobs (#37015)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-14 12:21:13 +08:00 |
|
Flora Feng
|
bcfdadb1bc
|
[Refactor] Relocate chat completion and anthropic tests (#36919)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-14 12:16:16 +08:00 |
|