14985 Commits

Author SHA1 Message Date
Flora Feng
dfa8852db2 [Refactor] Consolidate GPT-OSS reasoning parser tests (#36915)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Flora Feng <4florafeng@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-16 15:53:07 -04:00
Lucas Kabela
714c6e0eab [torch.compile][BE] Modify cudagraph callable to check for is_forward_context_set (#36288)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-03-16 19:42:34 +00:00
Sage
0fefd00e6c [Bugfix] Fix render server crash for quantized models on CPU-only hosts (#37215)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
2026-03-16 18:59:01 +00:00
Nicolò Lucchesi
f5c081d432 [PD][Nixl] Add support for hybrid SSM-FA models (#36687)
2026-03-16 19:58:06 +01:00
Matthew Bonanni
c88ea8338b [MTP][Sparse MLA] Take advantage of native MTP support in indexer when possible (#36982)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-16 13:51:21 -04:00
Max de Bayser
9f9ecff4cd Add simple granite4 tool parser (#36827)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2026-03-16 10:49:09 -07:00
haosdent
ca1954d58c [Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
2026-03-16 19:03:10 +02:00
Raushan Turganbay
55e6d3d5c0 [Bugfix] Make siglip/clip compatible with transformers v5 (#37200)
Signed-off-by: raushan <raushan@huggingface.co>
2026-03-16 16:48:18 +00:00
Chauncey
6682c231fa [Bugfix] Add error handling for FINISHED_ERROR in OpenAIServing (#37148)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-03-16 16:27:47 +00:00
Itay Etelis
5ae685c1c8 [Bugfix] Relax TRTLLM KV cache contiguity assertion for cross-layer layout (#34158)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
2026-03-16 11:20:51 -04:00
Wentao Ye
ce8cf9161d [Compile] Fix compile warning st256_cs in cuda_vec_utils.cuh (#36693)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-16 11:12:15 -04:00
xjx
18be11fd59 [BUGFIX]fix CUDA OOM ERROR : invalid argument at cumem_allocator.cpp:119 (#35594)
Signed-off-by: xjx <493337577@qq.com>
2026-03-16 15:10:42 +00:00
Yuanheng Zhao
8d8855fdae [Bugfix] Add safety check and fallback for null scaling factor (#36106)
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-16 14:27:29 +00:00
Wentao Ye
e855d380fa [Compile] Fix compile warning in moe_permute (#36529)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-16 10:16:14 -04:00
Benjamin Bartels
0e5a9382af [Bugfix] accept redacted thinking blocks in Anthropic messages (#36992)
Signed-off-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local>
Signed-off-by: bbartels <benjamin@bartels.dev>
Co-authored-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local>
2026-03-16 22:01:57 +08:00
Fynn Schmitt-Ulms
04bf5a35fa [Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013)
2026-03-16 14:53:45 +01:00
Tianyu Guo
43a73f853b Remove unused EVS functions in qwen3_vl.py (#37183)
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
2026-03-16 13:09:09 +00:00
Julien Denize
ffbc2e5bdb Patch Mistral config (#37104)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
2026-03-16 12:22:18 +00:00
Lukas Geiger
f9e6db3034 [Models][Qwen3 ViT] Keep max_seqlen on CPU to prevent D2H sync (#37139)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-16 12:11:59 +00:00
elvischenv
d61d2b08e9 [Build] Fix API rate limit exceeded when using VLLM_USE_PRECOMPILED=1 (#36229)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-16 12:09:27 +00:00
Artem Perevedentsev
f5e59ee7a6 [Performance] Add prefetch for checkpoints to OS page cache (#36012)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
2026-03-16 11:32:02 +00:00
Harry Mellor
9b005edc48 [Docs] Make the link to hardware plugins clearer (#37174)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-16 04:12:58 -07:00
Robin Nabel
bf9a185395 GLM4 tool parser: fix streaming mode (#35208)
Signed-off-by: Robin Nabel <opensource@nabel.co>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-03-16 18:48:52 +08:00
Harry Mellor
ad041c79db Fix text only inputs for MRoPE models with the Transformers modelling backend (#37055)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-16 10:31:16 +00:00
Kunshang Ji
747b068136 [Hardware] Replace memory related torch.cuda APIs (#37031)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
2026-03-16 10:24:48 +00:00
Harry Mellor
122f75d939 Fix pipeline parallel with multimodal models with the Transformers modelling backend (#37057)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-16 10:20:37 +00:00
SoluMilken
d8f8a7aad2 [Misc] Sync pre-commit to 4.5.1 in workflows and docs (#36675)
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-16 10:03:21 +00:00
Roy Wang
0115e957d4 [Frontend][Misc] Remove unused log in /is_sleeping (#37093)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
2026-03-16 17:46:28 +08:00
haosdent
116ed130f4 [Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-03-16 10:30:23 +01:00
Vadim Gimpelson
8374387bd8 [FlashInfer] Revert block_size 16 + head_size 256 workaround on Blackwell (#36987)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-03-16 09:04:29 +00:00
Isotr0py
912fbe9555 [Bugfix] Fix Qwen2.5-Omni/Qwen3-Omni use_audio_in_video with multi-video inputs (#37147)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-16 08:56:06 +00:00
Laith Sakka
52131f88d9 use skip_all_guards_unsafe to drop global_state and torch_function_mode_stack guards instead of previous hacks (#36204)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2026-03-16 08:52:31 +00:00
Roy Wang
821eb80c0d [Performance][Model Loader] Skip non-local expert weights during EP model loading (#37136)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
2026-03-16 01:33:36 -07:00
Andreas Karatzas
a2956a0f8e [ROCm][CI] Retrying in case of batch variance effects and reducing flakiness (#36442)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-16 16:08:51 +08:00
Andreas Karatzas
911355e216 [ROCm] Fix KV copy methods and auto-select attention backend for ROCm (#36845)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-16 16:07:27 +08:00
Chauncey
8d3f8f485e [Bugfix] fix Qwen3.5 tool calling bug (#36774)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-03-16 15:38:42 +08:00
Woosuk Kwon
96efb91480 [Model Runner V2] Fix processed logits in sample() (#37144)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-03-16 00:35:49 -07:00
leo-cf-tian
2754231ba3 [Kernel] Add FlashInfer MoE A2A Kernel (#36022)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Leo Tian <lctian@nvidia.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com>
2026-03-15 23:45:32 -07:00
bigshanedogg
2390d44209 [Model] Add HyperCLOVAX-SEED-Think-14B language model support (#37107)
Signed-off-by: bigshanedogg <bigshane319@gmail.com>
2026-03-16 06:40:05 +00:00
Li, Jiang
7362b4450a [Bugfix] Avoid LD_PRELOAD check on MacOS (#37145)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-03-15 23:31:44 -07:00
Andreas Karatzas
57a314d155 [CI][Bugfix] Fix 500 errors from priority overflow and TemplateError subclasses in schema fuzz tests (#37127)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-16 05:27:21 +00:00
Andreas Karatzas
d4c57863f7 [ROCm][CI] Fix engine teardown and text normalization to stabilize voxtral test (#37138)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-16 04:49:31 +00:00
Wang, Yiting
68e1b711f1 [XPU] Add deepseek_scaling_rope fused kernel (#36612)
Signed-off-by: yitingw1 <yiting.wang@intel.com>
2026-03-16 12:35:08 +08:00
rasmith
0024f39a32 [ROCm][P/D][MORI][BugFix] Add transfer_id for moriio_connector so moriio_connector to restore P/D functionality (#34907)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-03-16 10:36:51 +08:00
Andrew Xia
e9163b536e [responsesAPI][ez] add a unit test for SimpleContext logprobs (#37126)
Signed-off-by: Andrew Xia <axia@meta.com>
2026-03-15 17:12:26 -07:00
Lalithnarayan C
7acaea634c In-Tree AMD Zen CPU Backend via zentorch [1/N] (#35970)
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Chinmay-Kulkarni-AMD <Chinmay.Kulkarni@amd.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:35:35 +00:00
Jiangyun Zhu
697e4ff352 [GDN] add a config for gdn kernel selection (#36647)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-03-16 00:40:17 +08:00
Hari
a3e2e250f0 [Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614)
Signed-off-by: hasethuraman <hsethuraman@microsoft.com>
2026-03-15 19:38:21 +08:00
Isotr0py
143e4dccdf [Misc] Add online audio_in_video test (#36775)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-15 00:14:11 -07:00
Isotr0py
6590a3ecda [Frontend] Remove torchcodec from audio dependency (#37061)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-15 05:15:59 +00:00