Chauncey
|
9fe404ed04
|
[Frontend] OpenAI Responses API supports Tool/Function calling with streaming (#29947)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-12 15:03:50 +08:00 |
|
Sage
|
802f306cd1
|
[Tests] Skip model weight download for render-only test server (#36813)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-12 06:24:42 +00:00 |
|
Yan Ma
|
894843eb25
|
replace torch.cuda.device with torch.accelerator.device_index (#36144)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2026-03-11 23:12:57 -07:00 |
|
Yanan Cao
|
584a3f56de
|
[Kernel][Helion][13/N] Force static_shapes=False in helion register (#36677)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-12 05:35:29 +00:00 |
|
Nick Hill
|
36735fd772
|
[BugFix] Fix multiple/duplicate stdout prefixes (#36822)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-12 12:23:21 +08:00 |
|
wang.yuqi
|
6ecabe4936
|
[CI Failure] Fix Language Models Test (Extended Pooling) daily CI Failure (#36761)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-12 12:22:05 +08:00 |
|
Woosuk Kwon
|
2f8b4ce0c0
|
[Model Runner V2] Do not initialize sampler for non-last PP ranks (#36824)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-12 03:55:28 +00:00 |
|
Yuwei An
|
2ef69456f5
|
[LMCache] Fault Tolerance Mechanism (#36586)
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
|
2026-03-12 03:54:39 +00:00 |
|
Louie Tsai
|
17852aa503
|
more models for vLLM Benchmark Suite (#35086)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
|
2026-03-12 11:36:51 +08:00 |
|
Flora Feng
|
8647c6cf51
|
[Bugfix] Fix minimax_m2 tool parser when stream interval > 1 (#35895)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-12 10:25:14 +08:00 |
|
Kunshang Ji
|
513949f95f
|
[XPU][Doc] Remove manual OneAPI install step, now handled by torch-xpu (#36831)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
|
2026-03-12 01:46:02 +00:00 |
|
Nick Hill
|
262b76a09f
|
[Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-12 01:20:34 +00:00 |
|
Wentao Ye
|
c34ba6b961
|
[Perf] Optimize compute maxsim using batched version, 3.2% E2E throughput improvement (#36710)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-12 08:37:01 +08:00 |
|
Matthias Gehre
|
24062b704f
|
[ROCm][CI/Build] Add gfx1152/gfx1153 (Krackan) to HIP supported architectures (#36499)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
|
2026-03-11 23:14:40 +00:00 |
|
Aaron Hao
|
d6b61e5166
|
[BUG] Fix async rlhf tests (#35811)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-03-11 18:06:10 -04:00 |
|
Yanan Cao
|
cf632499ee
|
[Kernel] [Helion] [15/N] Split config files into per-platform files (#36698)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-11 17:25:29 -04:00 |
|
Yanan Cao
|
a3774a8198
|
[Kernel] [Helion] [12/N] Use FakeTensorMode to avoid GPU allocation during config key computation (#36563)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-11 17:25:16 -04:00 |
|
Yanan Cao
|
0ce21c46a0
|
[Kernel] [Helion] [14/N] Set autotune_ignore_errors=True during autotuning (#36683)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-11 17:25:04 -04:00 |
|
Woosuk Kwon
|
55eed6b7a5
|
[Model Runner V2] Add WhisperModelState [6/N] (#35790)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-11 14:20:38 -07:00 |
|
Giancarlo Delfin
|
c77181e534
|
[Model Runner V2] Add probabilistic rejection sampling for spec decoding (#35461)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-11 14:04:32 -07:00 |
|
maobaolong
|
12001f2ebc
|
[LMCache] Pass TP size in lookup for MLA multi-reader locking (#36129)
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
|
2026-03-11 20:45:20 +00:00 |
|
Or Ozeri
|
7ee5d5093b
|
[BugFix][kv_offload] Fix offloading decodes with async scheduling (#33881)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-11 20:43:40 +00:00 |
|
jennyyyyzhen
|
428bc718bd
|
[Bugfix][ROCm] Strip block_size before attention backend validation (#36274)
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-03-11 13:37:31 -07:00 |
|
汪志鹏
|
ff1e3d9c63
|
[BugFix]: add bagel to MM_PREFIX_LM_MODELS (#36316)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
|
2026-03-11 19:55:59 +00:00 |
|
Wentao Ye
|
35bdca5431
|
[Refactor] Remove dead code in KV connector (#36424)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-11 19:40:17 +00:00 |
|
Amanzhol Salykov
|
8a24842765
|
[ROCm] add tuned moe_wna16_triton kernel configs for CDNA4 (#35093)
Signed-off-by: salykova <amsalykov@gmail.com>
Signed-off-by: amd-asalykov <asalykov@amd.com>
|
2026-03-11 19:00:08 +00:00 |
|
Harry Mellor
|
65986db6ba
|
Make Gemma and Gemma 2 accept inputs_embeds like Gemma 3 (#36787)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-11 18:12:43 +00:00 |
|
Luka Govedič
|
9556af87d5
|
[torch.compile] Add support for non-contiguous fused RMSNorm + group quant (#36551)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
|
2026-03-11 10:56:55 -07:00 |
|
Or Ozeri
|
a1a3523a56
|
[KVConnector] Support worker -> scheduler metadata (#31964)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-11 17:36:37 +00:00 |
|
tianshu-Michael-yu
|
741f4e046b
|
fix: align lfm2 thumbnail token counting with HF (#36707)
|
2026-03-11 10:28:38 -07:00 |
|
Julien Denize
|
a5d06dc557
|
Add 320 dimension size support to MLA (#36161)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2026-03-11 10:21:22 -07:00 |
|
Harry Mellor
|
5efa206a8c
|
Fix ExaoneMoeMTP test that never ran in Transformers v4 (#36792)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-11 17:10:23 +00:00 |
|
Cyrus Leung
|
196802dfa6
|
[Misc] Clean up renderers (#36770)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-11 16:39:29 +00:00 |
|
Isotr0py
|
c84b519cf3
|
[Bugfix] Fix negative max_tokens when input prompt is too long (#36789)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-11 16:30:51 +00:00 |
|
Flora Feng
|
741ecf0630
|
[CI] Add bfcl tool call correctness eval (#36560)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-03-11 12:27:36 -04:00 |
|
Robert Shaw
|
b7e5a588d8
|
[Bugfix] Fix DP/EP Shared Expert With Monolithic Kernels (#36061)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-11 16:07:14 +00:00 |
|
Richard Zou
|
822e250ab7
|
[torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation (#36093)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-11 16:07:09 +00:00 |
|
Hongxin Xu
|
bea02cdf93
|
Fix routed experts capture for hybrid models (Mamba + Attention) (#35744)
Signed-off-by: arlenxu <arlenxu@tencent.com>
Signed-off-by: xhx1022 <1737006628@qq.com>
Co-authored-by: arlenxu <arlenxu@tencent.com>
|
2026-03-11 08:53:10 -07:00 |
|
Julien Denize
|
a3ea760ea5
|
Add 'none' reasoning effort to ChatCompletionRequest (#36238)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2026-03-11 15:45:34 +00:00 |
|
Harry Mellor
|
35db669f1d
|
Correct link to supported hardware on vllm.ai (#36798)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-11 08:43:28 -07:00 |
|
Julien Denize
|
afebeffbfb
|
Add support to Mistral large 3 eagle with dense layers (#36163)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-11 15:42:56 +00:00 |
|
Jhao-Ting Chen
|
5573894737
|
Kimi k2.5 MLA based eagle3 (#36361)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Izzy Putterman <iputterman@nvidia.com>
|
2026-03-11 11:36:11 -04:00 |
|
Harry Mellor
|
d5816c8c2f
|
Fix tied weights in weight mapping test for Transformers v5 (#36788)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-11 15:10:26 +00:00 |
|
Woosuk Kwon
|
8ccbcda5c0
|
[Model Runner V2] Remove unused warmup_for_prefill method (#36762)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-11 08:02:44 -07:00 |
|
tvirolai-amd
|
a9e532afe2
|
[ROCm][Perf] Allow MTP lens > 1 in Sparse MLA (#36681)
Signed-off-by: Teemu Virolainen <teemu.virolainen@amd.com>
|
2026-03-11 14:43:03 +00:00 |
|
Harry Mellor
|
f3163bba67
|
Disable docs build skipping until a better solution is found (#36790)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-11 13:53:23 +00:00 |
|
Martin Hickey
|
700a1ddc65
|
[Misc] Use envs module to get VLLM_DISABLED_KERNELS (#35776)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
|
2026-03-11 13:37:46 +00:00 |
|
Silvia Colabrese
|
f33251ffc8
|
[Bugfix] Fix Mistral-small --format (#36782)
Signed-off-by: 12010486 <silvia.colabrese@intel.com>
|
2026-03-11 04:47:52 -07:00 |
|
Wuxun Zhang
|
e584dce52b
|
Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230)
Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>
|
2026-03-11 19:19:15 +08:00 |
|
Ning Xie
|
40c0461f24
|
[openapi] refactor render related openapi [3/N] (#36749)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-03-11 03:14:34 -07:00 |
|