Augusto Yao
|
9c7cab5ebb
|
[Feature]: Support for multiple embedding types in a single inference call (#35829)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
|
2026-03-17 17:05:42 +08:00 |
|
Chauncey
|
132bfd45b6
|
[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (#37258)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-17 08:54:52 +00:00 |
|
xiao-llm
|
24b4272a8c
|
Fix infinite recursive search issue in quark.py (#32779)
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
Signed-off-by: Xiao Yu <xiao.yu.dc@outlook.com>
Signed-off-by: kimheesu <wlskaka4@gmail.com>
Co-authored-by: Yanwen Lin <lyw1124278064@gmail.com>
Co-authored-by: Kim Hee Su <wlskaka4@gmail.com>
|
2026-03-17 07:19:15 +00:00 |
|
Benjamin Chislett
|
8a680463fa
|
[Bugfix] Fix NemotronH MTP + Chunked Prefill (#35447)
|
2026-03-17 07:07:33 +01:00 |
|
Nick Cao
|
20b14095a4
|
[Bugfix] Fix loading Music Flamingo (#35535)
Signed-off-by: Nick Cao <ncao@redhat.com>
|
2026-03-17 05:24:40 +00:00 |
|
PatchyTIS
|
17c1bdf371
|
[Bugfix] dtype mismatch in ngram gpu propose (#37246)
Signed-off-by: PatchouliTaisa <patchychen@tencent.com>
Co-authored-by: PatchouliTaisa <patchychen@tencent.com>
|
2026-03-17 05:19:55 +00:00 |
|
Flora Feng
|
3e3d320c1b
|
[Refactor] Relocate responses API tests (#37241)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-17 05:14:52 +00:00 |
|
Andreas Karatzas
|
54a62a79f7
|
[ROCm] Fix AttributeError for torch.compiler.skip_all_guards_unsafe on older PyTorch (#37219)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Tag: v0.17.2rc0
|
2026-03-17 11:34:49 +08:00 |
|
Flora Feng
|
384dc7f77b
|
[Refactor] Relocate completion and chat completion tests (#37125)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-17 11:31:23 +08:00 |
|
Flora Feng
|
f04d5226f8
|
[CI] Fix flaky tool_use chat completion tests with deterministic seed (#37027)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-17 03:24:34 +00:00 |
|
Kyuyeun Kim
|
0a0a1a198b
|
Add ability to replace oot ops when using lora (#37181)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
|
2026-03-16 18:04:15 -07:00 |
|
Vadim Gimpelson
|
6c1cfbad32
|
Support non-contiguous KV cache in TRTLLM fp8 dequant kernel (#36867)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: Pavani Majety <pavanimajety@gmail.com>
|
2026-03-16 17:48:42 -07:00 |
|
Harry Huang
|
45f526d652
|
[BugFix] Correct max memory usage for multiple KV-cache groups (#36030)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
|
2026-03-17 00:38:52 +00:00 |
|
Julien Denize
|
5db91f0aaf
|
Fix some Mistral parser issues (#37209)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
2026-03-17 00:08:56 +00:00 |
|
Walter Beller-Morales
|
061980c36a
|
[Feature][Frontend] add support for Cohere Embed v2 API (#37074)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-03-16 19:55:53 -04:00 |
|
Ben Browning
|
7a49742b88
|
[CI/Build] Add common tool call parser test suite (#27599)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
|
2026-03-16 19:46:20 -04:00 |
|
Terry Gao
|
3e6a1e1686
|
[Custom Ops] Add functional + out variant for scaled_fp4_quant (#34389)
Signed-off-by: tianrengao <terrygao87@gmail.com>
|
2026-03-16 18:51:46 -04:00 |
|
Julien Denize
|
7961486a9b
|
Fix EagleMistralLarge3Model initialization (#37232)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
2026-03-16 15:41:00 -07:00 |
|
Andreas Karatzas
|
4f9b14c21c
|
[CI] Stabilize multinode DP internal LB completion tests (#36356)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 15:40:23 -07:00 |
|
Yuchen Fama
|
31a458c091
|
[Doc] Clarify schema enforcement behavior for tool_choice modes (#37064)
Signed-off-by: yfama <yuchengu@gmail.com>
|
2026-03-16 22:27:42 +00:00 |
|
Wei Zhao
|
a3a51d20e7
|
[Benchmark] Improvements to attention benchmark script (#37115)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-03-16 22:22:40 +00:00 |
|
EdalatiAli
|
e5b807607c
|
[Quant][Feature] Support online MXFP8 quantization for MoE and dense models (#35448)
Signed-off-by: EdalatiAli <aliedalati@cohere.com>
|
2026-03-16 18:07:39 -04:00 |
|
Elvir Crnčević
|
fd4d96302a
|
Fix eplb nvfp4 experts hook (#37217)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Signed-off-by: Elvir Crncevic <elvir@anthropic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-16 22:03:54 +00:00 |
|
Krish Gupta
|
c0f011918d
|
[Bugfix] opcheck false mutation error in rms_norm_per_block_quant (#36688) (#36779)
Signed-off-by: Krish Gupta <krishom70@gmail.com>
|
2026-03-16 21:11:33 +00:00 |
|
Zhengxu Chen
|
e6ae4b1be1
|
[compile] Enable mega aot artifact for torch 2.12+. (#37198)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-16 21:05:51 +00:00 |
|
zhanqiuhu
|
2dccb38f73
|
[Bugfix][MultiConnector] Fix MultiConnector for SupportsHMA sub-connectors (#36549)
Tag: v0.18.0rc0
|
2026-03-16 20:51:04 +00:00 |
|
Kunshang Ji
|
d157216093
|
[BUGFIX][Mamba] Use uint64 for address in KVBlockZeroer (#37197)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-16 21:39:56 +01:00 |
|
Matthew Bonanni
|
93f3c8e531
|
[Misc] Add float16 to CacheDType (#37199)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-16 13:24:48 -07:00 |
|
rasmith
|
2cc26c3a99
|
[CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for test (#37213)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-03-16 13:22:57 -07:00 |
|
Flora Feng
|
dfa8852db2
|
[Refactor] Consolidate GPT-OSS reasoning parser tests (#36915)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Flora Feng <4florafeng@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-16 15:53:07 -04:00 |
|
Lucas Kabela
|
714c6e0eab
|
[torch.compile][BE] Modify cudagraph callable to check for is_forward_context_set (#36288)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-03-16 19:42:34 +00:00 |
|
Sage
|
0fefd00e6c
|
[Bugfix] Fix render server crash for quantized models on CPU-only hosts (#37215)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-16 18:59:01 +00:00 |
|
Nicolò Lucchesi
|
f5c081d432
|
[PD][Nixl] Add support for hybrid SSM-FA models (#36687)
|
2026-03-16 19:58:06 +01:00 |
|
Matthew Bonanni
|
c88ea8338b
|
[MTP][Sparse MLA] Take advantage of native MTP support in indexer when possible (#36982)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-16 13:51:21 -04:00 |
|
Max de Bayser
|
9f9ecff4cd
|
Add simple granite4 tool parser (#36827)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2026-03-16 10:49:09 -07:00 |
|
haosdent
|
ca1954d58c
|
[Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-16 19:03:10 +02:00 |
|
Raushan Turganbay
|
55e6d3d5c0
|
[Bugfix] Make siglip/clip compatible with transformers v5 (#37200)
Signed-off-by: raushan <raushan@huggingface.co>
|
2026-03-16 16:48:18 +00:00 |
|
Chauncey
|
6682c231fa
|
[Bugfix] Add error handling for FINISHED_ERROR in OpenAIServing (#37148)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-16 16:27:47 +00:00 |
|
Itay Etelis
|
5ae685c1c8
|
[Bugfix] Relax TRTLLM KV cache contiguity assertion for cross-layer layout (#34158)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
|
2026-03-16 11:20:51 -04:00 |
|
Wentao Ye
|
ce8cf9161d
|
[Compile] Fix compile warning st256_cs in cuda_vec_utils.cuh (#36693)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-16 11:12:15 -04:00 |
|
xjx
|
18be11fd59
|
[BUGFIX]fix CUDA OOM ERROR : invalid argument at cumem_allocator.cpp:119 (#35594)
Signed-off-by: xjx <493337577@qq.com>
|
2026-03-16 15:10:42 +00:00 |
|
Yuanheng Zhao
|
8d8855fdae
|
[Bugfix] Add safety check and fallback for null scaling factor (#36106)
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-16 14:27:29 +00:00 |
|
Wentao Ye
|
e855d380fa
|
[Compile] Fix compile warning in moe_permute (#36529)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-16 10:16:14 -04:00 |
|
Benjamin Bartels
|
0e5a9382af
|
[Bugfix] accept redacted thinking blocks in Anthropic messages (#36992)
Signed-off-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local>
Signed-off-by: bbartels <benjamin@bartels.dev>
Co-authored-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local>
|
2026-03-16 22:01:57 +08:00 |
|
Fynn Schmitt-Ulms
|
04bf5a35fa
|
[Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013)
|
2026-03-16 14:53:45 +01:00 |
|
Tianyu Guo
|
43a73f853b
|
Remove unused EVS functions in qwen3_vl.py (#37183)
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
|
2026-03-16 13:09:09 +00:00 |
|
Julien Denize
|
ffbc2e5bdb
|
Patch Mistral config (#37104)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
2026-03-16 12:22:18 +00:00 |
|
Lukas Geiger
|
f9e6db3034
|
[Models][Qwen3 ViT] Keep max_seqlen on CPU to prevent D2H sync (#37139)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-16 12:11:59 +00:00 |
|
elvischenv
|
d61d2b08e9
|
[Build] Fix API rate limit exceeded when using VLLM_USE_PRECOMPILED=1 (#36229)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-16 12:09:27 +00:00 |
|
Artem Perevedentsev
|
f5e59ee7a6
|
[Performance] Add prefetch for checkpoints to OS page cache (#36012)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
|
2026-03-16 11:32:02 +00:00 |
|