Jee Jee Li
dd20ee4e3e
[UX] Enable torch_profiler_with_stack (#37571)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-03-20 11:17:26 +00:00
wang.yuqi
ed359c497a
[Model] Deprecate the score task (this will not affect users). (#37537)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-20 08:07:56 +00:00
Wangbei25
0674d1fee7
[PluggableLayer][MM] Add PluggableLayer for CustomQwen2Decoder (#37293)
...
Signed-off-by: Wangbei25 <wangbei41@huawie.com>
Signed-off-by: Wangbei25 <wangbei41@huawei.com>
Co-authored-by: Wangbei25 <wangbei41@huawie.com>
2026-03-20 06:24:07 +00:00
bnellnm
91be5f9be3
[MoE Refactor] Rename "naive" all2all backend (#36294)
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-03-19 15:50:34 -04:00
Lucas Kabela
7769b58307
[torch.compile][BE][Multimodal] Remove requirement to set_model_tag to avoid cache conflict (#37345)
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-03-19 17:26:12 +00:00
Ifta khairul Alam Adil
104605cbf2
Remove deprecated reasoning_content message field(part-2) (#37480)
...
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Philip Ottesen <phiott256@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Andy Lo <andy@mistral.ai>
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>
Signed-off-by: sihao.li <sihao.li@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Philip Ottesen <phiott256@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com>
Co-authored-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com>
Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-19 15:20:08 +00:00
wang.yuqi
f9e2a38386
[Docs] Reorganize pooling docs. (#35592)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-19 11:25:47 +00:00
Shwetha Poojary
cef1f302d2
[Model] Enable LoRA support for tower and connector in H2OVL (#31696)
...
Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com>
2026-03-18 13:26:47 +00:00
Aaron Hao
47a1f11bff
[docs] Add docs for new RL flows (#36188)
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-18 09:04:26 +00:00
Athrael Soju
c0745a851a
[Model] Add ColQwen3.5 4.5B support (#36887)
...
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-17 21:17:02 +00:00
Wei Zhao
b36adfa349
[Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache (#37252)
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
2026-03-17 20:09:20 +00:00
Bhoomit
3717a4dd47
[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984)
...
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-17 14:36:41 +00:00
Walter Beller-Morales
061980c36a
[Feature][Frontend] add support for Cohere Embed v2 API (#37074)
...
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
2026-03-16 19:55:53 -04:00
Yuchen Fama
31a458c091
[Doc] Clarify schema enforcement behavior for tool_choice modes (#37064)
...
Signed-off-by: yfama <yuchengu@gmail.com>
2026-03-16 22:27:42 +00:00
Matthew Bonanni
93f3c8e531
[Misc] Add float16 to CacheDType (#37199)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-16 13:24:48 -07:00
Lucas Kabela
714c6e0eab
[torch.compile][BE] Modify cudagraph callable to check for is_forward_context_set (#36288)
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-03-16 19:42:34 +00:00
Max de Bayser
9f9ecff4cd
Add simple granite4 tool parser (#36827)
...
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2026-03-16 10:49:09 -07:00
Harry Mellor
9b005edc48
[Docs] Make the link to hardware plugins clearer (#37174)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-16 04:12:58 -07:00
SoluMilken
d8f8a7aad2
[Misc] Sync pre-commit to 4.5.1 in workflows and docs (#36675)
...
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-16 10:03:21 +00:00
leo-cf-tian
2754231ba3
[Kernel] Add FlashInfer MoE A2A Kernel (#36022)
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Leo Tian <lctian@nvidia.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com>
2026-03-15 23:45:32 -07:00
bigshanedogg
2390d44209
[Model] Add HyperCLOVAX-SEED-Think-14B language model support (#37107)
...
Signed-off-by: bigshanedogg <bigshane319@gmail.com>
2026-03-16 06:40:05 +00:00
Hari
a3e2e250f0
[Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614)
...
Signed-off-by: hasethuraman <hsethuraman@microsoft.com>
2026-03-15 19:38:21 +08:00
arlo
8c29042bb9
[Feature] Add InstantTensor weight loader (#36139)
2026-03-14 18:05:23 +01:00
Li, Jiang
092ace9e3a
[UX] Improve UX of CPU backend (#36968)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-14 09:27:29 +08:00
whyiug
1ce13cf992
[Model] Add support for BERT-like Chinese ERNIE pooling models (#36385)
...
Signed-off-by: whyiug <whyiug@hotmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-13 03:23:53 +00:00
Nikita
10f08dedfa
[Model] Add ColPali late interaction model for multi-modal retrieval (#36818)
...
Signed-off-by: Nikita Sukharev <kaonael@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-03-13 02:18:57 +00:00
Xinan Miao
2cdf92228c
[Feature]: Remove Chunking From FusedMoE (#34086)
...
Signed-off-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: Southwest <1403572259@qq.com>
Signed-off-by: southwest <am1ao@qq.com>
Signed-off-by: Xinan Miao <1403572259@qq.com>
Co-authored-by: SouthWest7 <am1ao@qq.com>
2026-03-12 14:24:38 -04:00
Harry Mellor
e39257a552
Add AGENTS.md (#36877)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-12 10:20:50 -07:00
grimulkan
a1257fd1ea
[Kernel] Add FP8 KV cache support to Triton MLA decode attention (#34597)
...
Signed-off-by: grimulkan <grimulkan@gmail.com>
2026-03-12 08:32:34 -07:00
Kunshang Ji
53ec16a705
[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
...
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-12 07:57:47 -07:00
Mark McLoughlin
5282c7d4d0
[docs] Add lightweight AI assisted contribution policy (#30947)
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2026-03-12 11:46:13 +00:00
sfeiqiang
8cb24d3aed
[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328)
...
Signed-off-by: phaedonsun <phaedonsun@tencent.com>
Co-authored-by: phaedonsun <phaedonsun@tencent.com>
2026-03-12 00:46:20 -07:00
Louie Tsai
17852aa503
more models for vLLM Benchmark Suite (#35086)
...
Signed-off-by: louie-tsai <louie.tsai@intel.com>
2026-03-12 11:36:51 +08:00
Kunshang Ji
513949f95f
[XPU][Doc] Remove manual OneAPI install step, now handled by torch-xpu (#36831)
...
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-12 01:46:02 +00:00
Nick Hill
262b76a09f
[Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-12 01:20:34 +00:00
Harry Mellor
35db669f1d
Correct link to supported hardware on vllm.ai (#36798)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-11 08:43:28 -07:00
Wuxun Zhang
e584dce52b
Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230)
...
Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>
2026-03-11 19:19:15 +08:00
JartX
a40ee486f2
[Bugfix] Add Multiple of 16 block_size to triton fallback on rocm Attention to support qwen3_5 (#35923)
...
Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: akaratza <akaratza@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-03-11 07:45:57 +00:00
tunglinwood
42fadebecb
[Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127)
...
Signed-off-by: tunglinwood <tunglinwood@gmail.com>
Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com>
Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>
2026-03-10 21:24:48 -07:00
Hojin Yang
0836be3b03
[Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support (#31471)
...
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-03-10 10:59:19 +08:00
Lucas Kabela
3fd03f1ec2
[BE] Rename should_torch_compile_mm_vit to should_torch_compile_mm_encoder (#36281)
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-03-09 18:22:05 +00:00
Simon Mo
fe0c085c28
[Docs] Remove the reo beacon (#36528)
...
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-03-09 11:16:50 -07:00
Russell Bryant
d460a18fc6
[Docs] Expand --allowed-media-domains security guidance with threat details (#36506)
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2026-03-09 17:43:42 +00:00
Andreas Karatzas
c174d54f86
[ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks (#36292)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-09 12:02:41 -05:00
Harry Mellor
74a9f54cdb
[CI] Fix edge case that could lead to broken docs builds on main (#36515)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-09 09:06:19 -07:00
Cyrus Leung
f96c3ab08c
[Deprecation][1/2] Remove items deprecated in v0.18 (#36470)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-09 03:43:23 -07:00
Alex Brooks
65a4da1504
[Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) (#36160)
...
Signed-off-by: Alex Brooks <albrooks@redhat.com>
2026-03-09 05:46:23 +00:00
wang.yuqi
dcf8862fd4
[Examples][1/n] Resettle basic examples. (#35579)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-08 20:22:53 -07:00
Wentao Ye
384425f84e
[Dependency] Remove default ray dependency (#36170)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-08 20:06:22 -07:00
Harry Mellor
a0f44bb616
Allow markdownlint to run locally (#36398)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-08 20:05:24 -07:00