whyiug
1ce13cf992
[Model] Add support for BERT-like Chinese ERNIE pooling models (#36385)
Signed-off-by: whyiug <whyiug@hotmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-13 03:23:53 +00:00
Nikita
10f08dedfa
[Model] Add ColPali late interaction model for multi-modal retrieval (#36818)
Signed-off-by: Nikita Sukharev <kaonael@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-03-13 02:18:57 +00:00
Xinan Miao
2cdf92228c
[Feature]: Remove Chunking From FusedMoE (#34086)
Signed-off-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: Southwest <1403572259@qq.com>
Signed-off-by: southwest <am1ao@qq.com>
Signed-off-by: Xinan Miao <1403572259@qq.com>
Co-authored-by: SouthWest7 <am1ao@qq.com>
2026-03-12 14:24:38 -04:00
Harry Mellor
e39257a552
Add AGENTS.md (#36877)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-12 10:20:50 -07:00
grimulkan
a1257fd1ea
[Kernel] Add FP8 KV cache support to Triton MLA decode attention (#34597)
Signed-off-by: grimulkan <grimulkan@gmail.com>
2026-03-12 08:32:34 -07:00
Kunshang Ji
53ec16a705
[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-12 07:57:47 -07:00
Mark McLoughlin
5282c7d4d0
[docs] Add lightweight AI assisted contribution policy (#30947)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2026-03-12 11:46:13 +00:00
sfeiqiang
8cb24d3aed
[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328)
Signed-off-by: phaedonsun <phaedonsun@tencent.com>
Co-authored-by: phaedonsun <phaedonsun@tencent.com>
2026-03-12 00:46:20 -07:00
Louie Tsai
17852aa503
more models for vLLM Benchmark Suite (#35086)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
2026-03-12 11:36:51 +08:00
Kunshang Ji
513949f95f
[XPU][Doc] Remove manual OneAPI install step, now handled by torch-xpu (#36831)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-12 01:46:02 +00:00
Nick Hill
262b76a09f
[Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-12 01:20:34 +00:00
Harry Mellor
35db669f1d
Correct link to supported hardware on vllm.ai (#36798)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-11 08:43:28 -07:00
Wuxun Zhang
e584dce52b
Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230)
Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>
2026-03-11 19:19:15 +08:00
JartX
a40ee486f2
[Bugfix] Add Multiple of 16 block_size to triton fallback on rocm Attention to support qwen3_5 (#35923)
Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: akaratza <akaratza@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-03-11 07:45:57 +00:00
tunglinwood
42fadebecb
[Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127)
Signed-off-by: tunglinwood <tunglinwood@gmail.com>
Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com>
Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>
2026-03-10 21:24:48 -07:00
Hojin Yang
0836be3b03
[Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support (#31471)
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-03-10 10:59:19 +08:00
Lucas Kabela
3fd03f1ec2
[BE] Rename should_torch_compile_mm_vit to should_torch_compile_mm_encoder (#36281)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-03-09 18:22:05 +00:00
Simon Mo
fe0c085c28
[Docs] Remove the reo beacon (#36528)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-03-09 11:16:50 -07:00
Russell Bryant
d460a18fc6
[Docs] Expand --allowed-media-domains security guidance with threat details (#36506)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2026-03-09 17:43:42 +00:00
Andreas Karatzas
c174d54f86
[ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks (#36292)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-09 12:02:41 -05:00
Harry Mellor
74a9f54cdb
[CI] Fix edge case that could lead to broken docs builds on main (#36515)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-09 09:06:19 -07:00
Cyrus Leung
f96c3ab08c
[Deprecation][1/2] Remove items deprecated in v0.18 (#36470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-09 03:43:23 -07:00
Alex Brooks
65a4da1504
[Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) (#36160)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
2026-03-09 05:46:23 +00:00
wang.yuqi
dcf8862fd4
[Examples][1/n] Resettle basic examples. (#35579)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-08 20:22:53 -07:00
Wentao Ye
384425f84e
[Dependency] Remove default ray dependency (#36170)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-08 20:06:22 -07:00
Harry Mellor
a0f44bb616
Allow markdownlint to run locally (#36398)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-08 20:05:24 -07:00
Kunshang Ji
fde4771bbd
[XPU][Doc] update xpu document about triton dependency/conflict issue. (#36301)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-09 02:09:22 +00:00
Wei Zhao
379689d533
[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891)
2026-03-07 13:51:54 -08:00
rahul-sarvam
85f50eb41f
Adding support to Sarvam's MoE models (#33942)
Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com>
2026-03-08 01:16:24 +08:00
lif
00b814ba5a
[V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
2026-03-07 22:09:55 +08:00
Copilot
ce8546a12b
[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538)
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: ProExpertProg <luka.govedic@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-03-06 23:55:06 +00:00
Andreas Karatzas
807d680337
[ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-06 15:15:12 +08:00
Xiang Shi
e68de8adc0
docs: fix wrong cc in int8.md (#36209)
Signed-off-by: Xiang Shi <realkevin@tutanota.com>
2026-03-06 06:01:02 +00:00
Rohan Potdar
c5362c739f
Reenable features for ROCm attention backends (#36185)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2026-03-05 20:21:06 -08:00
Yanhong Li
a911f4dd20
[Model] Add support for OLMo Hybrid (#32550)
2026-03-05 14:51:06 -05:00
Jiayi Yan
6a895197fa
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-05 17:05:46 +00:00
Sage Moore
8c760b6ab6
[ROCm] Refactor ROCm attention backend selection logic (#35246)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2026-03-05 10:51:26 -06:00
Harry Mellor
8df523351f
[Docs] Only build docs if documentation or ready labels are present (#36135)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-05 13:58:16 +00:00
Kunshang Ji
66a2209645
[Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize (#36085)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-05 10:36:39 +00:00
Paco Xu
7493c51c55
[Docs] add Dynamo/aibrix integration and kubeai/aks link (#32767)
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2026-03-05 17:39:50 +08:00
Reagan Lee
ac773bbe80
[Docs] Update docs to include mm processor + encoder benchmarks (#34083)
Signed-off-by: Reagan <reaganjlee@gmail.com>
2026-03-05 01:38:25 -08:00
zihaoanllm
d106bf39f5
[Doc] Add Parallel Draft Models (#35973)
Signed-off-by: <zihaoan2@amd.com>
Signed-off-by: zihaoanllm <zihaoan2@amd.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-05 05:44:07 +00:00
Russell Bryant
636ee223ac
[Docs] Document security risks of GPT-OSS Python tool (#35139)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2026-03-04 20:27:31 +00:00
Davina Zaman
138d891d7f
[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441)
Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-04 11:44:39 -08:00
Maxime Grenu
32224f568a
docs: update CPU Docker images to reference Docker Hub instead of AWS ECR (#34882)
Signed-off-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-04 10:31:35 -08:00
Abhishek Mathukiya
f3dc292e9f
docs: add version requirement note for --profiler-config flag (#32454)
Signed-off-by: abhishkh <mathukiya.a@northeastern.edu>
2026-03-04 18:13:54 +00:00
Chen
138c5fa186
[Docs] Add RunPod GPU deployment guide for vLLM (#34531)
Signed-off-by: lisperz <zhuchen200245@163.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-04 10:11:34 -08:00
Russell Bryant
2f2c1d73a7
[Docs] Upgrade dynamic LoRA warning to admonition block (#35218)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2026-03-04 10:01:42 -08:00
Michael Yao
fd3bfe74c9
[Docs] Update design/multiprocessing.md (#30677)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2026-03-04 17:58:59 +00:00
Sage
d25c1ec3c9
docs(cpu): Clarify pre-built wheels requirement for CPU Python-only build (#35090)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
2026-03-04 17:45:35 +00:00