Lucas Wilkinson
8b5014d3dd
[Attention] FA4 integration (#32974)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-03-01 23:44:57 +00:00
Augusto Yao
8e75d88554
add io_process_plugin for sparse embedding (#34214)
...
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Signed-off-by: Augusto Yao <augusto.yjh@antgroup.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-28 09:16:37 +00:00
Micah Williamson
0edf101d2b
[ROCm] Add stablelm Head Size 80 To Supported Head Sizes For ROCM_ATTN (#35527)
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-02-28 12:16:34 +08:00
Gregory Shtrasberg
9fa6c68fa6
[ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified backends (#35334)
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2026-02-27 21:32:55 +00:00
Wentao Ye
062b789632
[Bug] Fix outdated links in source code (#35314)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-02-27 03:50:46 +00:00
Tyler Michael Smith
eb19955c37
[WideEP] Remove pplx all2all backend (#33724)
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 14:30:10 -08:00
Joao Gante
709eadbb0b
Doc link typo (#35281)
...
Signed-off-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-25 03:00:31 -08:00
lichuang
2c619e5e3f
[Docs]Fix documentation formatting in architecture overview (#34679)
...
Signed-off-by: codedump <lichuang1982@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-25 08:00:15 +00:00
Cyrus Leung
a766b30349
[Renderer] Deprecate code paths for old input processing (#34775)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-18 00:35:04 -08:00
Matthew Bonanni
dc5fa77a4e
[Bugfix][MTP][Sparse MLA] Allow sparse MLA with MTP to run with FULL cudagraphs (#34457)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-02-17 14:01:27 -05:00
Matthew Bonanni
f2c47886fd
[Attention] Add FlashInfer Sparse MLA backend (#33451)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2026-02-12 17:21:54 +00:00
Cyrus Leung
c9a1923bb4
[Plugin] Simplify IO Processor Plugin interface (#34236)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-10 19:47:39 -08:00
bnellnm
d1481ba783
[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344)
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-02-10 19:51:07 -05:00
Michael Goin
5e75a14a66
[Doc] Add DCP support to attention backend doc (#33936)
2026-02-09 18:33:43 -05:00
果冻虾仁
6f7adc533a
fix description in plugin_system.md (#33999)
2026-02-06 19:37:02 -08:00
Michael Goin
c39ee9ee2b
[Docs] Add sections on process architecture and minimum CPU resources (#33940)
...
It seems users can be confused about vLLM's performance when running
with very small amounts of CPU cores available. We are missing a clear
overview of what vLLM's process architecture is, so I added this along with
some diagrams in arch_overview.md, and included a section on CPU resource
recommendations in optimization.md
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-06 15:26:43 +00:00
Nicolò Lucchesi
81a90e5277
[Docs] Add bart-plugin to docs (#33905)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-02-05 12:20:25 +00:00
Richard Zou
fd9c83d0e0
[torch.compile] Document the workaround to standalone_compile failing (#33571)
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-02-03 07:16:55 +00:00
Cyrus Leung
793af538a3
[Doc] Update plugin deprecation notices (#33476)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-31 22:48:28 +08:00
jennyyyyzhen
527bcd14d4
[ROCM] Enable aiter attn backend for qwen3-next model (#32492)
...
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu>
2026-01-31 17:03:57 +08:00
Didier Durand
31b25f6516
[Doc]: fixing multiple typos in diverse files (#33256)
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 16:52:03 +08:00
Matthew Bonanni
77c4f45c6c
[7/N][Attention][Docs] Add documentation for attention backends (#32477)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-28 17:20:22 -05:00
Angela Yi
fb7abfc1d0
[docs] Improve tlparse section (#33211)
...
Signed-off-by: angelayi <yiangela7@gmail.com>
2026-01-28 02:07:37 +00:00
Matthew Bonanni
a608b4c6c2
[5/N][Attention] Finish eliminating vllm/attention folder (#32064)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-27 10:02:51 -05:00
Robert Shaw
5a93b9162b
[MoE Refactor] Integrate Naive Prepare Finalize into MK (#32567)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com>
2026-01-27 01:28:02 +00:00
Alex Brooks
9ac818a551
[Misc] HF Hub LoRA Resolver (#20320)
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2026-01-26 13:56:32 +00:00
wang.yuqi
328cbb2773
[Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-22 10:32:44 +00:00
whx
1861ae8aae
[PluggableLayer][1/N] Define PluggableLayer (Fix ci) (#32744)
...
Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-01-21 11:38:04 -05:00
Robert Shaw
42135d6898
[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414)
2026-01-21 08:22:33 -05:00
Lucas Kabela
c80f92c14d
[Documentation] Fix typo in docs/design/torch_compile_multimodal.md (#32741)
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-20 23:54:20 -08:00
Robert Shaw
c78ee240b3
Revert "[PluggableLayer][1/N] Define PluggableLayer" (#32725)
2026-01-21 00:21:06 +00:00
whx
4ca62a0dbd
[PluggableLayer][1/N] Define PluggableLayer (#32331)
...
Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-01-20 16:19:21 +00:00
杨朱 · Kiki
bb9172030e
[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric (#32661)
...
This PR completes the removal of the deprecated vllm:time_per_output_token_seconds
metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13,
but delayed until v0.15.
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-20 12:28:41 +00:00
lon
73f2a81c75
docs: prefix caching seems quite outdated (#28784)
...
Signed-off-by: lon <114724657+longregen@users.noreply.github.com>
Signed-off-by: Russell Bryant <russell.bryant@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <russell.bryant@gmail.com>
2026-01-19 11:49:52 -08:00
Robert Shaw
4a6af8813f
[MoE Refactor] Move Test Impl into Test Dirs (#32129)
...
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2026-01-18 12:16:59 +08:00
Cyrus Leung
9ea07b41da
[1/N] Reorganize multimodal processing code (#32327)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-14 15:25:31 +00:00
Andrew Bennett
f243abc92d
Fix various typos found in docs (#32212)
...
Signed-off-by: Andrew Bennett <potatosaladx@meta.com>
2026-01-13 03:41:47 +00:00
Matthew Bonanni
20228cb851
[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-12 09:13:56 -08:00
XlKsyt
899541bdb1
[doc] fix broken links (#32158)
...
Signed-off-by: minimAluminiumalism <caixuesen@outlook.com>
2026-01-12 10:18:38 +00:00
Matthew Bonanni
2612ba9285
[1/N][Attention] Restructure attention: move files (#31916)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-09 13:10:24 -08:00
Shanshan Shen
08d954f036
[Doc] Add developer guide for CustomOp (#30886)
...
Signed-off-by: shen-shanshan <467638484@qq.com>
2026-01-09 16:21:11 +00:00
Lucas Kabela
f16bfbe5bc
[Documentation][torch.compile] Add documentation for torch.compile + multimodal encoders (#31627)
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-08 14:33:24 -05:00
Robert Shaw
9f6dcb71ae
[MoE Refactor][16/N] Apply Refactor to NVFP4 (#31692)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
2026-01-08 03:46:27 +00:00
Robert Shaw
5dcd7ef1f2
[MoE Refactor][15/N] Apply Refactor to Fp8 (#31415)
2026-01-07 19:42:33 -05:00
weiyu
e7596371a4
[Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808)
...
Signed-off-by: Wei-Yu Lin <weiyulin@google.com>
Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>
2026-01-07 16:07:16 +08:00
Cyrus Leung
db318326a5
[Misc] Use deprecated for seed_everything (#31780)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-06 11:29:55 +00:00
wangxiyuan
bb4337b34c
[Platform] Deprecate seed_everything (#31659)
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-04 18:34:04 -08:00
Harry Mellor
decc244767
[Docs] Use relative md links instead of absolute html links for cross referencing (#31494)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-29 13:33:44 +00:00
Steve Westerhouse
9d701e90d8
[Doc] Clarify FP8 KV cache computation workflow (#31071)
...
Signed-off-by: westers <steve.westerhouse@origami-analytics.com>
2025-12-22 08:41:37 +08:00
Elizabeth Thomas
41b6f9200f
Remove all2all backend envvar (#30363)
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-18 19:46:28 +00:00