Michael Goin
c39ee9ee2b
[Docs] Add sections on process architecture and minimum CPU resources (#33940)
It seems users can be confused by vLLM's performance when running
with very few CPU cores available. We are missing a clear
overview of vLLM's process architecture, so I added one, along with
some diagrams, in arch_overview.md, and included a section on CPU resource
recommendations in optimization.md.
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-06 15:26:43 +00:00
Matthew Bonanni
77c4f45c6c
[7/N][Attention][Docs] Add documentation for attention backends (#32477)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-28 17:20:22 -05:00
Vincent Gimenes
0b53bec60b
[DOC]: Add warning about max_num_batched_tokens and max_model_len when chunked prefill is disabled (#33109)
Signed-off-by: Vincent Gimenes <147169146+VincentG1234@users.noreply.github.com>
2026-01-27 03:05:02 +00:00
Didier Durand
1a55cfafcb
[Doc]: fixing typos in various files (#30540)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-14 02:14:37 -08:00
Cyrus Leung
389aa1b2eb
[Doc] Update more docs with respect to V1 (#29188)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-23 10:58:48 +08:00
Harry Mellor
483ea64611
[Docs] Replace all explicit anchors with real links (#27087)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-17 02:22:06 -07:00
Harry Mellor
4ffd6e8942
[Docs] Reduce custom syntax used in docs (#27009)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-16 20:05:34 -07:00
Cyrus Leung
ef9676a1f1
[Doc] ruff format some Python examples (#26767)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-14 03:21:53 -07:00
Cyrus Leung
633f943e30
[Doc] Update Batch-level DP docs (#25757)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-26 02:37:40 -07:00
YiwenC
52bc9d5b3e
[Model] enable data parallel for InternVL vision encoder (#23909)
Signed-off-by: Yiwen Chen <yiwen66@berkeley.edu>
Signed-off-by: YiwenC <54658925+666even666@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-17 21:11:46 -07:00
dongluw
a5b84f1cbf
[Core] Shared memory based object store for Multimodal data caching and IPC (#20452)
Signed-off-by: donglu <donglu@cohere.com>
2025-09-12 07:54:17 -07:00
co63oc
1bd007f234
fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-02 20:44:50 -07:00
WeiQing Chen
2f0bab3f26
[Model] Support dp on ViT on GLM-4.5V (#23168)
Signed-off-by: David Chen <530634352@qq.com>
2025-09-02 10:48:18 +00:00
WeiQing Chen
a0e0efd6bd
[Model] Support DP for ViT on Kimi-VL-A3B-Thinking-2506 (#23817)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-09-01 16:56:56 +00:00
Jiangyun Zhu
3a6acad431
[Model] Enable encoder DP for MiniCPM-V (#23948)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-08-30 06:31:26 -07:00
Cyrus Leung
fe8d7b6f03
[Model] Interface to enable batch-level DP support (#23733)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-27 06:41:22 -07:00
Michael Yao
5bd9f84158
[Docs] Fix an admonition important (#23726)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-08-27 02:50:09 -07:00
Cyrus Leung
69244e67e6
[Core] Use key-only cache for BaseMultiModalProcessor (#23018)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-27 14:19:13 +08:00
Didier Durand
7c04779afa
[Doc]: fix various spelling issues in multiple files (#23636)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-08-26 14:05:29 +00:00
Cyrus Leung
e269be2ba2
[Doc] Add caution for API server scale-out (#23550)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-25 06:14:15 -07:00
WeiQing Chen
23c939fd30
[Model] Support DP for ViT on MiniCPM-V-4 (#23327)
Signed-off-by: ycyaw66 <497410282@qq.com>
Co-authored-by: ycyaw66 <497410282@qq.com>
2025-08-23 02:14:41 +00:00
Cyrus Leung
5cc54f7c5b
[Doc] Fix batch-level DP example (#23325)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-08-21 06:16:38 -07:00
Cyrus Leung
5efd6905bc
[CLI][Doc] Formalize --mm-encoder-tp-mode (#23190)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-20 23:42:28 +08:00
Tialo
2c3f557f08
[Doc] use power of 2 (#23172)
2025-08-19 03:16:23 -07:00
Cyrus Leung
139d155781
[Frontend] Use engine argument to control MM cache size (#22441)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-07 09:47:10 -07:00
Cyrus Leung
766bc8162c
[Core] Store only the keys for multi-modal data in P0 (#22198)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-07 01:45:04 -07:00
Cyrus Leung
1cb194a018
[Doc] Reorganize user guide (#18661)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-24 07:25:33 -07:00