Commit Graph

629 Commits

Author SHA1 Message Date
Bowen Wang
e9fd658a73 [Feature] Expert Parallelism Load Balancer (EPLB) (#18343)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
2025-06-26 15:30:21 -07:00
Michael Goin
1f5d178e9c Revert "[Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine" (#20128)
2025-06-26 07:32:22 -07:00
zhrrr
9f0608fc16 [Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine (#20062)
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
2025-06-25 21:03:17 +00:00
Aaron Pham
ba7ba35cda [Chore] debloat some initial logs (#19438)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-06-25 06:36:22 +00:00
David Xia
7108934142 [Frontend] speed up import time of vllm.config (#18036)
Signed-off-by: David Xia <david@davidxia.com>
2025-06-25 00:41:11 -04:00
cascade
e6327c9b3e [Feature] Support sequence parallelism for static fp8 quantization (#19181)
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-06-23 16:09:02 -04:00
Aaron Pham
c4cf260677 [Perf][CLI] Improve overall startup time (#19941)
2025-06-22 23:11:22 +00:00
Aaron Pham
e91386cde1 [Chore] dedup logs (#19955)
2025-06-22 19:43:07 +00:00
Ning Xie
2bb246b8f7 [MISC] add cpu_kvcache_space_bytes to CacheConfig (#19812)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-22 13:39:09 +08:00
wangxiyuan
e773a9e1c2 [Misc] Clean up useless code (#19889)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-06-20 21:09:09 +00:00
Maximilien de Bayser
799397ee4f Support embedding models in V1 (#16188)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-18 21:36:33 -07:00
Ning Xie
6e9cc73f67 [MISC] correct DeviceConfig device field static type analysis (#19699)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-17 17:21:50 -07:00
Ning Xie
26bc46ef89 [MISC] typo fix (#19672)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-16 07:18:49 +00:00
Ye (Charlotte) Qi
b692e9cd07 [Misc] Fix skipped max-model-len validation when deriving max model length from tokenizer config (#19660)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-06-16 06:30:29 +00:00
Woosuk Kwon
055915e6ce Enable prefix caching with full cuda graphs (#19617)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-15 01:05:05 -07:00
Woosuk Kwon
aafbbd981f [torch.compile] Use custom ops when use_inductor=False (#19618)
2025-06-13 15:05:54 -07:00
youkaichao
d70bc7c029 [torch.compile] reorganize the cache directory to support compiling multiple models (#19064)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-06-13 15:23:25 +08:00
Luka Govedič
f98548b9da [torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
2025-06-12 08:31:04 -07:00
rasmith
c7ea0b56cd [AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger (#17331)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2025-06-11 15:53:28 -04:00
Richard Zou
77f0d465d0 [BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 (#19390)
Signed-off-by: rzou <zou3519@gmail.com>
2025-06-11 07:54:41 +08:00
Siyuan Liu
3a7cd627a8 [Misc] Fix a config typo in disable_hybrid_kv_cache_manager configuration (#19383)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-06-09 16:41:51 -07:00
wang.yuqi
2ffb9b6e07 [Bugfix] model_max_length should consider max_model_len in tokenizer_config (#19201)
2025-06-08 07:17:53 -07:00
Richard Zou
eaa2e51088 [Bugfix] Re-enable use_cudagraph in vLLM v1 (#19299)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-06-08 08:56:12 +08:00
Richard Zou
da511d54d8 Fix CompilationConfig repr (#19091)
Signed-off-by: rzou <zou3519@gmail.com>
2025-06-06 16:23:35 +08:00
Chen Zhang
f8a1a2d108 [v1] Hybrid Memory Allocator (#17996)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-05 20:47:09 -07:00
Cyrus Leung
01dc9a76db [CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-04 04:49:20 -07:00
Varun Sundar Rabindranath
fa98d77773 [Kernel] DeepEP dispatch-combine kernel integration (#18434)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-06-03 12:30:02 -07:00
Simon Mo
02f0c7b220 [Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Rui Qiao
bdce64f236 [V1] Support DP with Ray (#18779)
2025-06-02 21:15:13 -07:00
Siyuan Liu
9112b443a0 [Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-06-03 00:06:20 +00:00
Gregory Shtrasberg
ca2f6b9c30 [Bugfix][Model] Attempt to fix eagle in V0. (#18978)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-06-02 08:15:53 -07:00
jennyyyyzhen
ebb1ec9318 [Model] enable data parallel for Llama4 vision encoder (#18368)
Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
Co-authored-by: yZhen <yZhen@fb.com>
Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
2025-06-02 19:22:54 +08:00
Cyrus Leung
6aa8f9a4e7 [Core] Rework dtype resolution (#18751)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-01 11:04:23 +08:00
Satyajith Chilappagari
2a50ef5760 [Neuron] Add Multi-Modal model support for Neuron (#18921)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>
Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com>
Co-authored-by: FeliciaLuo <luof@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
2025-05-31 10:39:11 +00:00
Yikun Jiang
3c49dbdd03 Skip device and quant Pydantic validation to make plugin device work (#18843)
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-05-28 20:12:30 -07:00
aws-elaineyz
1661a9c28f [Doc][Neuron] Update documentation for Neuron (#18868)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
2025-05-28 19:44:01 -07:00
Richard Zou
26b4fa45be Add ability to use CUDAGraphs with use_inductor=False (#17345)
Signed-off-by: rzou <zou3519@gmail.com>
2025-05-29 10:16:52 +08:00
Harry Mellor
6dbe5b5c93 Remove checks for None for fields which should never be None (#17985)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-28 21:32:19 +00:00
Harry Mellor
4c2b38ce9e Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-28 12:46:04 +00:00
wang.yuqi
de65fc8e1e [CI] improve embed testing (#18747)
2025-05-28 00:16:35 -07:00
Cyrus Leung
0c492b7824 [Deprecation] Remove fallbacks for Embeddings API (#18795)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-28 15:09:04 +08:00
wang.yuqi
3e9ce609bd [Bugfix] Fix nomic max_model_len (#18755)
2025-05-27 20:29:53 -07:00
Hyogeun Oh (오효근)
a68e293cb9 [Doc] Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking (#18663)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-05-27 01:44:20 -07:00
Cyrus Leung
61a45e7a72 [Bugfix] Fix Mistral-format models with sliding window (#18693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 01:44:04 -07:00
Feng XiaoLong
4fc1bf813a [Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
2025-05-23 16:16:26 -07:00
Cyrus Leung
7d9216495c [Doc] Update references to doc files (#18637)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-23 15:49:21 -07:00
Jiayi Yao
2628a69e35 [V1] Support Deepseek MTP (#18435)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>
2025-05-23 10:26:28 -07:00
Cyrus Leung
273cb3b4d9 [Doc] Fix top-level API links/docs (#18621)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-23 09:46:56 -07:00
cascade
71ea614d4a [Feature]Add async tensor parallelism using compilation pass (#17882)
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-05-23 01:03:34 -07:00
aws-elaineyz
ed5d408255 [Neuron] Remove bypass on EAGLEConfig and add a test (#18514)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
2025-05-22 21:26:32 -07:00