Ning Xie
|
2bb246b8f7
|
[MISC] add cpu_kvcache_space_bytes to CacheConfig (#19812)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-22 13:39:09 +08:00 |
|
wangxiyuan
|
e773a9e1c2
|
[Misc] Clean up useless code (#19889)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-06-20 21:09:09 +00:00 |
|
Maximilien de Bayser
|
799397ee4f
|
Support embedding models in V1 (#16188)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-18 21:36:33 -07:00 |
|
Ning Xie
|
6e9cc73f67
|
[MISC] correct DeviceConfig device field static type analysis (#19699)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-17 17:21:50 -07:00 |
|
Ning Xie
|
26bc46ef89
|
[MISC] typo fix (#19672)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-16 07:18:49 +00:00 |
|
Ye (Charlotte) Qi
|
b692e9cd07
|
[Misc] Fix skipped max-model-len validation when deriving max model length from tokenizer config (#19660)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-06-16 06:30:29 +00:00 |
|
Woosuk Kwon
|
055915e6ce
|
Enable prefix caching with full cuda graphs (#19617)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-15 01:05:05 -07:00 |
|
Woosuk Kwon
|
aafbbd981f
|
[torch.compile] Use custom ops when use_inductor=False (#19618)
|
2025-06-13 15:05:54 -07:00 |
|
youkaichao
|
d70bc7c029
|
[torch.compile] reorganize the cache directory to support compiling multiple models (#19064)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-06-13 15:23:25 +08:00 |
|
Luka Govedič
|
f98548b9da
|
[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-12 08:31:04 -07:00 |
|
rasmith
|
c7ea0b56cd
|
[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger (#17331)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-06-11 15:53:28 -04:00 |
|
Richard Zou
|
77f0d465d0
|
[BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 (#19390)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-06-11 07:54:41 +08:00 |
|
Siyuan Liu
|
3a7cd627a8
|
[Misc] Fix a config typo in disable_hybrid_kv_cache_manager configuration (#19383)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-06-09 16:41:51 -07:00 |
|
wang.yuqi
|
2ffb9b6e07
|
[Bugfix] model_max_length should consider max_model_len in tokenizer_config (#19201)
|
2025-06-08 07:17:53 -07:00 |
|
Richard Zou
|
eaa2e51088
|
[Bugfix] Re-enable use_cudagraph in vLLM v1 (#19299)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-06-08 08:56:12 +08:00 |
|
Richard Zou
|
da511d54d8
|
Fix CompilationConfig repr (#19091)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-06-06 16:23:35 +08:00 |
|
Chen Zhang
|
f8a1a2d108
|
[v1] Hybrid Memory Allocator (#17996)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-05 20:47:09 -07:00 |
|
Cyrus Leung
|
01dc9a76db
|
[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-04 04:49:20 -07:00 |
|
Varun Sundar Rabindranath
|
fa98d77773
|
[Kernel] DeepEP dispatch-combine kernel integration (#18434)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-06-03 12:30:02 -07:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
Rui Qiao
|
bdce64f236
|
[V1] Support DP with Ray (#18779)
|
2025-06-02 21:15:13 -07:00 |
|
Siyuan Liu
|
9112b443a0
|
[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-03 00:06:20 +00:00 |
|
Gregory Shtrasberg
|
ca2f6b9c30
|
[Bugfix][Model] Attempt to fix eagle in V0. (#18978)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-06-02 08:15:53 -07:00 |
|
jennyyyyzhen
|
ebb1ec9318
|
[Model] enable data parallel for Llama4 vision encoder (#18368)
Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
Co-authored-by: yZhen <yZhen@fb.com>
Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
|
2025-06-02 19:22:54 +08:00 |
|
Cyrus Leung
|
6aa8f9a4e7
|
[Core] Rework dtype resolution (#18751)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-01 11:04:23 +08:00 |
|
Satyajith Chilappagari
|
2a50ef5760
|
[Neuron] Add Multi-Modal model support for Neuron (#18921)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>
Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com>
Co-authored-by: FeliciaLuo <luof@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
|
2025-05-31 10:39:11 +00:00 |
|
Yikun Jiang
|
3c49dbdd03
|
Skip device and quant Pydantic validation to make plugin device work (#18843)
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
|
2025-05-28 20:12:30 -07:00 |
|
aws-elaineyz
|
1661a9c28f
|
[Doc][Neuron] Update documentation for Neuron (#18868)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
|
2025-05-28 19:44:01 -07:00 |
|
Richard Zou
|
26b4fa45be
|
Add ability to use CUDAGraphs with use_inductor=False (#17345)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-29 10:16:52 +08:00 |
|
Harry Mellor
|
6dbe5b5c93
|
Remove checks for None for fields which should never be None (#17985)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-28 21:32:19 +00:00 |
|
Harry Mellor
|
4c2b38ce9e
|
Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-28 12:46:04 +00:00 |
|
wang.yuqi
|
de65fc8e1e
|
[CI] improve embed testing (#18747)
|
2025-05-28 00:16:35 -07:00 |
|
Cyrus Leung
|
0c492b7824
|
[Deprecation] Remove fallbacks for Embeddings API (#18795)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-28 15:09:04 +08:00 |
|
wang.yuqi
|
3e9ce609bd
|
[Bugfix] Fix nomic max_model_len (#18755)
|
2025-05-27 20:29:53 -07:00 |
|
Hyogeun Oh (오효근)
|
a68e293cb9
|
[Doc] Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking (#18663)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-05-27 01:44:20 -07:00 |
|
Cyrus Leung
|
61a45e7a72
|
[Bugfix] Fix Mistral-format models with sliding window (#18693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 01:44:04 -07:00 |
|
Feng XiaoLong
|
4fc1bf813a
|
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
|
2025-05-23 16:16:26 -07:00 |
|
Cyrus Leung
|
7d9216495c
|
[Doc] Update references to doc files (#18637)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-23 15:49:21 -07:00 |
|
Jiayi Yao
|
2628a69e35
|
[V1] Support Deepseek MTP (#18435)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-05-23 10:26:28 -07:00 |
|
Cyrus Leung
|
273cb3b4d9
|
[Doc] Fix top-level API links/docs (#18621)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-23 09:46:56 -07:00 |
|
cascade
|
71ea614d4a
|
[Feature]Add async tensor parallelism using compilation pass (#17882)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-23 01:03:34 -07:00 |
|
aws-elaineyz
|
ed5d408255
|
[Neuron] Remove bypass on EAGLEConfig and add a test (#18514)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
|
2025-05-22 21:26:32 -07:00 |
|
lkchen
|
e44d8ce8c7
|
[Bugfix] Set KVTransferConfig.engine_id in post_init (#18576)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-05-23 02:54:42 +00:00 |
|
Harry Mellor
|
4b0da7b60e
|
Enable hybrid attention models for Transformers backend (#18494)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-23 10:12:08 +08:00 |
|
wangxiyuan
|
721fb9b181
|
[Platform] Move platform check to right place (#18470)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-05-22 12:11:28 -07:00 |
|
Kebe
|
5d7f545204
|
[Frontend] deprecate --device arg (#18399)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-05-21 01:21:17 -07:00 |
|
Michael Goin
|
f4a8a37465
|
[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-20 09:08:37 -07:00 |
|
cascade
|
9ab2c02ff8
|
Support sequence parallelism combined with pipeline parallelism (#18243)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-17 22:47:25 +00:00 |
|
David Ben-David
|
3e0d435027
|
[P/D][V1] Support dynamic loading of external KV connector implementations (#18142)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
|
2025-05-17 06:40:39 +00:00 |
|
Lucia Fang
|
3d2779c29a
|
[Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827)
Signed-off-by: Lucia Fang <fanglu@fb.com>
|
2025-05-15 22:28:27 -07:00 |
|