Isotr0py
|
6ac5e06f7c
|
[Chore] Clean up pytorch helper functions in vllm.utils (#26908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
|
2025-10-18 09:48:22 -07:00 |
|
iAmir97
|
1d165d6d85
|
[Chore] Separate out vllm.utils.mem_utils (#27143)
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-18 10:06:59 +00:00 |
|
Hanchenli
|
7c572544e4
|
[GPT-OSS] Structure_Tag support for gpt-oss tool-call in cot (#25515)
Signed-off-by: Hanchenli <lihanc2002@gmail.com>
Signed-off-by: Hanchenli <61769611+Hanchenli@users.noreply.github.com>
Signed-off-by: Wei Wei <wwei6@meta.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wei Wei <wwei6@meta.com>
Co-authored-by: Wei Wei <weiweinpu@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-17 21:55:54 -07:00 |
|
Michael Goin
|
950cf9e58e
|
[Bugfix] Use PIECEWISE cudagraphs on Blackwell if max_model_len > 131072 (#27114)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-17 19:47:18 +00:00 |
|
Harry Mellor
|
6c9fdbf725
|
[Docs] Replace rst style double-backtick with md single-backtick (#27091)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-17 02:47:34 -07:00 |
|
Cyrus Leung
|
4d4d6bad19
|
[Chore] Separate out vllm.utils.importlib (#27022)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-17 00:48:59 +00:00 |
|
Harry Mellor
|
fb5e10d3fb
|
Refactor Transformers backend to use mixins (#26906)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-16 21:50:39 +00:00 |
|
Bram Wasti
|
b2f78cbad4
|
[small][batch invariance] Rename the env and internal flags to simplify usage (#26855)
Signed-off-by: Bram Wasti <bwasti@meta.com>
|
2025-10-16 21:40:25 +00:00 |
|
Mandy Li
|
ac3ed5a815
|
Support block size of 256 used by Intel HPU (#26883)
Signed-off-by: mandy-li <mandy.j.li@intel.com>
|
2025-10-16 15:10:57 -04:00 |
|
Bram Wasti
|
7d8975de84
|
Deepseek-v3 Batch Invariant on 8xH100 (#26609)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-15 22:06:02 -07:00 |
|
ahao-anyscale
|
582f2c6be7
|
[BUG] Allow runai_streamer_sharded in config check (#26958)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2025-10-15 20:05:14 -07:00 |
|
Sam/Samuel
|
14f8456344
|
[Feature]: Use pydantic validation in observability.py config (#26637)
Signed-off-by: Samuel Wu <cernunnos1710@gmail.com>
Signed-off-by: Sam/Samuel <57896620+cern1710@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-15 16:44:03 +00:00 |
|
wangxiyuan
|
8f4b313c37
|
[Misc] rename torch_dtype to dtype (#26695)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-10-15 12:11:48 +00:00 |
|
wangxiyuan
|
db1764e4e0
|
[Platform] allow platform to init dp group (#22243)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-10-15 02:32:17 -07:00 |
|
Morrison Turnansky
|
96b9aa5aa0
|
[Frontend][torch.compile] CompilationConfig Overhaul (#20283): name change compilation level to compilation mode, deprecation compilation level (#26355)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-15 02:51:16 +00:00 |
|
Luka Govedič
|
2dcd12d357
|
[torch.compile] Fix tests for torch==2.9 inductor partition (#26116)
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2025-10-14 19:55:02 -04:00 |
|
Boyuan Feng
|
ca683a2a72
|
use combo kernel to fuse qk-norm and qk-rope (#26682)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-10-14 09:40:59 -04:00 |
|
Jaya Yuan
|
ea97940d6c
|
[DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttention (#24864)
Signed-off-by: yuanyongjie.yyj <yuanyongjie.yyj@antgroup.com>
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com>
Signed-off-by: Jaya Yuan <yuanyongjie.yyj@antgroup.com>
|
2025-10-14 13:07:50 +00:00 |
|
Vladislav Bronzov
|
c715ba3735
|
[Feature] Change vllm.py with pydantic validation (#26726)
Signed-off-by: Vladislav <vladislav.bronzov@gmail.com>
Signed-off-by: Vladislav Bronzov <58587565+VladOS95-cyber@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-14 12:00:54 +00:00 |
|
Cyrus Leung
|
9c4cb68339
|
[Chore] Remove SupportsV0Only interface and update supported models docs (#26783)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-14 04:55:10 -07:00 |
|
Chendi.Xue
|
7e6edb1469
|
[NIXL][HeteroTP] Enable KV transfer from HND prefill to NHD decode (#26556)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-10-14 09:46:05 +00:00 |
|
Ryan Li
|
481545b397
|
scheduler.py: Update the name of the default scheduler. (#26758)
Signed-off-by: Ryan Li <ryanli@ryanli.org>
|
2025-10-14 06:52:21 +00:00 |
|
Michael Goin
|
3e051bda82
|
[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend (#26732)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-13 18:12:52 -07:00 |
|
Morrison Turnansky
|
e3fdb627d9
|
[FrontEnd] UNREVERT CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops (#26502)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
|
2025-10-13 22:47:16 +00:00 |
|
Wentao Ye
|
314285d4f2
|
[CI] Fix mypy for vllm/distributed (#26593)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-13 16:02:24 -04:00 |
|
Anand Roy
|
10214b6935
|
[FEATURE]: Use pydantic validation in multimodal.py config (#26629)
Signed-off-by: Anand Roy <86306690+andycandy@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-13 07:56:59 -07:00 |
|
wang.yuqi
|
767c3ab869
|
[Model][0/N] Improve all pooling task | clean up (#25817)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-10-13 16:44:50 +08:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
Jaya Yuan
|
b91d8db873
|
[Bugfix][DCP] Set default CUDAGraphMode to PIECEWISE for DCP (#26574)
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com>
|
2025-10-12 09:58:38 +00:00 |
|
wang.yuqi
|
76852017ea
|
[MISC] Rename the torch profiler filename as instance_id+rank_id for merging the Profiler results of each Rank (#25867)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-10-12 09:29:08 +00:00 |
|
Angela Yi
|
01653a917b
|
[compile] Fix inductor partition config (#26645)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-10-11 21:03:14 +00:00 |
|
baonudesifeizhai
|
cddce79fda
|
[torch.compile] Make inductor partition rules respect splitting_ops #25691 (#25845)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-10 16:35:28 +00:00 |
|
Ashwin Phadke
|
ab196edefb
|
Remove LoRA bias support (#25807)
Signed-off-by: Ashwin Phadke <ashwinphadke12@rediffmail.com>
Signed-off-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-10 09:50:33 +00:00 |
|
Lucas Wilkinson
|
29255cfc3b
|
[Spec-Decode] Support piecewise cudagraphs for Eagle head (#25109)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-10-10 01:20:31 -04:00 |
|
Harry Mellor
|
e246ad6f0c
|
Upgrade Pydantic to v2.12.0 and remove hack for Python 3.13 (#26481)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-09 06:02:40 -07:00 |
|
Jiangyun Zhu
|
5728da11ea
|
Revert #26113 "[Frontend] CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops" (#26472)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-10-09 05:43:55 -07:00 |
|
Simon Danielsson
|
92be3f3517
|
[Feature] Use pydantic validation in parallel.py config (#26417)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-09 12:41:31 +00:00 |
|
Simon Danielsson
|
e4791438ed
|
[Feature] Use pydantic validation in lora.py and load.py configs (#26413)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
|
2025-10-09 02:38:33 -07:00 |
|
Jee Jee Li
|
1b2c440cd6
|
[Core] Relax the LoRA max rank (#26461)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-08 23:47:14 -07:00 |
|
Naveenraj Kamalakannan
|
e614ab7806
|
Separate MLAAttention class from Attention (#25103)
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-08 17:11:11 -07:00 |
|
Morrison Turnansky
|
e1ba235668
|
[BugFix] Fix failing test quantization/test_compressed_tensors.py::test_compressed_tensors_fp8_block_enabled (#26436)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
|
2025-10-08 20:04:12 +00:00 |
|
Vinay R Damodaran
|
b25d7b5657
|
[Feature] Change cache.py with pydantic validation (#26390)
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-08 11:12:59 -07:00 |
|
Harry Mellor
|
e09d1753ec
|
Remove Python 3.9 support ahead of PyTorch 2.9 in v0.11.1 (#26416)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-08 10:40:42 -07:00 |
|
Harry Mellor
|
2f99f2f506
|
Tidy vllm/config/__init__.py to only add classes and functions (#26405)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-08 07:10:00 -07:00 |
|
Ayush Satyam
|
5e65d6b2ad
|
fix[DP][v1]: Prevent hangs from mismatched worker configurations (#26218)
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>
|
2025-10-07 22:55:08 -07:00 |
|
liangel-02
|
b32260ab85
|
[torchao] safetensors integration (#25969)
Signed-off-by: Angel Li <liangel@meta.com>
|
2025-10-07 20:12:35 -06:00 |
|
Morrison Turnansky
|
0c824fc46f
|
[Frontend] CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops (#26113)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
|
2025-10-07 12:53:43 -07:00 |
|
Grant Holmes (Ren)
|
d100d78eb3
|
Optimize KV cache distribution for asymmetric pipeline parallelism (#25164)
Signed-off-by: gholmes829 <g.holmes429@gmail.com>
|
2025-10-07 09:20:30 +00:00 |
|
Michael Goin
|
c6873c4e6d
|
[UX] Support nested dicts in hf_overrides (#25727)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-07 11:19:16 +08:00 |
|
Sage Moore
|
2111b4643c
|
[Core] Simplify the Dp padding/should ubatch coordination logic (#25768)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-10-07 01:57:49 +00:00 |
|