Lucia Fang
|
b316ac6589
|
[V1] Support MP Executor for multi node distributed inference (#23691)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-11-16 09:01:21 +00:00 |
|
Nick Hill
|
bc3e43069a
|
[BugFix] Fix multi-modal async scheduling race condition (#28706)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-14 01:11:13 -08:00 |
|
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
|
4ca5cd5740
|
[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#12695)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>
|
2025-11-12 15:24:12 -08:00 |
|
Ilya Markov
|
1788aa1efb
|
[BugFix] Graceful handling of torch symm mem errors. (#27671)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-11 17:41:54 -07:00 |
|
Boyuan Feng
|
0f872b7977
|
[Log] update shm wait time msg (#28255)
|
2025-11-07 09:43:30 -05:00 |
|
bnellnm
|
938772af03
|
[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. (#27123)
|
2025-11-04 21:59:45 +08:00 |
|
Wentao Ye
|
a8141fa649
|
[Refactor] Remove VLLM_DEEPEP_LOW_LATENCY_ALLOW_NVLINK (#27750)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-30 15:32:39 -04:00 |
|
Kero Liang
|
02af36df36
|
[Bugfix] Fix allocation & free logic of SingleWriterShmRingBuffer (#27117)
Signed-off-by: Kero Liang <kerorek@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: donglu <donglu@cohere.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-28 15:01:24 +00:00 |
|
Wentao Ye
|
52efc34ebf
|
[Log] Optimize Startup Log (#26740)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-24 19:27:04 -04:00 |
|
Pengchao Wang
|
d95d0f4b98
|
[Distributed] Basic set of configuration for large EP deployment on GB200 (#27328)
Signed-off-by: Pengchao Wang <wpc@fb.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2025-10-24 14:16:44 -07:00 |
|
Rui Qiao
|
09a6a49eaf
|
[Misc] Avoid "PyTorch non-writable tensors" warning in RayPPCommunicator (#27443)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-10-24 14:53:09 +08:00 |
|
dongbo910220
|
a0003b56b0
|
[Chore] Separate out system utilities from vllm.utils (#27201)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-22 20:25:25 +00:00 |
|
dongbo910220
|
3ae082c373
|
[Chore] Separate out optional dependency checks from vllm.utils (#27207)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-22 10:44:21 -04:00 |
|
Nick Hill
|
647214f3d5
|
[V0 Deprecation] Remove V0 executors (#27142)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-21 11:09:37 -07:00 |
|
dongbo910220
|
6c728f7771
|
[Chore] Separate out NCCL utilities from vllm.utils (#27197)
Signed-off-by: dongbo910220 <1275604947@qq.com>
|
2025-10-21 06:18:23 -07:00 |
|
iAmir97
|
7a6c8c3fa1
|
[Chore] Separate out vllm.utils.network_utils (#27164)
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
|
2025-10-19 03:06:32 -07:00 |
|
Nick Hill
|
3b45075206
|
[Minor] Add some clarifying comments to recent changes (#27130)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-18 09:52:45 -07:00 |
|
Isotr0py
|
6ac5e06f7c
|
[Chore] Clean up pytorch helper functions in vllm.utils (#26908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
|
2025-10-18 09:48:22 -07:00 |
|
Nick Hill
|
ab81379ea6
|
[Perf] Exploit out-of-band buffers in shm_broadcast (#26961)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-16 20:08:03 -07:00 |
|
Bram Wasti
|
b2f78cbad4
|
[small][batch invariance] Rename the env and internal flags to simplify usage (#26855)
Signed-off-by: Bram Wasti <bwasti@meta.com>
|
2025-10-16 21:40:25 +00:00 |
|
Bram Wasti
|
7d8975de84
|
Deepseek-v3 Batch Invariant on 8xH100 (#26609)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-15 22:06:02 -07:00 |
|
Michael Goin
|
3e051bda82
|
[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend (#26732)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-13 18:12:52 -07:00 |
|
Wentao Ye
|
314285d4f2
|
[CI] Fix mypy for vllm/distributed (#26593)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-13 16:02:24 -04:00 |
|
Wentao Ye
|
e251e457c5
|
[Log] Optimize Startup Log (#26601)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-14 02:06:57 +08:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
Wentao Ye
|
8983e0216f
|
[CI] Fix Pre-commit Issue Cannot determine type of "rank" and "world_size" (#26448)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-09 15:16:48 -07:00 |
|
Utkarsh Sharma
|
335b28f7d1
|
[TPU] Rename tpu_commons to tpu_inference (#26279)
Signed-off-by: Utkarsh Sharma <utksharma@google.com>
Co-authored-by: Utkarsh Sharma <utksharma@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
|
2025-10-07 23:30:52 -07:00 |
|
Harry Mellor
|
6c04638214
|
Fix per file ruff ignores related to line length (#26262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-06 05:12:40 +00:00 |
|
Harry Mellor
|
d6953beb91
|
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 07:06:22 -07:00 |
|
Lucia Fang
|
f48b6a03ba
|
[Misc]allow disable pynccl (#25421)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-10-01 06:04:13 +00:00 |
|
Gregory Shtrasberg
|
61a3431613
|
[Bugfix][ROCm] Fixing trying to import non-existent symbols from libnccl.so (#25605)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-09-29 17:01:50 -04:00 |
|
Tyler Michael Smith
|
a5354b3ed2
|
[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-09-27 14:22:28 +00:00 |
|
Nick Hill
|
983056e456
|
[Misc] Remove unnecessary memoryviews in shm_broadcast.py (#25721)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-26 03:11:44 +00:00 |
|
Nick Hill
|
8b77328ffe
|
[Misc] Don't log shm dequeue delay warning on worker side (#25720)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-26 01:08:30 +00:00 |
|
youkaichao
|
6c340da4df
|
[misc] log info messages by default for hanging / busy / idle (#25627)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-09-25 21:14:57 +08:00 |
|
Cyrus Leung
|
2f17117606
|
[mypy] Fix wrong type annotations related to tuple (#25660)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 13:00:45 +00:00 |
|
Shu Wang
|
54e42b72db
|
Support mnnvl all2allv from Flashinfer (#21003)
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-09-24 14:38:16 -04:00 |
|
youkaichao
|
b67dece2d8
|
[misc] update the warning message (#25566)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-09-24 17:24:35 +08:00 |
|
Michael Goin
|
7361ab379f
|
Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-23 22:48:40 +00:00 |
|
Ilya Markov
|
8bdd8b5c51
|
Enable symmetric memory all reduce by default only enabling for TP (#25070)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-23 15:53:00 -04:00 |
|
Amir Samani
|
8c1c81a3de
|
[core] add nccl symmetric memory for all reduce (#24532)
Signed-off-by: Amir Samani <asamani@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-23 14:33:06 -04:00 |
|
Lucas Wilkinson
|
cc1dc7ed6d
|
[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-09-23 16:02:10 +00:00 |
|
Fanli Lin
|
4c966e440e
|
[XPU] Fix MOE DP accuracy issue on XPU (#25465)
|
2025-09-23 14:32:57 +00:00 |
|
Shu Wang
|
2ea50e977a
|
Enable Allgather/ReduceScatter backend for NaiveAllToAll (#23964)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Shu Wang <shuw@nvidia.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-18 15:52:58 +00:00 |
|
Hyogeun Oh (오효근)
|
b419937c78
|
[Docs] Fix warnings in mkdocs build (continued) (#25163)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-09-18 08:23:26 -07:00 |
|
Punitvara
|
05b044e698
|
[Doc] Fix cross-reference warnings (#25058)
Signed-off-by: Punit Vara <punitvara@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-18 02:05:16 -07:00 |
|
Sage Moore
|
567939953b
|
[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM (#23693)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-09-16 12:21:48 -04:00 |
|
dongluw
|
a5b84f1cbf
|
[Core] Shared memory based object store for Multimodal data caching and IPC (#20452)
Signed-off-by: donglu <donglu@cohere.com>
|
2025-09-12 07:54:17 -07:00 |
|
Ilya Markov
|
1fdd5c42d7
|
[Kernels] Enable Torch Symmetric Memory All-Reduce By Default (#24111)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-11 09:45:31 -07:00 |
|
Woosuk Kwon
|
4172235ab7
|
[V0 deprecation] Deprecate V0 Neuron backend (#21159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-06 16:15:18 -07:00 |
|