biondizzle/vllm - vllm

Author	SHA1	Message	Date
Matthew Bonanni	116f4be405	[1/N][Cleanup] Standardize on use of `is_quantized_kv_cache` (#38659) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-04-01 04:08:01 +00:00
BadrBasowid	077a9a8e37	[torch.compile] Refactor Attention Quant Fusion Pass and Remove Boilerplate (#37373) Signed-off-by: BadrBasowid <badr.basowid@gmail.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-03-31 14:15:50 -04:00
Itay Alroy	c57d38d603	elastic_ep: Fix issues with repeated scale up/down cycles (#37131) Signed-off-by: Itay Alroy <ialroy@nvidia.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-03-20 23:13:02 +00:00
Peter Pan	79eb9369c5	fix CUDAGraph memory being counted twice (#37426) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-20 17:36:32 +00:00
JartX	e8f9dbc369	[Bugfix][ROCm] Fix worker startup OOM on ROCm by skipping unreliable cudagraph memory profiling (#36720) Signed-off-by: JartX <sagformas@epdcenter.es>	2026-03-17 17:55:34 -04:00
Benjamin Chislett	f63ed7b5ac	[Bugfix] Fix DP MTP Dummy Run (#35243) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-17 11:16:48 -04:00
Kunshang Ji	747b068136	[Hardware] Replace memory related torch.cuda APIs (#37031) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>	2026-03-16 10:24:48 +00:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Aaron Hao	d6b61e5166	[BUG] Fix async rlhf tests (#35811) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-03-11 18:06:10 -04:00
Hongxin Xu	bea02cdf93	Fix routed experts capture for hybrid models (Mamba + Attention) (#35744) Signed-off-by: arlenxu <arlenxu@tencent.com> Signed-off-by: xhx1022 <1737006628@qq.com> Co-authored-by: arlenxu <arlenxu@tencent.com>	2026-03-11 08:53:10 -07:00
Hongbin Guo	4bf533623b	[Doc] Fix duplicate words in comments (#36713) Signed-off-by: Hongbin10 <jdmjdm1998@163.com>	2026-03-10 21:28:31 -07:00
Nick Hill	65b2f405dc	[Core] Simplify core kv-cache blocks initialization logic (#36521) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-10 20:20:02 +00:00
Vadim Gimpelson	4ff8c3c8f9	[BUGFIX][Mamba][Qwen3.5] Zero freed SSM cache blocks on GPU (#35219) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-03-10 03:32:20 -07:00
Matthew Bonanni	ebb9cc5f2b	[UX][Startup] Account for CUDA graphs during memory profiling (#30515)	2026-03-07 13:49:23 -08:00
Nick Hill	b354686524	[Model Runner V2] Fix warmup for pipeline parallel (#36280) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-06 16:58:51 -08:00
Shiyan Deng	03a49bb8f0	[Feature] Add --distributed-timeout-seconds CLI option (#36047) Signed-off-by: Shiyan Deng <dsy842974287@meta.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-03-05 20:57:51 -08:00
Nick Hill	a73af584fe	[Model Runner V2] Fix warmup for very small kvcache and/or blocksizes (#36176) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-05 14:48:10 -08:00
Kunshang Ji	16d2ad1d38	[Hardware] Replace `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` (#30681) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 09:49:47 +00:00
Nick Hill	d15c3b90fc	[Core] Move save_tensorized_model logic to Worker (#35825) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-03 15:31:59 -08:00
Martin Hickey	87c98b0236	[MyPy][BugFix] Check profiler is assigned before calling start() on it (#35505) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-02 13:23:42 +00:00
Huy Do	7b346ba8ed	[Bugfix] Propagate compilation_time from workers to main process for TP>1 (#35503) Signed-off-by: Huy Do <huydhn@gmail.com>	2026-02-28 05:03:22 +00:00
Itay Alroy	dea268336f	[1/N] Elastic EP Milestone 2 (#34861) Signed-off-by: Yongji Wu <wuyongji317@gmail.com> Signed-off-by: Itay Alroy <ialroy@nvidia.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-02-28 04:46:42 +00:00
Woosuk Kwon	86ac7bcf84	[Model Runner V2] Support pooling models (#35120) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-02-27 18:03:01 -08:00
Nick Hill	b1d9f5372d	[Model Runner V2] Warmup kernels (#35172) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-27 10:43:30 -08:00
Nick Hill	876312f0b5	[Core] Fix `gpu_worker.py` pre-commit errors (#35312) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-27 07:54:24 -08:00
Wentao Ye	3d2a026fd0	[Feature] Pipeline Parallel Async send/recv, 2.9% E2E throughput improvement (#33368) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2026-02-13 16:38:16 +08:00
Jaewon	4453ba8d9e	[Core] Profiler improvements and lazy initialization (#33198) Signed-off-by: Jaewon Lee <jaewon@meta.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-02-12 16:16:38 -08:00
bnellnm	d1481ba783	[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-02-10 19:51:07 -05:00
Ilya Markov	bb2fc8b5e7	[BugFix] Fix async EPLB hang with DeepEP LL all2all backend (#32860) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2026-02-10 22:34:47 +00:00
J Seppänen	506ad7d7c1	[Bugfix] Fix weights offloading for sleep mode (#32947) Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-02-10 20:38:17 +00:00
Wentao Ye	67a746e87f	[Log] Optimize duplicate startup log (#33944) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-06 17:49:56 +00:00
emricksini-h	325ab6b0a8	[Feature] OTEL tracing during loading (#31162)	2026-02-05 16:59:28 -08:00
Aaron Hao	c1858b7ec8	[Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Co-authored-by: SumanthRH <sumanthrh99@gmail.com>	2026-02-05 12:13:23 -05:00
kourosh hakhamaneshi	2f6d17cb2f	[rocm][ray] Fix: Unify Ray device visibility handling across CUDA and ROCm (#33308) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>	2026-02-04 10:09:14 -08:00
jma99_2333	22d9a056d5	Support clear mm and encoder cache (#33452) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-01-31 15:22:25 +00:00
Kyle Sayers	f857a03f6b	[QeRL] Layerwise Reloading (#32133) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-01-30 08:50:05 -07:00
Chendi.Xue	8c8ebeb941	[BUGFIX][XPU] fix memory check after XPU reuse GPU_worker (#33358) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2026-01-29 09:56:30 -08:00
Nick Hill	6bf3b46d78	[ModelRunner V2] Misc code simplification and cleanup (#33266) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-28 14:41:23 -08:00
Reagan Lee	06b557ecd9	feat(benchmark): add encoder forward pass benchmarking to mm-processor (#31655) Signed-off-by: Reagan <reaganjlee@gmail.com> Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com> Co-authored-by: Hiroken. <105287758+HirokenOvo@users.noreply.github.com>	2026-01-24 08:24:44 +00:00
Nick Hill	8518b30447	[Model Runner V2] Add KV Connector support (#32742) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-23 10:49:17 -08:00
Woosuk Kwon	43fada5360	[Model Runner V2] Refactor `dummy_run` (#32533) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-19 14:50:59 -08:00
Shanshan Shen	ce0946249d	[Misc] Make mem utils can be reused by other platforms (#32322) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-01-14 03:46:01 -08:00
Max Hu	6ebe34d6fa	[Feature] Add iteration level logging and enhance nvtx marker (#31193) Signed-off-by: Max Hu <maxhu@nvidia.com> Signed-off-by: Max Hu <hyoung2991@gmail.com> Co-authored-by: Max Hu <maxhu@nvidia.com>	2026-01-09 00:13:39 +00:00
Lucas Wilkinson	6cdf015c3c	[Misc] Fix `Current vLLM config is not set.` warnings, assert to avoid issues in the future (#31747) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 15:20:49 -08:00
Nick Hill	a3d909ad2b	[Misc] Tidy up some spec decode logic in GPUModelRunner (#31591) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-08 09:10:07 -08:00
Ning Xie	c907d22158	[refactor] refactor memory constants usage (#31865) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-01-07 18:37:31 +00:00
Cyrus Leung	aafd4d2354	[Chore] Try remove `init_cached_hf_modules` (#31786) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-07 12:34:04 +08:00
Ning Xie	6f5e653383	[Log] add log about gpu worker init snapshot and requested memory (#29493) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-01-06 17:32:55 +00:00
Wentao Ye	af9a7ec255	[Bug] Revert torch warning fix (#31585) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-05 22:31:21 +00:00
wangxiyuan	bb4337b34c	[Platform] Deprecate seed_everything (#31659) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-04 18:34:04 -08:00

199 Commits