biondizzle/vllm - vllm

Author	SHA1	Message	Date
Lalithnarayan C	7acaea634c	In-Tree AMD Zen CPU Backend via zentorch [1/N] (#35970 ) Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Chinmay-Kulkarni-AMD <Chinmay.Kulkarni@amd.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-15 23:35:35 +00:00
Itay Alroy	d5af196c18	[2/N] Elastic EP Milestone 2: Integrating NIXL-EP (#35627 ) Signed-off-by: Itay Alroy <ialroy@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-03-13 09:25:33 -04:00
Xinan Miao	2cdf92228c	[Feature]: Remove Chunking From FusedMoE (#34086 ) Signed-off-by: SouthWest7 <am1ao@qq.com> Signed-off-by: Southwest <1403572259@qq.com> Signed-off-by: southwest <am1ao@qq.com> Signed-off-by: Xinan Miao <1403572259@qq.com> Co-authored-by: SouthWest7 <am1ao@qq.com>	2026-03-12 14:24:38 -04:00
caozuoba	9e19f8338b	[Perf] add packed recurrent fast path for decode (#36596 ) Signed-off-by: hdj <1293066020@qq.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-03-12 04:01:57 -07:00
Matthew Bonanni	ebb9cc5f2b	[UX][Startup] Account for CUDA graphs during memory profiling (#30515 )	2026-03-07 13:49:23 -08:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934 ) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Joe Runde	6f0dd93801	[Core] Remove busy loop from idle buffer readers (#28053 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-03-04 07:44:20 +00:00
Rohan Potdar	3a8eef5869	[ROCm][Bugfix]: Disable AITER Triton ROPE by default (#35601 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-03-03 13:43:56 -06:00
lin-shh	8fa68a8ce4	Fix TYPE_CHECKING stub defaults in envs.py to match actual runtime defaults (#35645 )	2026-03-02 21:59:43 -08:00
Hanjie Qiu	96fc09503a	[All Reduce] Change default backend of Flashinfer All Reduce to trtllm (#35793 ) Signed-off-by: hjjq <hanjieq@nvidia.com>	2026-03-02 18:57:38 -05:00
Itay Alroy	dea268336f	[1/N] Elastic EP Milestone 2 (#34861 ) Signed-off-by: Yongji Wu <wuyongji317@gmail.com> Signed-off-by: Itay Alroy <ialroy@nvidia.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-02-28 04:46:42 +00:00
Angela Yi	c29ee9c326	[compile] Invalidate cache for cpu flags (#35119 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-02-27 02:54:11 +00:00
Hanjie Qiu	71dfce6aa6	[Kernel] Refactor FlashInfer allreduce for mnnvl backend (#34109 ) Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com> Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>	2026-02-26 03:17:20 +00:00
Seungmin Kim	160424a937	[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs (#33992 ) Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com> Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com> Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 18:15:51 -08:00
Laura Wang	2465071510	[Perf] Add opt-in SM100 Oink RMSNorm custom-op path (#31828 ) Signed-off-by: Laura Wang <3700467+Laurawly@users.noreply.github.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-02-24 23:01:53 -08:00
Rohan Potdar	f38f8c9742	[ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE (#35180 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-25 04:36:40 +00:00
Michael Goin	a4bd661fb3	[Perf] Enable FlashInfer DeepGEMM swapAB on SM90 by default (#34924 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-23 17:34:41 -08:00
Andreas Karatzas	991d6bff38	[CI][MCP][Harmony] Heavy refactoring Harmony & MCP response tests and stabilizing with deterministic test infrastructure (#33949 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-20 20:03:32 -08:00
kourosh hakhamaneshi	c464b57374	[Ray] Propagate third-party env vars to Ray workers via prefix matching (#34383 ) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-17 01:08:42 -08:00
Seiji Eicher	79c7e09235	[KV Connector] Add temporary, off-by-default `VLLM_DISABLE_REQUEST_ID_RANDOMIZATION` workaround (#34415 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2026-02-14 23:26:10 -08:00
Wei Zhao	59d53066d8	[Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation (#32993 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-13 08:11:26 -08:00
Richard Zou	341eed3d30	[torch.compile] Disable recursive pre_grad_passes (#34092 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-10 18:02:31 -05:00
Gregory Shtrasberg	f0ca0671c7	[Feature] Warn about unrecognized environment variables (#33581 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-02-10 15:45:38 -06:00
Zhengxu Chen	82e11973cc	[compile] Enable AOT compile with 2.10 in trunk. (#34155 ) Signed-off-by: Zhengxu Chen <zhxchen17@meta.com>	2026-02-10 23:24:42 +08:00
Andrey Talman	f97ca67176	[Release 2.10] Update to Torch 2.10 - final release (#30525 )	2026-02-08 13:51:09 -08:00
Wentao Ye	d88a1df699	[Deprecation] Deprecate profiling envs (#33722 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-04 05:58:21 +00:00
杨朱 · Kiki	b95cc5014d	[Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable (#33535 ) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 15:01:59 +08:00
杨朱 · Kiki	ef248ff740	[Misc] Remove deprecated profiler environment variables (#33536 ) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 14:58:44 +08:00
Pavani Majety	c3a9752b0c	[Hardware][SM100] Add TRTLLM Kernel for INT4 W4A16 Kernel. (#32437 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2026-01-30 10:30:46 -08:00
Harry Mellor	fb946a7f89	Make `mypy` opt-out instead of opt-in (#33205 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-29 09:12:26 +00:00
Roger Wang	b539f988e1	[Models] Kimi-K2.5 (#33131 ) Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn> Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn> Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com> Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-27 14:50:31 +08:00
dolpm	58a05b0ca1	[fix] CPUDNNLGEMMHandler pointer baked into inductor artifact (#32913 ) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-26 16:59:44 -05:00
Alex Brooks	9ac818a551	[Misc] HF Hub LoRA Resolver (#20320 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2026-01-26 13:56:32 +00:00
Jee Jee Li	73b243463b	[BugFix] Add env variable to control PDL in LoRA (#32836 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-25 16:32:30 +08:00
dolpm	0118cdcc02	[fix] add VLLM_OBJECT_STORAGE_SHM_BUFFER_NAME to compile factors (#32912 ) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-23 22:53:10 +00:00
Xin Yang	90c2007932	[Bugfix] Disable tma_aligned_scales in test_fusions_e2e (#32916 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-23 14:34:30 +00:00
Nick Hill	7fe255889e	[Misc] Log vLLM logo when starting server (#32796 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-23 11:15:12 +08:00
Isotr0py	8ebf271bb6	[Misc] Replace urllib's `urlparse` with urllib3's `parse_url` (#32746 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-22 16:37:15 +08:00
Alex Sun	49a1262267	[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664 ) Signed-off-by: Alex Sun <alex.s@amd.com>	2026-01-22 16:33:18 +08:00
Wentao Ye	6437ff1fb9	[Deprecation] Remove deprecated environment variables (#32812 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-22 02:25:16 +00:00
dolpm	7c5dedc247	[AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205 ) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-20 19:45:59 +00:00
Walter Beller-Morales	8be263c3fb	[Core] Cleanup shm based object store on engine shutdown (#32429 ) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-01-20 08:53:37 +00:00
Karan Bansal	3055232ba0	[Feature] Add FIPS 140-3 compliant hash algorithm option for multimodal hashing (#32386 ) Signed-off-by: Karan Bansal <karanb192@gmail.com>	2026-01-18 11:02:01 +08:00
TomerBN-Nvidia	c277fbdf31	[Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257 ) Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com>	2026-01-15 16:15:05 -08:00
Aleksandr Malyshev	8c11001ba2	[ROCM] DSfp4 mla projection gemms weight dynamic quantization (#32238 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2026-01-15 14:13:08 -06:00
Pleaplusone	130d6c9514	[ROCm][Perf] Enable shuffle kv cache layout and assembly paged attention kernel for `AiterFlashAttentionBackend` (#29887 ) Signed-off-by: ganyi <ygan@amd.com>	2026-01-15 15:29:53 +00:00
Roberto L. Castro	8ef50d9a6b	[Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885 ) Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>	2026-01-13 15:22:53 -08:00
Fadi Arafeh	9103ed1696	[CPU][BugFix] Disable AOT Compile for CPU (#32037 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-01-10 23:15:49 -08:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Michael Goin	d5ec6c056f	[UX] Add vLLM model inspection view (#29450 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-09 10:12:35 -07:00

411 Commits