biondizzle/vllm - vllm - Gitea

Author	SHA1	Message	Date
elvischenv	296839a1b0	[Perf] Eliminate padding and slicing op for GPT-OSS with Flashinfer MXFP4 MXFP8 MoE (#30647) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2026-03-18 15:01:26 +00:00
Terry Gao	3e6a1e1686	[Custom Ops] Add functional + out variant for scaled_fp4_quant (#34389) Signed-off-by: tianrengao <terrygao87@gmail.com>	2026-03-16 18:51:46 -04:00
Rohan Potdar	a4ad9db541	Enable RoPE+KV cache fusion for ROCm AITER FA (non-shuffle layout) (#35786) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-03-13 07:33:22 +00:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Luka Govedič	9556af87d5	[torch.compile] Add support for non-contiguous fused RMSNorm + group quant (#36551) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>	2026-03-11 10:56:55 -07:00
Richard Zou	822e250ab7	[torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation (#36093) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-11 16:07:09 +00:00
Richard Zou	09b6f99852	[compile] aot_compile should respect VLLM_DISABLE_COMPILE_CACHE (#36358) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-11 03:12:03 -07:00
Jiangyun Zhu	ca5fb4bbd8	[Bugfix] Avoid merging empty-only partitions into splitting-op subgraphs (#36595) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-10 07:39:01 -07:00
Copilot	4b87ffbefb	[torch.compile] Rename `compile_ranges_split_points` to `compile_ranges_endpoints` (#36027) Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-09 18:04:40 +00:00
Jiangyun Zhu	e5ff140216	[cudagraph] fix cudagraph warning in deepseekv32 (#28044) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-08 20:27:41 -04:00
Zhengxu Chen	a97954b6a8	[compile] Consistent compiler config for saved/loaded vllm backends. (#35810) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-03-05 15:08:12 -05:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Kunshang Ji	66a2209645	[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize` (#36085) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-05 10:36:39 +00:00
Zhengxu Chen	dd6dbd93f8	[compile] Fix extra cache save on warm start. (#35921) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-03-05 12:56:30 +08:00
Richard Zou	5569f5218d	[torch.compile] Stop lazily compiling (#35472) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-04 12:13:17 -08:00
Stefano Castagnetta	d7166e74c1	[CI] Add Blackwell AsyncTP correctness test (#35871) Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>	2026-03-04 19:41:21 +00:00
Bhuminjay Soni	fb3e78ab09	[Feature][CI]: compare `func` & `no_func` outputs in test_functionalization.py (#35481) Signed-off-by: Bhuminjay <bhuminjaysoni@gmail.com> Signed-off-by: Bhuminjay Soni <Soni5Happy@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-04 18:01:16 +00:00
haosdent	d6e04f4c43	[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (#34094) (#34571) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-04 11:56:22 +01:00
Kunshang Ji	16d2ad1d38	[Hardware] Replace `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` (#30681) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 09:49:47 +00:00
TJian	fb7fdc49c4	[ROCm] [CI] Add new fusion test cases that are relevant to vLLM IR Ops (#34307) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-03-03 06:24:21 -08:00
Richard Zou	d1a6e96d9e	[torch.compile] Improve cold and warm start compile tests (#35709) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-02 19:27:06 +00:00
Itay Alroy	dea268336f	[1/N] Elastic EP Milestone 2 (#34861) Signed-off-by: Yongji Wu <wuyongji317@gmail.com> Signed-off-by: Itay Alroy <ialroy@nvidia.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-02-28 04:46:42 +00:00
Zhengxu Chen	29b35477b0	[compile] Fix caching error over pytree slice node. (#35308) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-02-27 19:34:16 +00:00
Jason Li	9d37941017	[torch.compile] Sequence Parallelism threshold compile ranges (#28672) Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com> Signed-off-by: Jason Li <jasonlizhengjian@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-26 05:00:12 +00:00
Hanjie Qiu	71dfce6aa6	[Kernel] Refactor FlashInfer allreduce for mnnvl backend (#34109) Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com> Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>	2026-02-26 03:17:20 +00:00
Rohan Potdar	f38f8c9742	[ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE (#35180) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-25 04:36:40 +00:00
BadrBasowid	6af03f2394	[Refactor] [1/N] Reorganize kernel abstraction directory (#34055) Signed-off-by: BadrBasowid <badr.basowid@gmail.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-24 06:47:22 +00:00
Rohan Potdar	2ff4e51152	[ROCm] AITER fused RoPE+KVCache (#33443) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: charlifu <charlifu@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>	2026-02-23 19:06:00 -08:00
Michael Goin	a4bd661fb3	[Perf] Enable FlashInfer DeepGEMM swapAB on SM90 by default (#34924) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-23 17:34:41 -08:00
ElizaWszola	a88b3be7c4	[Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales (#33255) Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-17 23:35:04 -08:00
Richard Zou	7967e854da	[BugFix] Fix sp tests (#34716) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-17 17:07:56 +00:00
Rohan Potdar	fd618871b4	[Bugfix]: Fix ROCm fusion attn test; use AttentionBackend utils to create kv cache (#33948) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-11 11:12:05 -05:00
Richard Zou	e30cedd44b	[torch.compile] Stop doing unnecessary FakeTensorProp in PiecewiseCompileInterpreter (#34093) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-10 19:15:40 -08:00
Charlie Fu	bb9f97308d	[torch.compile][Fusion] Fix attention fusion pass removing kv_udpate op. (#33945) Signed-off-by: charlifu <charlifu@amd.com>	2026-02-09 16:15:43 -05:00
Mohammad Miadh Angkad	d4f123cc48	[Kernel] FlashInfer: switch allreduce fusion to unified API (#33985) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>	2026-02-09 15:43:24 +00:00
Andrey Talman	f97ca67176	[Release 2.10] Update to Torch 2.10 - final release (#30525)	2026-02-08 13:51:09 -08:00
Richard Zou	81fe69cae5	[torch.compile] Stop compiling identical artifacts (#34003) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-07 05:24:48 -08:00
Ikenna	906077181b	[Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 (#33967) Signed-off-by: Ikenna <ikennachifo@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-07 02:27:33 +00:00
Luka Govedič	ac32e66cf9	[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-06 04:19:49 -08:00
Luka Govedič	4d9513537d	[CI][torch.compile] Reduce e2e fusion test time (#33293) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-04 19:09:03 -05:00
Richard Zou	9f14c9224d	Revert "[torch.compile] Significantly speed up cold start times" (#33820) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-04 21:59:59 +00:00
Simon Danielsson	4292c90a2a	[Bugfix] Support `RotaryEmbedding` CustomOp for gpt-oss (#33800) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>	2026-02-04 20:17:41 +00:00
Richard Zou	b1bb18de8d	[torch.compile] Significantly speed up cold start times (#33641) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-03 09:12:11 -08:00
Luka Govedič	15f40b20aa	[fix][torch.compile] Fix cold-start compilation time increase by adding kv cache update to splitting ops (#33441) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Richard Zou <zou3519@gmail.com>	2026-01-31 06:48:34 -08:00
Angela Yi	608b556507	[ez] Add structured torch.compile logs (#33213) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-31 21:00:54 +08:00
Rohan Potdar	59bcc5b6f2	Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-01-28 20:47:47 +00:00
Robert Shaw	af9b69f977	[Quantization][Deprecation] Remove Marlin 24 (#32688) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 15:54:59 +00:00
Matthew Bonanni	a608b4c6c2	[5/N][Attention] Finish eliminating `vllm/attention` folder (#32064) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-27 10:02:51 -05:00
Luka Govedič	bbbd696af9	[torch.compile][CI] Add back attn fusion on hopper/ada (#32940) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2026-01-23 16:49:20 +00:00
Xin Yang	90c2007932	[Bugfix] Disable tma_aligned_scales in test_fusions_e2e (#32916) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-23 14:34:30 +00:00


281 Commits