biondizzle/vllm - vllm

Author	SHA1	Message	Date
Richard Zou	d1a6e96d9e	[torch.compile] Improve cold and warm start compile tests (#35709 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-02 19:27:06 +00:00
Itay Alroy	dea268336f	[1/N] Elastic EP Milestone 2 (#34861 ) Signed-off-by: Yongji Wu <wuyongji317@gmail.com> Signed-off-by: Itay Alroy <ialroy@nvidia.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-02-28 04:46:42 +00:00
Zhengxu Chen	29b35477b0	[compile] Fix caching error over pytree slice node. (#35308 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-02-27 19:34:16 +00:00
Jason Li	9d37941017	[torch.compile] Sequence Parallelism threshold compile ranges (#28672 ) Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com> Signed-off-by: Jason Li <jasonlizhengjian@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-26 05:00:12 +00:00
Hanjie Qiu	71dfce6aa6	[Kernel] Refactor FlashInfer allreduce for mnnvl backend (#34109 ) Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com> Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>	2026-02-26 03:17:20 +00:00
Rohan Potdar	f38f8c9742	[ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE (#35180 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-25 04:36:40 +00:00
BadrBasowid	6af03f2394	[Refactor] [1/N] Reorganize kernel abstraction directory (#34055 ) Signed-off-by: BadrBasowid <badr.basowid@gmail.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-24 06:47:22 +00:00
Rohan Potdar	2ff4e51152	[ROCm] AITER fused RoPE+KVCache (#33443 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: charlifu <charlifu@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>	2026-02-23 19:06:00 -08:00
Michael Goin	a4bd661fb3	[Perf] Enable FlashInfer DeepGEMM swapAB on SM90 by default (#34924 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-23 17:34:41 -08:00
ElizaWszola	a88b3be7c4	[Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales (#33255 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-17 23:35:04 -08:00
Richard Zou	7967e854da	[BugFix] Fix sp tests (#34716 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-17 17:07:56 +00:00
Rohan Potdar	fd618871b4	[Bugfix]: Fix ROCm fusion attn test; use AttentionBackend utils to create kv cache (#33948 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-11 11:12:05 -05:00
Richard Zou	e30cedd44b	[torch.compile] Stop doing unnecessary FakeTensorProp in PiecewiseCompileInterpreter (#34093 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-10 19:15:40 -08:00
Charlie Fu	bb9f97308d	[torch.compile][Fusion] Fix attention fusion pass removing kv_udpate op. (#33945 ) Signed-off-by: charlifu <charlifu@amd.com>	2026-02-09 16:15:43 -05:00
Mohammad Miadh Angkad	d4f123cc48	[Kernel] FlashInfer: switch allreduce fusion to unified API (#33985 ) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>	2026-02-09 15:43:24 +00:00
Andrey Talman	f97ca67176	[Release 2.10] Update to Torch 2.10 - final release (#30525 )	2026-02-08 13:51:09 -08:00
Richard Zou	81fe69cae5	[torch.compile] Stop compiling identical artifacts (#34003 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-07 05:24:48 -08:00
Ikenna	906077181b	[Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 (#33967 ) Signed-off-by: Ikenna <ikennachifo@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-07 02:27:33 +00:00
Luka Govedič	ac32e66cf9	[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-06 04:19:49 -08:00
Luka Govedič	4d9513537d	[CI][torch.compile] Reduce e2e fusion test time (#33293 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-04 19:09:03 -05:00
Richard Zou	9f14c9224d	Revert "[torch.compile] Significantly speed up cold start times" (#33820 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-04 21:59:59 +00:00
Simon Danielsson	4292c90a2a	[Bugfix] Support `RotaryEmbedding` CustomOp for gpt-oss (#33800 ) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>	2026-02-04 20:17:41 +00:00
Richard Zou	b1bb18de8d	[torch.compile] Significantly speed up cold start times (#33641 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-03 09:12:11 -08:00
Luka Govedič	15f40b20aa	[fix][torch.compile] Fix cold-start compilation time increase by adding kv cache update to splitting ops (#33441 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Richard Zou <zou3519@gmail.com>	2026-01-31 06:48:34 -08:00
Angela Yi	608b556507	[ez] Add structured torch.compile logs (#33213 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-31 21:00:54 +08:00
Rohan Potdar	59bcc5b6f2	Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-01-28 20:47:47 +00:00
Robert Shaw	af9b69f977	[Quantization][Deprecation] Remove Marlin 24 (#32688 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 15:54:59 +00:00
Matthew Bonanni	a608b4c6c2	[5/N][Attention] Finish eliminating `vllm/attention` folder (#32064 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-27 10:02:51 -05:00
Luka Govedič	bbbd696af9	[torch.compile][CI] Add back attn fusion on hopper/ada (#32940 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2026-01-23 16:49:20 +00:00
Xin Yang	90c2007932	[Bugfix] Disable tma_aligned_scales in test_fusions_e2e (#32916 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-23 14:34:30 +00:00
Luka Govedič	5e4e0e51f4	[torch.compile] Compile `CustomOp.forward_native` for `SiluAndMul` and `QuantFP8` to avoid raw torch ops inside opaque custom ops (#32806 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-22 19:52:26 -08:00
Lucas Kabela	15e302dfce	[Misc][BE] Turn on strict type coverage for vllm/compilation (#31756 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-22 15:12:26 +00:00
Robert Shaw	42135d6898	[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414 )	2026-01-21 08:22:33 -05:00
Lucas Wilkinson	2261340806	[Misc] Remove pad_for_cudagraphs from config (#30143 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-20 15:05:48 -05:00
dolpm	7c5dedc247	[AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205 ) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-20 19:45:59 +00:00
vllmellm	148117ea2e	[Refactor] Make FP8 Linear Ops use kernel abstraction (#27814 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-20 14:48:20 +08:00
Lucas Wilkinson	14ce524249	[CI] Breakup h200 tests (#30499 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-16 06:23:22 +00:00
dolpm	8471b27df9	[compile] raise on compile_size implicit padding (#32343 ) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-14 20:46:56 +00:00
Lucas Kabela	ea6d067a2a	[Misc][LLaMa4] Compile LLaMa Vision Encoder (#30709 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-09 22:01:38 -05:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Robert Shaw	5825bbc1f7	[Quantization] Deprecate Long Tail of Schemes (#31688 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-08 19:07:45 -05:00
Lucas Wilkinson	6cdf015c3c	[Misc] Fix `Current vLLM config is not set.` warnings, assert to avoid issues in the future (#31747 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 15:20:49 -08:00
DevByteAI	1f214290d6	fix(compile): apply partition wrapper when loading AOT cached functions (#31536 ) Signed-off-by: Devbyteai <abud6673@gmail.com> Signed-off-by: DevByteAI <161969603+devbyteai@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 17:27:26 +08:00
Angela Yi	9a1d20a89c	[CI] Add warmup run in test_fusion_attn (#31183 ) Signed-off-by: angelayi <yiangela7@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-07 00:31:52 +00:00
Charlie Fu	c07163663d	[ROCm][CI] Fix tests/compile unit tests (#28895 ) Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Signed-off-by: Charlie Fu <Charlie.Fu@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-06 18:50:43 +00:00
wangxiyuan	bb4337b34c	[Platform] Deprecate seed_everything (#31659 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-04 18:34:04 -08:00
Boyuan Feng	2f12cd32c0	[BugFix] Fix cache issue in compilation_config (#31376 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-12-27 09:30:39 -05:00
baonudesifeizhai	8711b21676	Fix/get raw stream patch #30905 (#30912 ) Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com> Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-12-26 20:08:47 -08:00
vllmellm	f32cfd7d97	[ROCm][FEAT] Support AITER RMSNorm quantization fusion pass (#26575 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-12-23 02:07:54 -08:00
Cyrus Leung	8cef137689	[Chore] Update more locations to use `attention_config.backend` (#31153 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-22 19:19:50 -08:00

261 Commits