biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Ikenna	906077181b	[Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 (#33967) Signed-off-by: Ikenna <ikennachifo@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-07 02:27:33 +00:00
Wentao Ye	67a746e87f	[Log] Optimize duplicate startup log (#33944) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-06 17:49:56 +00:00
Luka Govedič	ac32e66cf9	[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-06 04:19:49 -08:00
emricksini-h	325ab6b0a8	[Feature] OTEL tracing during loading (#31162)	2026-02-05 16:59:28 -08:00
Richard Zou	9f14c9224d	Revert "[torch.compile] Significantly speed up cold start times" (#33820) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-04 21:59:59 +00:00
Zhengxu Chen	bcd2f74c0d	[compile] Clean up AOT compile bypass on evaluate_guards. (#33578) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-02-04 02:12:53 -08:00
Wentao Ye	5e1e0a0fbd	[Refactor] Remove unused dead code (#33718) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-03 21:25:11 -08:00
Richard Zou	b1bb18de8d	[torch.compile] Significantly speed up cold start times (#33641) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-03 09:12:11 -08:00
Richard Zou	fd9c83d0e0	[torch.compile] Document the workaround to standalone_compile failing (#33571) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-03 07:16:55 +00:00
Luka Govedič	15f40b20aa	[fix][torch.compile] Fix cold-start compilation time increase by adding kv cache update to splitting ops (#33441) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Richard Zou <zou3519@gmail.com>	2026-01-31 06:48:34 -08:00
Angela Yi	608b556507	[ez] Add structured torch.compile logs (#33213) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-31 21:00:54 +08:00
Angela Yi	5a66c9cc76	[ez] Delete torch25_custom_graph_pass (#33287) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-29 16:47:05 +00:00
Angela Yi	07ea184f00	[ez] Delete more torch version checks <= 2.8 (#33288) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-29 05:28:46 +00:00
Angela Yi	4197168ea5	[ez] Remove checks for torch version <= 2.8 (#33209) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-28 16:03:56 -05:00
Rohan Potdar	59bcc5b6f2	Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-01-28 20:47:47 +00:00
Harry Mellor	f1acbd68c5	[CI] Enable mypy import following for `vllm/compilation` (#33199) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 08:59:54 +00:00
Harry Mellor	2eb673a088	Add flake8-implicit-str-concat rules to Ruff (#33191) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 04:56:10 +00:00
Matthew Bonanni	a608b4c6c2	[5/N][Attention] Finish eliminating `vllm/attention` folder (#32064) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-27 10:02:51 -05:00
Roberto L. Castro	fcb9df99bd	[Perf][Kernel] Optimize FP4 quantization kernels (SM100F) (#32520) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>	2026-01-24 18:45:27 -07:00
Luka Govedič	5e4e0e51f4	[torch.compile] Compile `CustomOp.forward_native` for `SiluAndMul` and `QuantFP8` to avoid raw torch ops inside opaque custom ops (#32806) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-22 19:52:26 -08:00
Lucas Kabela	15e302dfce	[Misc][BE] Turn on strict type coverage for vllm/compilation (#31756) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-22 15:12:26 +00:00
Robert Shaw	42135d6898	[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414)	2026-01-21 08:22:33 -05:00
Lucas Wilkinson	2261340806	[Misc] Remove pad_for_cudagraphs from config (#30143) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-20 15:05:48 -05:00
dolpm	7c5dedc247	[AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-20 19:45:59 +00:00
Nicolò Lucchesi	74c583bc50	[Core] Whisper support `torch.compile` (#30385) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-19 10:02:31 +00:00
Richard Zou	bd292be0c0	[BugFix] Python file source reading can fail on UnicodeDecodeError (#32416) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-15 20:01:41 +00:00
Angela Yi	7933638051	[misc] Remove is_torch_equal_or_newer(2.4) cases (#32296) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-13 23:22:07 -08:00
cjackal	15b33ff064	[Misc] improve warning/assert messages (#32226) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2026-01-13 03:11:23 +00:00
Lucas Kabela	ad8818bb5e	[Misc][BE] Type coverage for vllm/compilation [3/3] (#31748) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-12 19:24:38 +00:00
Laith Sakka	46eb30f519	make assume_32_bit_indexing configurable (#32044) Signed-off-by: Laith Sakka <lsakka@meta.com>	2026-01-10 23:15:46 -08:00
Kevin McKay	4dc0d606b7	[Bugfix] Narrow broad exceptions in compilation backends (#31616) Signed-off-by: c0de128 <kevin.mckay@outlook.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-01-09 21:39:22 -05:00
Lucas Kabela	aaf4b70aae	[Misc][BE] Type coverage for vllm/compilation [2/3] (#31744)	2026-01-09 18:30:38 -05:00
DevByteAI	1f214290d6	fix(compile): apply partition wrapper when loading AOT cached functions (#31536) Signed-off-by: Devbyteai <abud6673@gmail.com> Signed-off-by: DevByteAI <161969603+devbyteai@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 17:27:26 +08:00
Lucas Kabela	873480d133	[Misc][BE] Type coverage for vllm/compilation [1/3] (#31554) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-06 20:37:51 -05:00
Nick Hill	bd877162eb	[BugFix] Support online dense model DP without overhead (#30739) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: njhill <nickhill123@gmail.com>	2026-01-02 23:36:38 +08:00
Amir Samani	030fc44914	use the same stream for cuda graph catpure and replay for NCCL (#29207) Signed-off-by: Amir Samani <asamani@nvidia.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-25 19:10:03 +08:00
vllmellm	f32cfd7d97	[ROCm][FEAT] Support AITER RMSNorm quantization fusion pass (#26575) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-12-23 02:07:54 -08:00
Lucas Kabela	0db5439ded	[Bugfix][torch2.10] Fix test_qwen2_5_vl_compilation with 2.10 RC (#30822) Signed-off-by: Lucas Kabela <lucaskabela@meta.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-18 08:23:31 -08:00
Zhengxu Chen	53cd7f868b	[compile] Recompile graph module during Dynamo cache loading. (#30743) Signed-off-by: Zhengxu Chen <zhxchen17@fb.com>	2025-12-17 02:00:12 -08:00
Zhengxu Chen	177c391db2	[compile] Disable aot when eager backend is used. (#30810) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-12-17 01:55:56 -08:00
ElizaWszola	994acec0cc	[Bugfix] Fix fusion for VL models (#30244) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-12-14 21:22:37 +08:00
Ilya Markov	3224ea9915	[torch.compile] Add encoder tag for compilation (#30489) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2025-12-14 18:15:11 +08:00
Laith Sakka	f569c654e1	enable unbacked with aot_compile (#30462) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-14 08:14:06 +00:00
Laith Sakka	763963aa73	set assume_32bit_indexing and pass unbacked hints (#30459) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-13 15:36:53 +00:00
Zhengxu Chen	fe1787107e	[compile] Parse compile range cache keys as Range during cache loading. (#30516) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-12-12 04:30:51 +00:00
Zhengxu Chen	92fea56fd1	[compile] Stop one-off setting enable_aot_compile and use context manager instead. (#30503) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-12-11 20:28:03 +00:00
Charlie Fu	3c680f4a17	[Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter (#25693) Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Signed-off-by: Charlie Fu <Charlie.Fu@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: wuhuikx <hattie.wu@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>	2025-12-09 22:39:26 +00:00
Ilya Markov	0b6a8a304c	[BugFix] Fix non detected failing tests (#30277) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2025-12-09 17:57:55 +00:00
Laith Sakka	87aee9ed2b	Add evaluate_guards option to DynamicShapesConfig (#27432) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-08 10:46:15 -05:00
Ye (Charlotte) Qi	eb1051fb95	[ROCm] Guard group quant RMS norm fusion patterns (#30239)	2025-12-08 14:44:48 +00:00

264 Commits