biondizzle/vllm - vllm

Author	SHA1	Message	Date
Wentao Ye	d88a1df699	[Deprecation] Deprecate profiling envs (#33722) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-04 05:58:21 +00:00
杨朱 · Kiki	b95cc5014d	[Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable (#33535) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 15:01:59 +08:00
杨朱 · Kiki	ef248ff740	[Misc] Remove deprecated profiler environment variables (#33536) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 14:58:44 +08:00
Pavani Majety	c3a9752b0c	[Hardware][SM100] Add TRTLLM Kernel for INT4 W4A16 Kernel. (#32437) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2026-01-30 10:30:46 -08:00
Harry Mellor	fb946a7f89	Make `mypy` opt-out instead of opt-in (#33205) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-29 09:12:26 +00:00
Roger Wang	b539f988e1	[Models] Kimi-K2.5 (#33131) Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn> Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn> Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com> Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-27 14:50:31 +08:00
dolpm	58a05b0ca1	[fix] CPUDNNLGEMMHandler pointer baked into inductor artifact (#32913) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-26 16:59:44 -05:00
Alex Brooks	9ac818a551	[Misc] HF Hub LoRA Resolver (#20320) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2026-01-26 13:56:32 +00:00
Jee Jee Li	73b243463b	[BugFix] Add env variable to control PDL in LoRA (#32836) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-25 16:32:30 +08:00
dolpm	0118cdcc02	[fix] add VLLM_OBJECT_STORAGE_SHM_BUFFER_NAME to compile factors (#32912) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-23 22:53:10 +00:00
Xin Yang	90c2007932	[Bugfix] Disable tma_aligned_scales in test_fusions_e2e (#32916) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-23 14:34:30 +00:00
Nick Hill	7fe255889e	[Misc] Log vLLM logo when starting server (#32796) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-23 11:15:12 +08:00
Isotr0py	8ebf271bb6	[Misc] Replace urllib's `urlparse` with urllib3's `parse_url` (#32746) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-22 16:37:15 +08:00
Alex Sun	49a1262267	[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664) Signed-off-by: Alex Sun <alex.s@amd.com>	2026-01-22 16:33:18 +08:00
Wentao Ye	6437ff1fb9	[Deprecation] Remove deprecated environment variables (#32812) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-22 02:25:16 +00:00
dolpm	7c5dedc247	[AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-20 19:45:59 +00:00
Walter Beller-Morales	8be263c3fb	[Core] Cleanup shm based object store on engine shutdown (#32429) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-01-20 08:53:37 +00:00
Karan Bansal	3055232ba0	[Feature] Add FIPS 140-3 compliant hash algorithm option for multimodal hashing (#32386) Signed-off-by: Karan Bansal <karanb192@gmail.com>	2026-01-18 11:02:01 +08:00
TomerBN-Nvidia	c277fbdf31	[Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257) Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com>	2026-01-15 16:15:05 -08:00
Aleksandr Malyshev	8c11001ba2	[ROCM] DSfp4 mla projection gemms weight dynamic quantization (#32238) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2026-01-15 14:13:08 -06:00
Pleaplusone	130d6c9514	[ROCm][Perf] Enable shuffle kv cache layout and assembly paged attention kernel for `AiterFlashAttentionBackend` (#29887) Signed-off-by: ganyi <ygan@amd.com>	2026-01-15 15:29:53 +00:00
Roberto L. Castro	8ef50d9a6b	[Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885) Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>	2026-01-13 15:22:53 -08:00
Fadi Arafeh	9103ed1696	[CPU][BugFix] Disable AOT Compile for CPU (#32037) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-01-10 23:15:49 -08:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Michael Goin	d5ec6c056f	[UX] Add vLLM model inspection view (#29450) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-09 10:12:35 -07:00
inkcherry	4505849b30	[ROCm][PD] add moriio kv connector. (#29304) Signed-off-by: inkcherry <mingzhi.liu@amd.com>	2026-01-09 14:01:57 +00:00
Kate Cheng	cc6dafaef2	[Perf][Kernels] Enable FlashInfer DeepGEMM swapAB on SM90 (for W8A8 Linear Op) (#29213) Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com> Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2026-01-07 10:53:54 -05:00
Wentao Ye	af9a7ec255	[Bug] Revert torch warning fix (#31585) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-05 22:31:21 +00:00
Seiji Eicher	1ab5213531	Make engine core client handshake timeout configurable (#27444) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-12-19 20:38:30 +00:00
Elizabeth Thomas	41b6f9200f	Remove all2all backend envvar (#30363) Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-18 19:46:28 +00:00
SungMinCho	a0b782f9cc	[Metrics] Model FLOPs Utilization estimation (#30738) Signed-off-by: SungMinCho <tjdals4565@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-12-18 01:40:51 +00:00
Zhengxu Chen	9db1db5949	[compile] Ignore VLLM_FORCE_AOT_LOAD from cache factors (#30809) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-12-17 01:56:24 -08:00
Lucas Wilkinson	9fec0e13d5	[Attention] Cache attention metadata builds across hybrid KV-cache groups (#29627) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>	2025-12-16 17:10:16 -05:00
Lucas Wilkinson	3e41992fec	[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-12 05:57:47 -08:00
Wentao Ye	d6464f2679	[Chore] Fix torch precision warning (#30428) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-11 04:05:56 +00:00
Cyrus Leung	7e24e5d4d6	[Deprecation] Remove deprecated task, seed and MM settings (#30397) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-10 19:59:39 -08:00
Jialin Ouyang	9f042ba26b	[Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well (#29289) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-12-10 14:13:01 -05:00
Benjamin Chislett	e858bfe051	[Cleanup] Refactor profiling env vars into a CLI config (#29912) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-09 13:29:33 -05:00
Wentao Ye	83319b44c2	[Compile] Fix torch warning `TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled` (#29897) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-09 10:40:37 -05:00
Ming Yang	9d6235ca9a	[moe] Allow disabling DP chunking (#29936) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-09 00:29:36 +00:00
dtc	842aba501d	[P/D] Introduce Mooncake Transfer Engine as kv_connector (#24718) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: dtc <dtcccc@linux.alibaba.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2025-12-04 09:51:36 +00:00
Shengqi Chen	1109f98288	[CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels (#29930) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-03 14:08:19 -08:00
Elizabeth Thomas	b5407869c8	[Bugfix] Respect VLLM_CONFIGURE_LOGGING value (#28671) Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Jane Xu <janeyx@meta.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Johnny Yang <johnnyyang@google.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: bruceszchen <bruceszchen@tencent.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Johnny Yang <24908445+jcyang43@users.noreply.github.com>	2025-12-03 22:00:52 +00:00
Amr Mahdi	f5d3d93c40	[docker] Build CUDA kernels in separate Docker stage for faster rebuilds (#29452) Signed-off-by: Amr Mahdi <amrmahdi@meta.com>	2025-12-03 11:41:53 +00:00
Andrew Xia	52cb349fc0	[responsesAPI][3] ResponsesParser to set up non harmony MCP (#29413) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-02 11:24:45 -05:00
Shengqi Chen	4b612664fd	[CI] Renovation of nightly wheel build & generation (take 2) (#29838) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-01 22:17:10 -08:00
Kevin H. Luu	1336a1ea24	Revert #29787 and #29690 (#29815)	2025-12-01 13:42:03 -08:00
Shengqi Chen	36db0a35e4	[CI] Renovation of nightly wheel build & generation (#29690) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-01 21:25:39 +08:00
Yifei Zhang	1ab8fc8197	Make PyTorch profiler gzip and CUDA time dump configurable (#29568) Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>	2025-12-01 04:30:46 +00:00
Shu Wang	f72a817bdf	[MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch (#27141) Signed-off-by: Shu Wang <shuw@nvidia.com> Signed-off-by: Shu Wang. <shuw@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-30 16:05:32 -08:00

386 Commits