biondizzle/vllm - vllm

Author	SHA1	Message	Date
Richard Zou	e30cedd44b	[torch.compile] Stop doing unnecessary FakeTensorProp in PiecewiseCompileInterpreter (#34093 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-10 19:15:40 -08:00
Cyrus Leung	3bcd494ef4	[Redo] Add `--trust-remote-code` to dataset bench args (#34251 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-11 11:10:12 +08:00
tianshu-Michael-yu	0e725a7d22	[Bugfix] Fix Worker.load_model context-manager composition for sleep mode (#34021 ) Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com>	2026-02-11 11:07:51 +08:00
Lucas Wilkinson	ba0511fd80	[Misc] Add run one batch script that supports profiling (#32968 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-10 18:29:49 -08:00
Micah Williamson	4a1550d22d	[ROCm][CI] Fix test_sequence_parallel.py location in AMD CI pipeline (#34280 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-02-11 01:08:11 +00:00
bnellnm	d1481ba783	[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-02-10 19:51:07 -05:00
7. Sun	dc6de33c3d	[CI] Add pip caching to cleanup_pr_body workflow (#32979 ) Signed-off-by: 7. Sun <jhao.sun@gmail.com>	2026-02-11 00:45:28 +00:00
Tyler Michael Smith	c4b9e6778f	[Misc] Add pre-commit hook to catch boolean ops in with-statements (#34271 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-10 15:13:20 -08:00
Richard Zou	341eed3d30	[torch.compile] Disable recursive pre_grad_passes (#34092 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-10 18:02:31 -05:00
Zhengkai Zhang	6f2f59f2b3	[Misc][Spec Decode] support different load config for draft model (#34022 ) Signed-off-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com> Co-authored-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com>	2026-02-10 14:52:43 -08:00
Ilya Markov	bb2fc8b5e7	[BugFix] Fix async EPLB hang with DeepEP LL all2all backend (#32860 ) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2026-02-10 22:34:47 +00:00
Ilya Markov	67132945bb	[Perf] Move eplb rebalance algo to async thread (#30888 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-02-10 22:19:10 +00:00
Gregory Shtrasberg	f0ca0671c7	[Feature] Warn about unrecognized environment variables (#33581 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-02-10 15:45:38 -06:00
Pavani Majety	578977bb5e	[SM100] Resubmit FMHA FP8 prefill for MLA (#31195 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2026-02-10 16:18:43 -05:00
Roger Wang	9615575afc	[Bugfix] Fix mamba cache dtype for Qwen3.5 (#34200 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-10 13:12:31 -08:00
Matthew Bonanni	4293c00b84	[Benchmarks] Fix attention benchmark smoke test (#34269 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-10 16:04:07 -05:00
J Seppänen	506ad7d7c1	[Bugfix] Fix weights offloading for sleep mode (#32947 ) Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-02-10 20:38:17 +00:00
Reagan Lee	fdd6f2ad58	Convert online APIs to use Renderer (#34084 ) Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>	2026-02-10 19:44:31 +00:00
Qi Wang	33bcd3dc3b	[Misc] Introduce ec_both role EC (encoder cache) connector (#34182 ) Signed-off-by: Qi Wang <qiwa@nvidia.com>	2026-02-10 18:55:35 +00:00
Michael Goin	1f5febb4b8	[UX nit] Fix non-default api_server_count message (#34152 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-10 10:35:58 -08:00
Andy Lo	ae871ca923	Minor cleanup for Voxtral (#34247 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-02-10 18:18:30 +00:00
Woosuk Kwon	a2443de5fa	[Model Runner V2] Use pinned memory for write_contents (#34222 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-02-10 08:55:22 -08:00
Harry Mellor	f84a2a8f31	[Docs] Speed up build environment set-up (#34240 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-10 16:34:43 +00:00
Vadim Gimpelson	000214c4bb	[BUGFIX] Fix accuracy bugs in Qwen3-Next MTP (#34077 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-02-10 10:57:11 -05:00
junuxyz	c5a66d1697	[Core][BugFix] Fix PP KV cache sharding memory validation (#33698 ) Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>	2026-02-10 10:46:24 -05:00
Roberto L. Castro	afdce12c89	[Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention (#33680 ) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-10 10:29:52 -05:00
Zhengxu Chen	82e11973cc	[compile] Enable AOT compile with 2.10 in trunk. (#34155 ) Signed-off-by: Zhengxu Chen <zhxchen17@meta.com>	2026-02-10 23:24:42 +08:00
xuebwang-amd	b129136c7a	[ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations (#29008 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-10 10:08:05 -05:00
mgazz	599e4335a4	Support benchmarking of Geospatial models (#33922 ) Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>	2026-02-10 07:04:16 -08:00
Fan Yang	a1946570d8	add --insecure arg to the vllm bench to skip TLS (#34026 ) Signed-off-by: Fan Yang <yan9fan@meta.com> Co-authored-by: Fan Yang <yan9fan@meta.com>	2026-02-10 22:23:52 +08:00
Harry Mellor	d0bc520569	Bump `mamba-ssm` version in CI for Transformers v5 compatibility (#34233 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-10 14:46:01 +01:00
Krish Gupta	748625cdaf	[V1][BugFix] Fix EAGLE3 encoder cache miss with disable_chunked_mm_input (#34220 ) Signed-off-by: KrxGu <krishom70@gmail.com>	2026-02-10 13:05:32 +00:00
Harry Mellor	61413973e8	Stop testing for slow tokenizers as they will not exist soon (#34235 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-10 12:08:20 +00:00
Phúc H. Lê Khắc	94de871546	[Misc] allow specify is_mm_prefix_lm in hf_config (#34215 )	2026-02-10 11:16:21 +00:00
tc-mb	e042d7e685	Add flagos in MiniCPM-o (#34126 ) Signed-off-by: tc-mb <caitianchi@modelbest.cn> Signed-off-by: Vincent-Xiao <vincent.xiao.me@gmail.com> Co-authored-by: Vincent-Xiao <vincent.xiao.me@gmail.com>	2026-02-10 02:51:48 -08:00
Roger Wang	ae4e280602	[Bugfix] Fix FI kernel`chunk_gated_delta_rule` output shape for Qwen3.5 (#34219 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-10 10:41:24 +00:00
zzaebok	cbea11c9f0	[Docs] Fix format error in KV load failure recovery doc (#34137 ) Signed-off-by: Jaebok Lee <jaebok9541@naver.com>	2026-02-10 02:16:26 -08:00
Cyrus Leung	2c32558a3c	[Bugfix] Fix `--trust-remote-code` conflict (#34218 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-10 00:29:10 -08:00
Zetong Li	5f970120f0	[Bugfix] Fix memory inconsistency in cross-process shared memory (#32022 ) Signed-off-by: Zetong Li <slippersss@126.com>	2026-02-10 08:22:03 +00:00
Cyrus Leung	998e2d91f8	Revert #34208 (#34216 )	2026-02-09 23:59:04 -08:00
Wentao Ye	e1060a71a1	[Perf] Optimize detokenizer python logic (#32975 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2026-02-09 23:54:41 -08:00
Chen Zhang	97fa8f6590	[BugFix] Avoid prefix cache hit in the same schedule step for mamba layers (#29387 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2026-02-10 07:41:16 +00:00
wang.yuqi	dab1de9f38	[Frontend][CI] Consolidate instrumentator entrypoints (#34123 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-10 07:30:19 +00:00
Balaxxe	8d48d0a9d9	[Bugfix] Sort hf_weights_files in fastsafetensors_weights_iterator to match #33491 (#34190 ) Signed-off-by: Balaxxe <136368465+jaim12005@users.noreply.github.com>	2026-02-09 23:06:30 -08:00
Andrew Xia	9608844f96	[responsesAPI] fix simpleContext streaming output_messages (#34188 ) Signed-off-by: Andrew Xia <axia@meta.com> Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2026-02-09 22:53:07 -08:00
Cyrus Leung	f69b903b4c	[Bugfix] Add `--trust-remote-code` to dataset bench args (#34208 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-09 22:37:50 -08:00
Lucas Wilkinson	81e217fe6b	[Bugfix] Fix DP Attention Padding in Dummy Run (#34187 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2026-02-10 05:29:39 +00:00
Cyrus Leung	ab97bcf662	[CI/Build] Relax `test_mcp_tool_call` (#34204 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-10 05:18:57 +00:00
Cyrus Leung	25e48a3aae	[Doc] Update usage of `--limit-mm-per-prompt` (#34148 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-09 21:12:13 -08:00
Roger Wang	8a5e0e2b2b	[Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching (#34183 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-10 13:03:32 +08:00


13809 Commits