biondizzle/vllm - vllm

Author	SHA1	Message	Date
Lucas Wilkinson	c7914d30f9	Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-11 07:07:56 -08:00
Adam Binford	1b8756562e	Responses harmony system message structured (#34268 ) Signed-off-by: Adam Binford <adamq43@gmail.com>	2026-02-11 05:14:28 -08:00
Linda	275e0d2a99	[NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Pavani Majety <pmajety@nvidia.com>	2026-02-11 12:38:11 +00:00
Harry Mellor	0f5e55e7a8	Make JAIS compatible with Transformers v5 (#34264 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-11 12:30:37 +00:00
Harry Mellor	1e9204bff3	Make Qwen3VL compatible with Transformers v5 (#34262 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-02-11 04:13:23 -08:00
Li, Jiang	05339a7b20	[Bugfix][CPU] Fix llama4 inference on CPU (#34321 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-02-11 19:07:23 +08:00
Harry Mellor	40b8f55358	[Docs] Reduce time spent generating API docs (#34255 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-11 02:56:02 -08:00
Seiji Eicher	5045d5c983	Patch protobuf for CVE-2026-0994 (#34253 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com> Co-authored-by: Kevin H. Luu <khluu000@gmail.com>	2026-02-11 02:25:04 -08:00
Nick Hill	e09546cf05	[Frontend] Exploit tokenizers "new stream" in FastIncrementalDetokenizer (#34217 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-11 11:03:24 +01:00
Tianqi Ren	786806dd44	[Doc] Update Marlin support matrix for Turing (#34319 ) Signed-off-by: Tianqi Ren <tianqi.r@outlook.com>	2026-02-11 09:03:41 +00:00
Nick Hill	79504027ef	[Misc] Bump `fastsafetensors` version for latest fixes (#34273 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-11 00:30:09 -08:00
Luka Govedič	addac0e653	[torch.compile] Enable AR+rms fusion by default available for `-O2` (#34299 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2026-02-11 00:30:00 -08:00
Cyrus Leung	675a22ed66	[Chore] Move `BaseRenderer` to `base.py` (#34308 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-11 00:29:51 -08:00
Kunshang Ji	cb9574eb85	[XPU][9/N] clean up existing ipex code/doc (#34111 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-11 00:27:15 -08:00
AllenDou	21dfb842d7	[model] support FunASR model (#33247 ) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>	2026-02-11 07:37:09 +00:00
R3hankhan	d1b837f0ae	[CPU] Enable FP16 (Half dtype) support for s390x (#34116 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-02-11 14:41:42 +08:00
Roger Wang	0b20469c62	[Bugfix] Fix weight naming in Qwen3.5 (#34313 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-10 21:37:14 -08:00
Tyler Michael Smith	d7982daff5	[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides (#34279 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 05:15:52 +00:00
Robert Shaw	9b17c57460	[ModelBash][DSR1 NVFp4] Removed Bf16 Bias Cast (#34298 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-02-11 05:00:00 +00:00
Hashem Hashemi	1b3540e6c6	Threshold fix wvSplitk for occasional CI fails (#34013 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-02-11 03:59:14 +00:00
Matthias Gehre	7a048ee65f	[Bugfix] Fix benchmark_moe.py inplace assertion with torch >= 2.9 (#34149 ) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>	2026-02-11 03:58:56 +00:00
Cyrus Leung	c9a1923bb4	[Plugin] Simplify IO Processor Plugin interface (#34236 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-10 19:47:39 -08:00
zofia	b482f71e9f	[XPU][7/N] enable xpu fp8 moe (#34202 ) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>	2026-02-11 03:33:59 +00:00
Дзержи́нский	1485396abb	[Kernel] Apply 256bit LDG/STG To Activation Kernels (#33022 ) Signed-off-by: Dzerzhinsky <256908701+AstroVoyager7@users.noreply.github.com> Signed-off-by: Дзержи́нский <256908701+AstroVoyager7@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-02-10 19:31:51 -08:00
Kebe	5ee5c86eeb	[Bugfix][DeepSeek-V3.2] fix fp8 kvcache type cast (#33884 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2026-02-10 19:31:36 -08:00
Cyrus Leung	b5dcb372e4	[Misc] Clean up validation logic in input processor (#34144 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-10 19:29:29 -08:00
Tyler Michael Smith	066c6da6a0	[WideEP] Fix nvfp4 DeepEP High Throughput All2All backend (#33738 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-10 19:15:43 -08:00
Richard Zou	e30cedd44b	[torch.compile] Stop doing unnecessary FakeTensorProp in PiecewiseCompileInterpreter (#34093 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-10 19:15:40 -08:00
Cyrus Leung	3bcd494ef4	[Redo] Add `--trust-remote-code` to dataset bench args (#34251 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-11 11:10:12 +08:00
tianshu-Michael-yu	0e725a7d22	[Bugfix] Fix Worker.load_model context-manager composition for sleep mode (#34021 ) Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com>	2026-02-11 11:07:51 +08:00
Lucas Wilkinson	ba0511fd80	[Misc] Add run one batch script that supports profiling (#32968 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-10 18:29:49 -08:00
Micah Williamson	4a1550d22d	[ROCm][CI] Fix test_sequence_parallel.py location in AMD CI pipeline (#34280 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-02-11 01:08:11 +00:00
bnellnm	d1481ba783	[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-02-10 19:51:07 -05:00
7. Sun	dc6de33c3d	[CI] Add pip caching to cleanup_pr_body workflow (#32979 ) Signed-off-by: 7. Sun <jhao.sun@gmail.com>	2026-02-11 00:45:28 +00:00
Tyler Michael Smith	c4b9e6778f	[Misc] Add pre-commit hook to catch boolean ops in with-statements (#34271 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-10 15:13:20 -08:00
Richard Zou	341eed3d30	[torch.compile] Disable recursive pre_grad_passes (#34092 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-10 18:02:31 -05:00
Zhengkai Zhang	6f2f59f2b3	[Misc][Spec Decode] support different load config for draft model (#34022 ) Signed-off-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com> Co-authored-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com>	2026-02-10 14:52:43 -08:00
Ilya Markov	bb2fc8b5e7	[BugFix] Fix async EPLB hang with DeepEP LL all2all backend (#32860 ) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2026-02-10 22:34:47 +00:00
Ilya Markov	67132945bb	[Perf] Move eplb rebalance algo to async thread (#30888 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-02-10 22:19:10 +00:00
Gregory Shtrasberg	f0ca0671c7	[Feature] Warn about unrecognized environment variables (#33581 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-02-10 15:45:38 -06:00
Pavani Majety	578977bb5e	[SM100] Resubmit FMHA FP8 prefill for MLA (#31195 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2026-02-10 16:18:43 -05:00
Roger Wang	9615575afc	[Bugfix] Fix mamba cache dtype for Qwen3.5 (#34200 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-10 13:12:31 -08:00
Matthew Bonanni	4293c00b84	[Benchmarks] Fix attention benchmark smoke test (#34269 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-10 16:04:07 -05:00
J Seppänen	506ad7d7c1	[Bugfix] Fix weights offloading for sleep mode (#32947 ) Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-02-10 20:38:17 +00:00
Reagan Lee	fdd6f2ad58	Convert online APIs to use Renderer (#34084 ) Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>	2026-02-10 19:44:31 +00:00
Qi Wang	33bcd3dc3b	[Misc] Introduce ec_both role EC (encoder cache) connector (#34182 ) Signed-off-by: Qi Wang <qiwa@nvidia.com>	2026-02-10 18:55:35 +00:00
Michael Goin	1f5febb4b8	[UX nit] Fix non-default api_server_count message (#34152 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-10 10:35:58 -08:00
Andy Lo	ae871ca923	Minor cleanup for Voxtral (#34247 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-02-10 18:18:30 +00:00
Woosuk Kwon	a2443de5fa	[Model Runner V2] Use pinned memory for write_contents (#34222 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-02-10 08:55:22 -08:00
Harry Mellor	f84a2a8f31	[Docs] Speed up build environment set-up (#34240 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-10 16:34:43 +00:00

14386 Commits