biondizzle/vllm

Author	SHA1	Message	Date
Wei Zhao	59d53066d8	[Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation (#32993 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-13 08:11:26 -08:00
Roger Wang	1dae7b7843	[Bugfix] Exclude `language_model_only` key from MM AOT compile hash but include in model one (#34508 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-13 13:59:00 +00:00
Ilya Boytsov	071d863e20	Extend ColBERT support to non-standard BERT backbones (#34170 ) Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com>	2026-02-13 09:53:09 +00:00
Wentao Ye	3d2a026fd0	[Feature] Pipeline Parallel Async send/recv, 2.9% E2E throughput improvement (#33368 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2026-02-13 16:38:16 +08:00
Aaron Hao	dddbff4624	[Core] Move pause and resume functions into engine (#34125 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: hao-aaron <ahao@anyscale.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-02-13 00:15:10 -08:00
Marek Michalowski	742d214d6e	[Bugfix] fix the import path in moe test utils.py (#34245 ) Signed-off-by: Marek Michalowski <marek.michalowski@arm.com>	2026-02-13 00:13:45 -08:00
haosdent	4137c5dfa7	[Bug Fix] Fix MambaManager.cache_blocks() crash on null blocks in align mode (#34418 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-13 00:13:22 -08:00
myselvess	bcf0731aa0	[New Model] support new model ovis2.6 (#34426 ) Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com>	2026-02-13 00:12:45 -08:00
Cyrus Leung	2f308214c0	[Refactor] Pass full VllmConfig to Renderer (#34485 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 22:48:38 -08:00
Cyrus Leung	1b4e8e53f8	[CI/Build] Fix CUDA re-initialization error in distributed model tests (#34491 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-13 06:43:53 +00:00
Cyrus Leung	372b2e762a	[Bugfix] Standardize getting number of image patches/tokens (#34358 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 20:47:01 -08:00
Andreas Karatzas	6afa587d31	[ROCm][CI] Fix serving tokens test failures (#34047 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-13 11:27:53 +08:00
Cyrus Leung	ea5ff3a1f6	[Refactor] Simplify BOS/EOS token handling (#34435 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 18:18:24 -08:00
Cyrus Leung	fc22cae4ac	[CI/Build] Update video URLs for testing (#34446 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 18:15:36 -08:00
Yanan Cao	96161fe978	[Kernel] [Helion] [4/N] Add silu_mul_fp8 Helion kernel (#33373 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-02-12 18:13:12 -08:00
Alec S	be7370daf3	[Frontend] Enable generic structured_outputs for responses API (#33709 ) Signed-off-by: Alec Solder <alecs@fb.com> Co-authored-by: Alec Solder <alecs@fb.com>	2026-02-12 16:15:48 -08:00
amitz-nv	f120bd42d3	[Kernel] Support Flashinfer trtllm fused MoE non gated FP8 & NVFP4 (#33506 ) Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>	2026-02-12 13:06:58 -08:00
Patrick von Platen	6c0baee610	[Voxtral Realtime] Refactor & Improve buffering logic (#34428 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-12 09:46:43 -08:00
Patrick von Platen	1100a97621	[Voxstral Realtime] Enable tests (#33803 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2026-02-12 09:43:24 -08:00
Isotr0py	becbe24808	[Bugfix] Remove broken raw url GGUF model loading support (#34433 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-12 09:40:01 -08:00
Matthew Bonanni	f2c47886fd	[Attention] Add FlashInfer Sparse MLA backend (#33451 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2026-02-12 17:21:54 +00:00
Cyrus Leung	fb455ed547	[V0 Deprecation] Remove code related to per-request logits processors (#34400 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 20:44:28 +08:00
Cyrus Leung	b96f7314b4	[Refactor] Pass Renderer to Input Processor (#34329 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-11 19:38:11 -08:00
Michael Goin	ff1f83b056	[Refactor] Replace `activation: str` with `MoEActivation` enum (#33843 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>	2026-02-11 17:29:32 -08:00
Wei Zhao	5aff2699bd	Fix CI failure - Flashinfer Kernel tests (#34316 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-02-11 14:17:16 -08:00
Raushan Turganbay	527ca32197	[Bugfix] Fix more multimodal tests for transformers V5 (#34334 ) Signed-off-by: raushan <raushan@huggingface.co>	2026-02-11 22:02:05 +01:00
Junseo Park	5458eb835d	[Bugfix] send None sentinel on final commit so server properly sends transcription.done (#33963 ) Signed-off-by: pjs102793 <pjs102793@naver.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-02-11 21:01:53 +00:00
TJian	5001211369	[ROCm] [CI] fix test_unrecognized_env (#34350 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-02-11 18:50:44 +00:00
Rohan Potdar	fd618871b4	[Bugfix]: Fix ROCm fusion attn test; use AttentionBackend utils to create kv cache (#33948 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-11 11:12:05 -05:00
Harry Mellor	67a42b5a44	Don't try and run GLM-ASR with remote code (#34352 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-11 08:09:40 -08:00
Lucas Wilkinson	c7914d30f9	Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-11 07:07:56 -08:00
Adam Binford	1b8756562e	Responses harmony system message structured (#34268 ) Signed-off-by: Adam Binford <adamq43@gmail.com>	2026-02-11 05:14:28 -08:00
Linda	275e0d2a99	[NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Pavani Majety <pmajety@nvidia.com>	2026-02-11 12:38:11 +00:00
Kunshang Ji	cb9574eb85	[XPU][9/N] clean up existing ipex code/doc (#34111 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-11 00:27:15 -08:00
AllenDou	21dfb842d7	[model] support FunASR model (#33247 ) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>	2026-02-11 07:37:09 +00:00
Hashem Hashemi	1b3540e6c6	Threshold fix wvSplitk for occasional CI fails (#34013 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-02-11 03:59:14 +00:00
Cyrus Leung	c9a1923bb4	[Plugin] Simplify IO Processor Plugin interface (#34236 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-10 19:47:39 -08:00
Cyrus Leung	b5dcb372e4	[Misc] Clean up validation logic in input processor (#34144 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-10 19:29:29 -08:00
Richard Zou	e30cedd44b	[torch.compile] Stop doing unnecessary FakeTensorProp in PiecewiseCompileInterpreter (#34093 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-10 19:15:40 -08:00
bnellnm	d1481ba783	[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-02-10 19:51:07 -05:00
Ilya Markov	67132945bb	[Perf] Move eplb rebalance algo to async thread (#30888 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-02-10 22:19:10 +00:00
Gregory Shtrasberg	f0ca0671c7	[Feature] Warn about unrecognized environment variables (#33581 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-02-10 15:45:38 -06:00
Pavani Majety	578977bb5e	[SM100] Resubmit FMHA FP8 prefill for MLA (#31195 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2026-02-10 16:18:43 -05:00
junuxyz	c5a66d1697	[Core][BugFix] Fix PP KV cache sharding memory validation (#33698 ) Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>	2026-02-10 10:46:24 -05:00
Roberto L. Castro	afdce12c89	[Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention (#33680 ) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-10 10:29:52 -05:00
xuebwang-amd	b129136c7a	[ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations (#29008 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-10 10:08:05 -05:00
Fan Yang	a1946570d8	add --insecure arg to the vllm bench to skip TLS (#34026 ) Signed-off-by: Fan Yang <yan9fan@meta.com> Co-authored-by: Fan Yang <yan9fan@meta.com>	2026-02-10 22:23:52 +08:00
Krish Gupta	748625cdaf	[V1][BugFix] Fix EAGLE3 encoder cache miss with disable_chunked_mm_input (#34220 ) Signed-off-by: KrxGu <krishom70@gmail.com>	2026-02-10 13:05:32 +00:00
Harry Mellor	61413973e8	Stop testing for slow tokenizers as they will not exist soon (#34235 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-10 12:08:20 +00:00
Chen Zhang	97fa8f6590	[BugFix] Avoid prefix cache hit in the same schedule step for mamba layers (#29387 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2026-02-10 07:41:16 +00:00

4508 Commits