biondizzle/vllm

Author	SHA1	Message	Date
Woosuk Kwon	0916e7960b	[GDN] Use CPU tensors to build GDN metadata (#34498) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-02-13 01:24:45 -08:00
Wentao Ye	3d2a026fd0	[Feature] Pipeline Parallel Async send/recv, 2.9% E2E throughput improvement (#33368) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2026-02-13 16:38:16 +08:00
Aaron Hao	dddbff4624	[Core] Move pause and resume functions into engine (#34125) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: hao-aaron <ahao@anyscale.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-02-13 00:15:10 -08:00
Martin Hickey	47e9b63e1a	[KVConnector] Clean up redundant code in KV connectors (#34147) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>	2026-02-13 00:14:30 -08:00
Matthias Gehre	934acddef9	[Perf] fused_moe: add int4_w4a16 benchmark support and tuning config (#34130) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-13 00:14:27 -08:00
Marek Michalowski	742d214d6e	[Bugfix] fix the import path in moe test utils.py (#34245) Signed-off-by: Marek Michalowski <marek.michalowski@arm.com>	2026-02-13 00:13:45 -08:00
haosdent	4137c5dfa7	[Bug Fix] Fix MambaManager.cache_blocks() crash on null blocks in align mode (#34418) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-13 00:13:22 -08:00
Harry Huang	7a8a46ddcb	[BugFix] Fix and optimize max_num_blocks_per_req calculation for MambaSpec (#34440) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-02-13 00:13:14 -08:00
myselvess	bcf0731aa0	[New Model] support new model ovis2.6 (#34426) Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com>	2026-02-13 00:12:45 -08:00
Cyrus Leung	ec090c2429	[Refactor] Call renderer for online IO processor request (#34490) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-12 22:48:45 -08:00
Roger Wang	eea3024f43	[Bugfix] Fix mamba state dtype setting for Qwen3-Next and Qwen3.5 (#34489) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-12 22:48:42 -08:00
Cyrus Leung	2f308214c0	[Refactor] Pass full VllmConfig to Renderer (#34485) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 22:48:38 -08:00
Cyrus Leung	1b4e8e53f8	[CI/Build] Fix CUDA re-initialization error in distributed model tests (#34491) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-13 06:43:53 +00:00
haosdent	dcf6ee8592	[Bugfix] Fix encoder cache underestimation for GLM-4V/GLM-OCR single image (#34483) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-12 21:04:06 -08:00
Cyrus Leung	372b2e762a	[Bugfix] Standardize getting number of image patches/tokens (#34358) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 20:47:01 -08:00
Andreas Karatzas	6afa587d31	[ROCm][CI] Fix serving tokens test failures (#34047) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-13 11:27:53 +08:00
Cyrus Leung	94ed6cf6ea	Add new sections to CODEOWNERS (#34309) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 18:39:28 -08:00
Harry Huang	bf37812ca7	[Hybrid] Fix and optimize block-aligned splitting in mamba cache align mode (#33706) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-02-12 18:21:52 -08:00
Frank Wang	b86bf4417e	[Bugfix] Fix Random Dataset Prefix Length Inaccuracy (#33907) Signed-off-by: frankwang28 <frank.wbb@hotmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-02-12 18:21:19 -08:00
Yanan Cao	de13dd781f	[Kernel] [Helion] [5/N] Add Helion Autotuning infrastructure (#34025) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-02-12 18:21:05 -08:00
LoganJane	62788f99a4	[Bugfix] Delete unused redundant code in Kimi-K2.5 (#34427) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-12 18:18:42 -08:00
Cyrus Leung	ea5ff3a1f6	[Refactor] Simplify BOS/EOS token handling (#34435) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 18:18:24 -08:00
bnellnm	04ea31baab	[Bugfix] Remove assert that's no longer valid (#34443) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-02-12 18:18:15 -08:00
Harry Huang	6f019e6e0a	[BugFix] Add block_size validation for mamba cache align mode (#34445) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-02-12 18:18:07 -08:00
Zhuohan Li	d707678dfb	Fix num_logprobs parameter description in sampler.py (#34451) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2026-02-12 18:18:03 -08:00
Cyrus Leung	fc22cae4ac	[CI/Build] Update video URLs for testing (#34446) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 18:15:36 -08:00
Yanan Cao	96161fe978	[Kernel] [Helion] [4/N] Add silu_mul_fp8 Helion kernel (#33373) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-02-12 18:13:12 -08:00
Jaewon	4453ba8d9e	[Core] Profiler improvements and lazy initialization (#33198) Signed-off-by: Jaewon Lee <jaewon@meta.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-02-12 16:16:38 -08:00
Jaewon	aa181c923b	[Core] Add sleep level 0 mode with enqueue/wait pattern (#33195) Signed-off-by: Jaewon Lee <jaewon@meta.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-02-12 16:16:25 -08:00
Alec S	be7370daf3	[Frontend] Enable generic structured_outputs for responses API (#33709) Signed-off-by: Alec Solder <alecs@fb.com> Co-authored-by: Alec Solder <alecs@fb.com>	2026-02-12 16:15:48 -08:00
Mengtao (Martin) Yuan	9ea1f598ce	Use paged_attention_v1 for sliding window decode in rocm_aiter_fa (#34378) Signed-off-by: Martin Yuan <myuan@meta.com> Co-authored-by: Martin Yuan <myuan@meta.com>	2026-02-12 16:14:43 -08:00
amitz-nv	f120bd42d3	[Kernel] Support Flashinfer trtllm fused MoE non gated FP8 & NVFP4 (#33506) Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>	2026-02-12 13:06:58 -08:00
Hashem Hashemi	fac4e96940	small adjustment to wvSplitKrc (#34410) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-02-12 20:26:36 +00:00
Michael Goin	6d4e27ce29	[Bugfix] Enforce DeepGEMM when using sparse_attn_indexer on CUDA (#34374) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-12 12:08:06 -08:00
Andreas Karatzas	4c078fa546	[ROCm][CI] Pin TorchCodec to v0.10.0 for ROCm compatibility (#34447) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-12 18:47:34 +00:00
Patrick von Platen	6c0baee610	[Voxtral Realtime] Refactor & Improve buffering logic (#34428) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-12 09:46:43 -08:00
Patrick von Platen	1100a97621	[Voxstral Realtime] Enable tests (#33803) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2026-02-12 09:43:24 -08:00
xuebwang-amd	766e167821	[ROCm][quantization] improve OCP weight quant parser robust (#34431) Signed-off-by: xuebwang-amd <xuebwang@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-12 09:40:19 -08:00
Isotr0py	becbe24808	[Bugfix] Remove broken raw url GGUF model loading support (#34433) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-12 09:40:01 -08:00
Harry Mellor	679ca5d8d3	Fix MoE for the Transformers modelling backend (#34436) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-12 09:29:42 -08:00
Matthew Bonanni	f2c47886fd	[Attention] Add FlashInfer Sparse MLA backend (#33451) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2026-02-12 17:21:54 +00:00
Nicolò Lucchesi	334c715e0f	[Docs] Spec decoding docs warning removal (#34439) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-12 09:01:51 -08:00
Aaron Hao	7b5a8b4a9d	[BUG] Reset running requests when clearing cache for pause/resume (#34382) Signed-off-by: hao-aaron <ahao@anyscale.com>	2026-02-12 16:19:13 +00:00
danisereb	dea63512bb	Add config file for fused MoE for Nemotron (TP4, B200) (#34411) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-12 06:09:55 -08:00
Douglas Lehr	8a798be929	[ROCm] Enable MXFP4 MoE weight pre-shuffling on gfx950 and update aiter (#34192) Signed-off-by: Doug Lehr <douglehr@amd.com> Co-authored-by: Doug Lehr <douglehr@amd.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>	2026-02-12 05:06:33 -08:00
Cyrus Leung	fb455ed547	[V0 Deprecation] Remove code related to per-request logits processors (#34400) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 20:44:28 +08:00
baonudesifeizhai	f5897613fb	Fix Mistral config remap to accept compressed-tensors quantization #34028 (#34104) Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>	2026-02-12 08:22:06 +00:00
Louie Tsai	55a1a9563a	Vllm CPU benchmark suite improvement (#34128) Signed-off-by: louie-tsai <louie.tsai@intel.com>	2026-02-12 16:04:44 +08:00
AllenDou	386bfe5d08	[bugfix] refactor FunASR's _get_data_parser (#34397) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>	2026-02-12 07:26:49 +00:00
Kyle Sayers	e9cd691132	[Bugfix] Fix Sparse24 Compressed Tensors models (#33446) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-11 23:15:16 -08:00

14061 Commits