biondizzle/vllm - vllm - Gitea

Author	SHA1	Message	Date
Kata Coder	5719a4e4e6	[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed (#34574) Signed-off-by: craftsangjae <craftsangjae@gmail.com>	2026-02-20 20:01:40 -08:00
pougetat	11be2c74dc	[Realtime] Add Qwen3-ASR realtime streaming support (#34613) Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com> Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-20 19:59:42 -08:00
Xin Yang	7a5adad480	[Kernel] Optimize sample_recovered_tokens_kernel (#34974) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-02-20 19:59:06 -08:00
Li	59c6233297	Support prompt_embeds for pooling requests in output processor (#34904) Signed-off-by: Li Zhang <lzhanga@amazon.com> Co-authored-by: Li Zhang <lzhanga@amazon.com>	2026-02-20 19:57:38 -08:00
Taneem Ibrahim	d38cd3dde5	[Misc] Fix mypy errors in vllm/profiler and remove from exclude list (#34959) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>	2026-02-20 19:56:33 -08:00
Rohan Potdar	ded333fb9b	[ROCm][Bugfix]: Only save unpadded sizes for shared_experts in MoERunner to fix rmsnorm pad fusion (#34636) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-20 19:56:16 -08:00
Yanan Cao	9d7577b2bd	[Kernel] [Helion] [9/N] Canonicalize GPU variant names to base model names (#34928) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 19:55:51 -08:00
Vlad Tiberiu Mihailescu	e739c29ea4	[CI/Build] Add opentelemetry libs in default vllm build (requirements/common.txt) (#34466) Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>	2026-02-20 19:54:55 -08:00
yugong333	a55caf6ae9	[LoRA] Support Quantized Adapters (#30286) Signed-off-by: Yu Gong <yu3.gong@gmail.com> Signed-off-by: wz1qqx <ziqi.wang@novita.ai> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: wz1qqx <55830058+wz1qqx@users.noreply.github.com> Co-authored-by: wz1qqx <ziqi.wang@novita.ai> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-20 19:54:35 -08:00
Lucas Wilkinson	0e22cd618b	Revert "[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers" (#34997)	2026-02-20 17:19:19 -08:00
Wei Zhao	ea5f903f80	Bump Flashinfer Version and Re-enable DeepSeek NVFP4 AR+Norm Fusion (#34899) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-20 13:37:31 -08:00
Ryan Rock	0632ed8778	[AMD][CI] Fix test_custom_allreduce for A100 testgroup (#34735) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-02-20 21:33:04 +00:00
Lucas Wilkinson	aaefc58ee0	[CI] Revert PRs 34818 and 33600 (#34979)	2026-02-20 13:25:50 -08:00
Wei Zhao	f24b2de3d3	[Test] Add FP8 KV Cache Testing for MLA Backends (#34473) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-02-20 18:51:58 +00:00
Michael Goin	fac1507f03	[CI] Remove failing prime-rl integration test (#34843) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>	2026-02-20 10:17:42 -08:00
Zhengxu Chen	f863994084	[compile] Fix torch.compile time discrepancy in logging. (#34912) Signed-off-by: zhxchen17 <zhxchen17@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-20 08:47:14 -08:00
Zhengxu Chen	e4a5d8c653	[compile] Move torch_aot_compile directory under torch_compile_cache (#34831) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-02-20 08:46:45 -08:00
Yanan Cao	a6d0299c75	[Kernel] [Helion] [6/N] Add num_tokens dimension to silu_mul autotuning and dispatching (#34185) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-02-20 08:36:51 -08:00
Harry Mellor	6ce80f7071	Ensure that MkDocs v2 does not get installed (#34958) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-20 15:38:11 +00:00
Huamin Li	1fe462168c	[perf] Avoid dtype promotion sync in mamba_get_block_table_tensor (#34870) Signed-off-by: Huamin Li <3ericli@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-20 06:21:56 -08:00
Flora Feng	ed31a020ee	[Refactor] Extract Harmony streaming SSE event builders into streaming_events.py (#34909) Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-20 06:20:46 -08:00
Cyrus Leung	f9ac19204f	[V0 Deprecation] Remove unused MM placeholders in request output (#34944) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-20 06:19:23 -08:00
Vadim Gimpelson	59965affbd	[BUGFIX] Fix `_dummy_run` missing `prepare_inputs_event` synchronization (#34866) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-02-20 05:54:27 -08:00
Xin Yang	b1c4f0b265	[Kernel] Optimize grouped topk kernel (#34206) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-02-20 01:34:45 -08:00
Kevin McKay	8de7c636cc	[Bugfix][Hardware][AMD] Fix ROCM_AITER_FA speculative decoding support (#32877) Signed-off-by: c0de128 <kevin.mckay@outlook.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-02-19 22:25:46 -08:00
Frank Wang	059779231f	[Minor] Add logging when using MXFP4 MXFP8 TRTLLM backend (#34916) Signed-off-by: frankwang28 <frank.wbb@hotmail.com> Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-02-19 22:07:57 -08:00
tianshu-Michael-yu	ea37530b47	[Models] LFM2: Support LoRA (#34921) Co-authored-by: Piotr Mazurek <piotr635@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-19 22:07:23 -08:00
Micah Williamson	f5432e35a3	[ROCm][CI] Loosen RemoteOpenAIServer Startup Timeout (#34922) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-02-20 05:37:49 +00:00
杨朱 · Kiki	07cab212f0	[Misc] Add deprecated environment variable utilities (#33677) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-19 21:33:25 -08:00
rasmith	0c1dc42748	[CI][AMD][BugFix][P/D] Add default_vllm_config to test_moriio_connector.py so tests pass (#33739) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-02-19 21:32:40 -08:00
Varun Chawla	676f82ae81	Add validation to reject non-text content in system messages (#34072) Signed-off-by: Varun Chawla <varun_6april@hotmail.com>	2026-02-19 21:30:33 -08:00
Elizabeth Thomas	81bfc21a6a	[Model Bash]: Improve FP8 Oracle for Config Specific Kernel Selection (#34260) Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com> Signed-off-by: Robert Shaw <robertgshaw2-redhat@h100-02.nemg-001.lab.rdu2.dc.redhat.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <robertgshaw2-redhat@h100-02.nemg-001.lab.rdu2.dc.redhat.com> Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>	2026-02-19 21:29:08 -08:00
Matthias Gehre	4e2c7caf2d	[Bugfix] Add regression test for MoE quant_config under torch.compile (#34335) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>	2026-02-20 13:27:26 +08:00
Bowen Bao	d9e62c03eb	[Quark] Fix MoE fp8 activation scale handling on mi300 (#34386) Signed-off-by: Bowen Bao <bowenbao@amd.com>	2026-02-19 21:27:14 -08:00
Kevin H. Luu	a1a2d79442	[ci] Use the right tag for CPU arm64 image (#34915) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2026-02-19 19:59:15 -08:00
Cyrus Leung	ac900c89bb	[Refactor] Implement output type check in LLM (#34794) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-19 19:57:55 -08:00
Mark McLoughlin	76df6072ff	[Core] Fix state names in pause_scheduler() (#34840) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-02-19 17:21:46 -08:00
Michael Goin	16f24e8797	[CI] Add GPT-OSS Eval job for H100 (#34359) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2026-02-19 17:14:54 -08:00
Nick Hill	40b2f1c3d9	[Model Runner V2] Minor CPU optimizations (#34856) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-19 16:05:37 -08:00
Mayank Ketkar	648951a9c3	[Bugfix] Fix benchmark_fused_collective crash on CustomOp init (#34665) Signed-off-by: Mayank Ketkar <mketkar@zoox.com> Signed-off-by: Mayank Ketkar <mayket04@gmail.com> Co-authored-by: Mayank Ketkar <mketkar@zoox.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-19 19:01:00 -05:00
Michael Goin	f72061a19a	[UX] More descriptive reasons in is_supported_config for MoE (#34908) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-19 15:20:52 -08:00
Matthew Bonanni	662205d34e	[Bugfix] Fix Basic Models Test (#34818) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-19 14:49:07 -08:00
Roger Wang	4fb8beefaa	[Bugfix] Fix cutlass fp8 kernel on hopper for Qwen3.5 (#34914) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-19 13:34:55 -08:00
Alexei-V-Ivanov-AMD	304319c4ed	Change targets for AMD build in the "CI" pipeline (#34918) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>	2026-02-19 21:26:53 +00:00
Wentao Ye	c683d11c94	[Refactor] Deprecate `head_first` for `chunk_gated_delta_rule` (#34263) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-19 13:23:49 -05:00
roikoren755	3eff45d793	Revert "[NemotronH] Do not force router to run in fp32 (#34582)" (#34808) Signed-off-by: Roi Koren <roik@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-19 09:47:05 -08:00
Robert Shaw	4685a630a2	[Model Bash][DeepSeekR1] Remove Shared Expert Clone (#34344) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-02-19 07:56:14 -08:00
Eldar Kurtić	ee1d25f199	[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers (#34471) Signed-off-by: Your Name <you@example.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-19 07:55:41 -08:00
Linda	6fff24f30f	[Bugfix] Qwen3.5 kv-scale weight remapping (#34719) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2026-02-19 04:13:37 -08:00
Cyrus Leung	23210a911e	[CI/Build] Try to make beam search test less flaky (#34885) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-19 19:16:58 +08:00

14087 Commits