Kata Coder
|
5719a4e4e6
|
[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed (#34574)
Signed-off-by: craftsangjae <craftsangjae@gmail.com>
|
2026-02-20 20:01:40 -08:00 |
|
pougetat
|
11be2c74dc
|
[Realtime] Add Qwen3-ASR realtime streaming support (#34613)
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-02-20 19:59:42 -08:00 |
|
Xin Yang
|
7a5adad480
|
[Kernel] Optimize sample_recovered_tokens_kernel (#34974)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-02-20 19:59:06 -08:00 |
|
Li
|
59c6233297
|
Support prompt_embeds for pooling requests in output processor (#34904)
Signed-off-by: Li Zhang <lzhanga@amazon.com>
Co-authored-by: Li Zhang <lzhanga@amazon.com>
|
2026-02-20 19:57:38 -08:00 |
|
Taneem Ibrahim
|
d38cd3dde5
|
[Misc] Fix mypy errors in vllm/profiler and remove from exclude list (#34959)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
|
2026-02-20 19:56:33 -08:00 |
|
Rohan Potdar
|
ded333fb9b
|
[ROCm][Bugfix]: Only save unpadded sizes for shared_experts in MoERunner to fix rmsnorm pad fusion (#34636)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-02-20 19:56:16 -08:00 |
|
Yanan Cao
|
9d7577b2bd
|
[Kernel] [Helion] [9/N] Canonicalize GPU variant names to base model names (#34928)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-20 19:55:51 -08:00 |
|
Vlad Tiberiu Mihailescu
|
e739c29ea4
|
[CI/Build] Add opentelemetry libs in default vllm build (requirements/common.txt) (#34466)
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
|
2026-02-20 19:54:55 -08:00 |
|
yugong333
|
a55caf6ae9
|
[LoRA] Support Quantized Adapters (#30286)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Signed-off-by: wz1qqx <ziqi.wang@novita.ai>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: wz1qqx <55830058+wz1qqx@users.noreply.github.com>
Co-authored-by: wz1qqx <ziqi.wang@novita.ai>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-20 19:54:35 -08:00 |
|
Lucas Wilkinson
|
0e22cd618b
|
Revert "[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers " (#34997)
|
2026-02-20 17:19:19 -08:00 |
|
Wei Zhao
|
ea5f903f80
|
Bump Flashinfer Version and Re-enable DeepSeek NVFP4 AR+Norm Fusion (#34899)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-20 13:37:31 -08:00 |
|
Ryan Rock
|
0632ed8778
|
[AMD][CI] Fix test_custom_allreduce for A100 testgroup (#34735)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2026-02-20 21:33:04 +00:00 |
|
Lucas Wilkinson
|
aaefc58ee0
|
[CI] Revert PRs 34818 and 33600 (#34979)
|
2026-02-20 13:25:50 -08:00 |
|
Wei Zhao
|
f24b2de3d3
|
[Test] Add FP8 KV Cache Testing for MLA Backends (#34473)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-02-20 18:51:58 +00:00 |
|
Michael Goin
|
fac1507f03
|
[CI] Remove failing prime-rl integration test (#34843)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-20 10:17:42 -08:00 |
|
Zhengxu Chen
|
f863994084
|
[compile] Fix torch.compile time discrepancy in logging. (#34912)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-20 08:47:14 -08:00 |
|
Zhengxu Chen
|
e4a5d8c653
|
[compile] Move torch_aot_compile directory under torch_compile_cache (#34831)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-02-20 08:46:45 -08:00 |
|
Yanan Cao
|
a6d0299c75
|
[Kernel] [Helion] [6/N] Add num_tokens dimension to silu_mul autotuning and dispatching (#34185)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2026-02-20 08:36:51 -08:00 |
|
Harry Mellor
|
6ce80f7071
|
Ensure that MkDocs v2 does not get installed (#34958)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-20 15:38:11 +00:00 |
|
Huamin Li
|
1fe462168c
|
[perf] Avoid dtype promotion sync in mamba_get_block_table_tensor (#34870)
Signed-off-by: Huamin Li <3ericli@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-20 06:21:56 -08:00 |
|
Flora Feng
|
ed31a020ee
|
[Refactor] Extract Harmony streaming SSE event builders into streaming_events.py (#34909)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-20 06:20:46 -08:00 |
|
Cyrus Leung
|
f9ac19204f
|
[V0 Deprecation] Remove unused MM placeholders in request output (#34944)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-20 06:19:23 -08:00 |
|
Vadim Gimpelson
|
59965affbd
|
[BUGFIX] Fix _dummy_run missing prepare_inputs_event synchronization (#34866)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-02-20 05:54:27 -08:00 |
|
Xin Yang
|
b1c4f0b265
|
[Kernel] Optimize grouped topk kernel (#34206)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-02-20 01:34:45 -08:00 |
|
Kevin McKay
|
8de7c636cc
|
[Bugfix][Hardware][AMD] Fix ROCM_AITER_FA speculative decoding support (#32877)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-02-19 22:25:46 -08:00 |
|
Frank Wang
|
059779231f
|
[Minor] Add logging when using MXFP4 MXFP8 TRTLLM backend (#34916)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-02-19 22:07:57 -08:00 |
|
tianshu-Michael-yu
|
ea37530b47
|
[Models] LFM2: Support LoRA (#34921)
Co-authored-by: Piotr Mazurek <piotr635@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-19 22:07:23 -08:00 |
|
Micah Williamson
|
f5432e35a3
|
[ROCm][CI] Loosen RemoteOpenAIServer Startup Timeout (#34922)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-02-20 05:37:49 +00:00 |
|
杨朱 · Kiki
|
07cab212f0
|
[Misc] Add deprecated environment variable utilities (#33677)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-19 21:33:25 -08:00 |
|
rasmith
|
0c1dc42748
|
[CI][AMD][BugFix][P/D] Add default_vllm_config to test_moriio_connector.py so tests pass (#33739)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-02-19 21:32:40 -08:00 |
|
Varun Chawla
|
676f82ae81
|
Add validation to reject non-text content in system messages (#34072)
Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
|
2026-02-19 21:30:33 -08:00 |
|
Elizabeth Thomas
|
81bfc21a6a
|
[Model Bash]: Improve FP8 Oracle for Config Specific Kernel Selection (#34260)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2-redhat@h100-02.nemg-001.lab.rdu2.dc.redhat.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robertgshaw2-redhat@h100-02.nemg-001.lab.rdu2.dc.redhat.com>
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
|
2026-02-19 21:29:08 -08:00 |
|
Matthias Gehre
|
4e2c7caf2d
|
[Bugfix] Add regression test for MoE quant_config under torch.compile (#34335)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
|
2026-02-20 13:27:26 +08:00 |
|
Bowen Bao
|
d9e62c03eb
|
[Quark] Fix MoE fp8 activation scale handling on mi300 (#34386)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
|
2026-02-19 21:27:14 -08:00 |
|
Kevin H. Luu
|
a1a2d79442
|
[ci] Use the right tag for CPU arm64 image (#34915)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-02-19 19:59:15 -08:00 |
|
Cyrus Leung
|
ac900c89bb
|
[Refactor] Implement output type check in LLM (#34794)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-19 19:57:55 -08:00 |
|
Mark McLoughlin
|
76df6072ff
|
[Core] Fix state names in pause_scheduler() (#34840)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2026-02-19 17:21:46 -08:00 |
|
Michael Goin
|
16f24e8797
|
[CI] Add GPT-OSS Eval job for H100 (#34359)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-19 17:14:54 -08:00 |
|
Nick Hill
|
40b2f1c3d9
|
[Model Runner V2] Minor CPU optimizations (#34856)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-19 16:05:37 -08:00 |
|
Mayank Ketkar
|
648951a9c3
|
[Bugfix] Fix benchmark_fused_collective crash on CustomOp init (#34665)
Signed-off-by: Mayank Ketkar <mketkar@zoox.com>
Signed-off-by: Mayank Ketkar <mayket04@gmail.com>
Co-authored-by: Mayank Ketkar <mketkar@zoox.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-02-19 19:01:00 -05:00 |
|
Michael Goin
|
f72061a19a
|
[UX] More descriptive reasons in is_supported_config for MoE (#34908)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-19 15:20:52 -08:00 |
|
Matthew Bonanni
|
662205d34e
|
[Bugfix] Fix Basic Models Test (#34818)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-19 14:49:07 -08:00 |
|
Roger Wang
|
4fb8beefaa
|
[Bugfix] Fix cutlass fp8 kernel on hopper for Qwen3.5 (#34914)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-19 13:34:55 -08:00 |
|
Alexei-V-Ivanov-AMD
|
304319c4ed
|
Change targets for AMD build in the "CI" pipeline (#34918)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2026-02-19 21:26:53 +00:00 |
|
Wentao Ye
|
c683d11c94
|
[Refactor] Deprecate head_first for chunk_gated_delta_rule (#34263)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-19 13:23:49 -05:00 |
|
roikoren755
|
3eff45d793
|
Revert "[NemotronH] Do not force router to run in fp32 (#34582)" (#34808)
Signed-off-by: Roi Koren <roik@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-19 09:47:05 -08:00 |
|
Robert Shaw
|
4685a630a2
|
[Model Bash][DeepSeekR1] Remove Shared Expert Clone (#34344)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-02-19 07:56:14 -08:00 |
|
Eldar Kurtić
|
ee1d25f199
|
[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers (#34471)
Signed-off-by: Your Name <you@example.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-19 07:55:41 -08:00 |
|
Linda
|
6fff24f30f
|
[Bugfix] Qwen3.5 kv-scale weight remapping (#34719)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2026-02-19 04:13:37 -08:00 |
|
Cyrus Leung
|
23210a911e
|
[CI/Build] Try to make beam search test less flaky (#34885)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-19 19:16:58 +08:00 |
|