Chuan (Richard) Li
|
c188749bcd
|
[ROCm] Support MLA with nhead<16 and FP8 KV cache for TP=8 (Kimi K2.5/Linear) (#35850)
Signed-off-by: Li <chuali@amd.com>
|
2026-03-06 20:24:03 +00:00 |
|
Alexei-V-Ivanov-AMD
|
225d1090a0
|
Enabling some B200-specific tests on MI355 (#35253)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Signed-off-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
|
2026-03-06 19:27:20 +00:00 |
|
eellison
|
f3c6c9c9d7
|
[CustomOp] CustomOp FusedRMSNormGated (#35877)
Signed-off-by: Elias Ellison <elias.ellison@gmail.com>
Signed-off-by: eellison <elias.ellison@gmail.com>
|
2026-03-06 10:53:37 -08:00 |
|
Nick Hill
|
26bd43b52d
|
Revert "[BugFix] Fix engine hanging after KV cache initialization fai… (#36262)
|
2026-03-06 08:28:09 -08:00 |
|
Travis Johnson
|
6b625a8807
|
[Bugfix] Quickfix followups to busy loop removal in #28053 (#36068)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-06 08:13:05 -08:00 |
|
Richard Zou
|
54756b6109
|
[compile] Stop unconditionally patching constrain_to_fx_strides (#36152)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-06 10:17:27 -05:00 |
|
Raphaël Rialland
|
39f9ea0da4
|
[Bugfix] Fix cudagraph_mode:FULL dispatch (This does not impact FULL_AND_PIECEWISE (default)) (#36165)
|
2026-03-06 09:15:31 -05:00 |
|
Isotr0py
|
e4ae148a78
|
[Refactor] Modular video loader backend refactoring (#35202)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-06 06:06:59 -08:00 |
|
Isotr0py
|
1d0c0d209c
|
[Misc] Lazy import registered processors (#36024)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-03-06 06:06:45 -08:00 |
|
Chenguang Zheng
|
fcb73f306c
|
[bugfix] add api process rank in default multimodal request (#36150)
Signed-off-by: fake0fan <645327136@qq.com>
Signed-off-by: Chenguang ZHENG <645327136@qq.com>
|
2026-03-06 12:00:09 +00:00 |
|
Harry Mellor
|
e2090bf3af
|
[CI] Fix startup error test (#36230)
A change in engine startup error messages in #35478 caused this test failure.
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-06 11:50:28 +00:00 |
|
Andreas Karatzas
|
2a00d3241f
|
[CI][MM] Gate vision encoder attention mask to MiniCPM only, fixing Aria regression (#36206)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-06 01:17:08 -08:00 |
|
Alex Brooks
|
10f4db4dbe
|
[Frontend] Add Support for MM Encoder/Decoder Beam Search (Offline) (#36153)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-06 01:16:56 -08:00 |
|
Nicolò Lucchesi
|
5b3ba94ab4
|
[Core][KVConnector] Support HMA+NixlConnector (#35758)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-06 08:51:21 +01:00 |
|
zhanqiuhu
|
90f3c01fa4
|
[Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding (#35158)
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-06 08:50:44 +01:00 |
|
Andreas Karatzas
|
807d680337
|
[ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-06 15:15:12 +08:00 |
|
Tyler Michael Smith
|
5afb387bd4
|
Change "following fields were present in the request but ignored" log from warn to debug (#36173)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-03-05 22:15:46 -08:00 |
|
Walter Beller-Morales
|
43e77e59ab
|
[BugFix] avoid infinite loop with VLLM_PORT and get_open_ports_list (#36191)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-03-05 22:15:29 -08:00 |
|
Russell Bryant
|
00bd08edee
|
[Security] Respect user trust_remote_code setting in NemotronVL and KimiK25 (#36192)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-05 22:15:19 -08:00 |
|
Ajay Anubolu
|
43f10573c9
|
[Bugfix] Fix misleading context length error messages (#36197)
Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-05 22:15:12 -08:00 |
|
Yongye Zhu
|
86e1060b17
|
[Bugfix] Fix inner_dp_world initialization order for multi-node TP (#35892)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2026-03-05 22:04:44 -08:00 |
|
Mark McLoughlin
|
27066d1b2b
|
[Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish (#34730)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2026-03-05 22:04:31 -08:00 |
|
cong-or
|
57c84ff129
|
perf: add __slots__ to KVCacheBlock (#36164)
Signed-off-by: cong-or <conchubhar.gannon@gmail.com>
|
2026-03-05 22:04:09 -08:00 |
|
Xiang Shi
|
e68de8adc0
|
docs: fix wrong cc in int8.md (#36209)
Signed-off-by: Xiang Shi <realkevin@tutanota.com>
|
2026-03-06 06:01:02 +00:00 |
|
Andreas Karatzas
|
a1ffa56a1e
|
[CI] Fix bge-m3 similarity reference values after *Defination* typo fix (#36208)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-06 05:07:29 +00:00 |
|
Shiyan Deng
|
0a208d1f54
|
[BugFix] Fix engine hanging after KV cache initialization failure (#35478)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-03-05 20:58:09 -08:00 |
|
Shiyan Deng
|
03a49bb8f0
|
[Feature] Add --distributed-timeout-seconds CLI option (#36047)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-03-05 20:57:51 -08:00 |
|
Shiyan Deng
|
8e87cc57f1
|
[Bug] Fix a corner case in _process_simple_streaming_events (#34754)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-03-05 20:57:32 -08:00 |
|
Cyrus Leung
|
6dd302653f
|
[Misc] Rename group_mm_kwargs_by_modality -> group_and_batch_mm_kwargs (#36158)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-06 12:32:48 +08:00 |
|
Cyrus Leung
|
de00ebeac4
|
[Bugfix] Fix simple Mistral-Small example (#36156)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-05 20:25:11 -08:00 |
|
Andreas Karatzas
|
639680d220
|
[ROCm][CI] Adding missing dependencies for Multi-modal models tests (#36177)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-06 12:23:10 +08:00 |
|
Rohan Potdar
|
c5362c739f
|
Reenable features for ROCm attention backends (#36185)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-03-05 20:21:06 -08:00 |
|
Nikhil Gupta
|
0a49676fb0
|
cpu: aarch64: Upgrade OneDNN for aarch64 to add support for int8 matmul (#36147)
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
|
2026-03-06 03:48:59 +00:00 |
|
Jeffrey Wang
|
c012a8c477
|
Don't fire ray compatibility webhook when PR or branch is not provided (#36088)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-03-06 00:42:21 +00:00 |
|
Dor Huri
|
ebed80a7c8
|
[Performance] Extract KV-cache update from TreeAttention backend (#35384)
Signed-off-by: dorhuri123 <dor.huri1@live.biu.ac.il>
|
2026-03-06 00:22:43 +00:00 |
|
Nick Hill
|
a73af584fe
|
[Model Runner V2] Fix warmup for very small kvcache and/or blocksizes (#36176)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-05 14:48:10 -08:00 |
|
Zhengxu Chen
|
a97954b6a8
|
[compile] Consistent compiler config for saved/loaded vllm backends. (#35810)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-05 15:08:12 -05:00 |
|
Yanhong Li
|
a911f4dd20
|
[Model] Add support for OLMo Hybrid (#32550)
|
2026-03-05 14:51:06 -05:00 |
|
Russell Bryant
|
5395471d29
|
[CI] Add explicit permissions to macOS smoke test workflow (#35775)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-05 19:08:48 +00:00 |
|
Frank Wang
|
a57c877f18
|
[BugFix] Fallback from FA4->FA2 for Batch Invariance (#36059)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
|
2026-03-05 14:05:56 -05:00 |
|
Xin Yang
|
f917020983
|
[Perf] Optimize FusedMoEModularKernel output tensor using torch.empty (#35794)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-05 13:47:53 -05:00 |
|
tomeras91
|
86483ca774
|
[Bugfix] Disable FlashInfer TRTLLM BF16 path for non-gated MoE (#36146)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2026-03-05 09:49:05 -08:00 |
|
Netanel Haber
|
b93a9e6f6d
|
ParakeetProjection.norm = RMSNorm instead of nn.LayerNorm (#36133)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-03-05 17:29:30 +00:00 |
|
Xinyu Chen
|
d8839ef7d9
|
[XPU] Enable ModelRunnerV2 on XPU (#36078)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
|
2026-03-05 17:19:18 +00:00 |
|
Avery Miao
|
e998fa76b9
|
[BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 (#35994)
Signed-off-by: Miao, Avery <avery.miao@intel.com>
|
2026-03-05 09:16:29 -08:00 |
|
Jiayi Yan
|
6a895197fa
|
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 17:05:46 +00:00 |
|
Sage Moore
|
8c760b6ab6
|
[ROCm] Refactor ROCm attention backend selection logic (#35246)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-03-05 10:51:26 -06:00 |
|
AllenDou
|
3ee68590c7
|
refactor funasr model. (#36108)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-05 08:07:37 -08:00 |
|
Cyrus Leung
|
7196348157
|
[Bugfix] Fix Qwen-VL tokenizer implementation (#36140)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-05 08:07:19 -08:00 |
|
Ning Xie
|
176c799f4c
|
[openai api] log exception in exception handler (1/N) (#31164)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-03-05 16:00:12 +00:00 |
|