Isotr0py
|
b0906d8b02
|
[MM Encoder] Default to use TORCH_SDPA backend for ViT on Volta/Turing GPU (#36472)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-09 03:43:44 -07:00 |
|
Cyrus Leung
|
f96c3ab08c
|
[Deprecation][1/2] Remove items deprecated in v0.18 (#36470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-09 03:43:23 -07:00 |
|
Xin Yang
|
dc6b578466
|
[Kernel] Add fused_sigmoid_gating_delta_rule_update kernel for Qwen3 Next (#35777)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-08 23:41:01 -07:00 |
|
liuzhenwei
|
1bc9c77f6d
|
[XPU] Add test script of PD disaggregation (#36434)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
|
2026-03-09 05:50:27 +00:00 |
|
Alex Brooks
|
65a4da1504
|
[Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) (#36160)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
|
2026-03-09 05:46:23 +00:00 |
|
wang.yuqi
|
fff3711a24
|
[Frontend][2/n] Improve pooling entrypoints | embed. (#36110)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
|
2026-03-09 11:42:19 +08:00 |
|
wang.yuqi
|
dcf8862fd4
|
[Examples][1/n] Resettle basic examples. (#35579)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-08 20:22:53 -07:00 |
|
Jiangyun Zhu
|
e5ff140216
|
[cudagraph] fix cudagraph warning in deepseekv32 (#28044)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-03-08 20:27:41 -04:00 |
|
danisereb
|
0a6a3a1290
|
Add support for ModelOpt MXFP8 MoE models (#35986)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-03-08 13:00:05 -07:00 |
|
Andreas Karatzas
|
40077ea3de
|
[CI] fix flaky empty responses and add diagnostic assertions in vision chat tests (#36341)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-08 14:42:24 +08:00 |
|
Wei Zhao
|
379689d533
|
[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891)
|
2026-03-07 13:51:54 -08:00 |
|
PatchyTIS
|
a6be75dbd2
|
[Core] NGram GPU Implementation compatible with Async Scheduler (#29184)
|
2026-03-07 13:51:37 -08:00 |
|
Micah Williamson
|
ee54f9cdb9
|
[ROCm][CI] Accept Different But Valid Output for test_olmoe_tp (#35224)
|
2026-03-07 13:50:52 -08:00 |
|
Micah Williamson
|
fc4657756f
|
[ROCm][CI] Enable AITER for failing test_gpt_oss test case on MI355 (#36174)
|
2026-03-07 13:50:17 -08:00 |
|
qli88
|
eebd14651f
|
[CI] Enable Crosslayer KV layout tests for ROCm platforms (#35416)
|
2026-03-07 13:49:56 -08:00 |
|
rahul-sarvam
|
85f50eb41f
|
Adding support to Sarvam's MoE models (#33942)
Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com>
|
2026-03-08 01:16:24 +08:00 |
|
lif
|
00b814ba5a
|
[V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
|
2026-03-07 22:09:55 +08:00 |
|
milesial
|
755356b3d1
|
feat: expose media_io_kwargs at runtime (#34778)
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
|
2026-03-07 04:27:04 +00:00 |
|
Andreas Karatzas
|
58928475e4
|
[ROCm][CI] Making entrypoints more deterministic on ROCm (#36293)
|
2026-03-06 19:04:40 -08:00 |
|
Alexei-V-Ivanov-AMD
|
225d1090a0
|
Enabling some B200-specific tests on MI355 (#35253)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Signed-off-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
|
2026-03-06 19:27:20 +00:00 |
|
eellison
|
f3c6c9c9d7
|
[CustomOp] CustomOp FusedRMSNormGated (#35877)
Signed-off-by: Elias Ellison <elias.ellison@gmail.com>
Signed-off-by: eellison <elias.ellison@gmail.com>
|
2026-03-06 10:53:37 -08:00 |
|
Isotr0py
|
e4ae148a78
|
[Refactor] Modular video loader backend refactoring (#35202)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-06 06:06:59 -08:00 |
|
Isotr0py
|
1d0c0d209c
|
[Misc] Lazy import registered processors (#36024)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-03-06 06:06:45 -08:00 |
|
Chenguang Zheng
|
fcb73f306c
|
[bugfix] add api process rank in default multimodal request (#36150)
Signed-off-by: fake0fan <645327136@qq.com>
Signed-off-by: Chenguang ZHENG <645327136@qq.com>
|
2026-03-06 12:00:09 +00:00 |
|
Harry Mellor
|
e2090bf3af
|
[CI] Fix startup error test (#36230)
A change in engine startup error messages in #35478 caused this test failure.
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-06 11:50:28 +00:00 |
|
Alex Brooks
|
10f4db4dbe
|
[Frontend] Add Support for MM Encoder/Decoder Beam Search (Offline) (#36153)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-06 01:16:56 -08:00 |
|
Nicolò Lucchesi
|
5b3ba94ab4
|
[Core][KVConnector] Support HMA+NixlConnector (#35758)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-06 08:51:21 +01:00 |
|
zhanqiuhu
|
90f3c01fa4
|
[Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding (#35158)
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-06 08:50:44 +01:00 |
|
Andreas Karatzas
|
807d680337
|
[ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-06 15:15:12 +08:00 |
|
Walter Beller-Morales
|
43e77e59ab
|
[BugFix] avoid infinite loop with VLLM_PORT and get_open_ports_list (#36191)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-03-05 22:15:29 -08:00 |
|
Ajay Anubolu
|
43f10573c9
|
[Bugfix] Fix misleading context length error messages (#36197)
Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-05 22:15:12 -08:00 |
|
Yongye Zhu
|
86e1060b17
|
[Bugfix] Fix inner_dp_world initialization order for multi-node TP (#35892)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2026-03-05 22:04:44 -08:00 |
|
Mark McLoughlin
|
27066d1b2b
|
[Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish (#34730)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2026-03-05 22:04:31 -08:00 |
|
cong-or
|
57c84ff129
|
perf: add __slots__ to KVCacheBlock (#36164)
Signed-off-by: cong-or <conchubhar.gannon@gmail.com>
|
2026-03-05 22:04:09 -08:00 |
|
Andreas Karatzas
|
a1ffa56a1e
|
[CI] Fix bge-m3 similarity reference values after *Defination* typo fix (#36208)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-06 05:07:29 +00:00 |
|
Shiyan Deng
|
8e87cc57f1
|
[Bug] Fix a corner case in _process_simple_streaming_events (#34754)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-03-05 20:57:32 -08:00 |
|
Cyrus Leung
|
6dd302653f
|
[Misc] Rename group_mm_kwargs_by_modality -> group_and_batch_mm_kwargs (#36158)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-06 12:32:48 +08:00 |
|
Zhengxu Chen
|
a97954b6a8
|
[compile] Consistent compiler config for saved/loaded vllm backends. (#35810)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-05 15:08:12 -05:00 |
|
Yanhong Li
|
a911f4dd20
|
[Model] Add support for OLMo Hybrid (#32550)
|
2026-03-05 14:51:06 -05:00 |
|
Jiayi Yan
|
6a895197fa
|
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 17:05:46 +00:00 |
|
Sage Moore
|
8c760b6ab6
|
[ROCm] Refactor ROCm attention backend selection logic (#35246)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-03-05 10:51:26 -06:00 |
|
Cyrus Leung
|
7196348157
|
[Bugfix] Fix Qwen-VL tokenizer implementation (#36140)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-05 08:07:19 -08:00 |
|
Ning Xie
|
176c799f4c
|
[openai api] log exception in exception handler (1/N) (#31164)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-03-05 16:00:12 +00:00 |
|
Or Ozeri
|
612e7729c2
|
[KVConnector] Scheduler: Fix num_computed_tokens after async KV load (#34616)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-05 14:25:15 +00:00 |
|
Andreas Karatzas
|
b03ff6a96b
|
[CI] Stabilize test_no_args_tool_call and add ROCm-specific server args (#36107)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-05 21:52:49 +08:00 |
|
Kunshang Ji
|
66a2209645
|
[Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize (#36085)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-05 10:36:39 +00:00 |
|
Isotr0py
|
21eb2c3372
|
[Chore] Correct MTP models test registry ordering (#36115)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-05 08:55:04 +00:00 |
|
Benjamin Chislett
|
57c629e9c1
|
[Bugfix] Fix block_size for hybrid model MTP (#36036)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-05 06:10:54 +00:00 |
|
Zhengxu Chen
|
dd6dbd93f8
|
[compile] Fix extra cache save on warm start. (#35921)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-05 12:56:30 +08:00 |
|
daje0601
|
3b23d57c96
|
[Model] Add LoRA support for Whisper models (#29856)
Signed-off-by: daje0601 <englishmt4118@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-05 10:38:25 +08:00 |
|