Ye (Charlotte) Qi
|
fa6a6be519
|
[Bugfix] Fix missing sequence_lengths in qwen3_omni_moe_thinker (#35741)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2026-03-02 21:11:56 +00:00 |
|
Aaron Hao
|
cad21918e3
|
[BUG] Fix rlhf_async example (#35788)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-03-02 20:36:40 +00:00 |
|
Jeffrey Wang
|
53700bf49b
|
[ci] Add Ray compatibility check informational CI job (#34672)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-03-02 12:06:16 -08:00 |
|
Yashwant Bezawada
|
a13d8c03c9
|
[KVConnector] Auto-downgrade to PIECEWISE cudagraph mode for layerwise async ops (#31057)
Signed-off-by: Yashwant Bezawada <yashwant_b@me.com>
|
2026-03-02 15:04:47 -05:00 |
|
Fynn Schmitt-Ulms
|
9433acb8df
|
[Spec Decode] Add hidden states extraction system (#33736)
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
|
2026-03-02 14:29:09 -05:00 |
|
Richard Zou
|
d1a6e96d9e
|
[torch.compile] Improve cold and warm start compile tests (#35709)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-02 19:27:06 +00:00 |
|
CSWYF3634076
|
2a9e3347e9
|
[BugFix][Model]Fix the garbled code in Ernie4.5-VL caused by fast_moe_cold_start (#35587)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
|
2026-03-02 18:56:33 +00:00 |
|
Isotr0py
|
cc0d565f40
|
[CI/Build] Enable Qwen3.5 tests on CI (#35763)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-02 17:43:53 +00:00 |
|
Patryk Wolsza
|
358e4d5ba7
|
[CI][HPU] Pin vllm commit compatible with vllm-gaudi - HPU tests (#35307)
Signed-off-by: PatrykWo <patryk.wolsza@intel.com>
|
2026-03-02 17:02:26 +00:00 |
|
Cyrus Leung
|
792a74b973
|
[Doc] Improve UX of --enable-log-requests (#35723)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-02 08:24:09 -08:00 |
|
Turner Jabbour
|
4034c3d32e
|
[Core] Move test utility to test file (#35672)
Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com>
|
2026-03-02 10:56:03 -05:00 |
|
Martin Hickey
|
7560d674c9
|
[CI] Fix mypy for vllm/device allocator (#35518)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-02 15:53:18 +00:00 |
|
ElizaWszola
|
d9c7730877
|
[Performance] Extract kv update ops from MLA attention backends (#34627)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Di Wu <dw2761@nyu.edu>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-03-02 10:43:19 -05:00 |
|
Runkai Tao
|
ada4f4fadd
|
[Fix Bug]num_active_loras always equals to zero (#34119)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-02 23:17:46 +08:00 |
|
Harry Mellor
|
7e9149d9a9
|
[Docs] Add breadcrumbs for better UX (#35749)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-02 14:31:54 +00:00 |
|
Martin Hickey
|
87c98b0236
|
[MyPy][BugFix] Check profiler is assigned before calling start() on it (#35505)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-02 13:23:42 +00:00 |
|
Tyler Michael Smith
|
de7dd634b9
|
Fix unresolved-import errors when using Astral's ty by removing src.root (#35681)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-03-02 10:26:47 +00:00 |
|
Chauncey
|
9a87b0578f
|
[Feat] Supports Anthropic Messages count_tokens API (#35588)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-02 09:48:54 +00:00 |
|
wangxiyuan
|
510bc9e1df
|
[Misc] Cleanup useless current_platform import (#35715)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2026-03-02 09:36:54 +00:00 |
|
Charles Ashby
|
cbd361fd46
|
[CPU][Distributed] Fix Enable _CPUSHMDistributed only when TP/PP ranks share the same SHM group name (#34169)
Signed-off-by: Charles Ashby <charlesa.l@hotmail.com>
|
2026-03-02 09:34:35 +00:00 |
|
Nicolò Lucchesi
|
c212202d93
|
[Misc] Bound NIXL upper bound version (#35495)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-02 16:57:07 +08:00 |
|
Andreas Karatzas
|
ec27b36b4b
|
[CI] Defining extended V1 e2e + engine tests (#35580)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-02 08:10:54 +00:00 |
|
Charlie Fu
|
3fd1d4ec2c
|
[Rocm][CI] Fix LM Eval Large Models (H100) test group (#34750)
Signed-off-by: charlifu <charlifu@amd.com>
|
2026-03-02 07:43:38 +00:00 |
|
EdalatiAli
|
cb21972a97
|
[Kernel] Integrate SM100 MXFP8 blockscaled grouped MM and quant kernels (#34448)
Signed-off-by: EdalatiAli <aliedalati@cohere.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-03-01 23:31:19 -08:00 |
|
Andreas Karatzas
|
c34963f138
|
[ROCm][CI] Disable skinny GEMMs in language model standard tests to fix non-determinism (#35152)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-02 15:04:18 +08:00 |
|
Hongxia Yang
|
f26650d649
|
[ROCm] add amd-quark package in requirements for rocm to use quantized models (#35658)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-03-02 06:02:43 +00:00 |
|
Kunshang Ji
|
92f5d0f070
|
[XPU] fix mxfp4 activation type (#35691)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-02 11:48:39 +08:00 |
|
Jesse Cai
|
a60985b07e
|
Fix deprecated v1 config tests (#35327)
Signed-off-by: Jesse Cai <jessecai@fb.com>
|
2026-03-01 20:32:03 -05:00 |
|
Lucas Wilkinson
|
8b5014d3dd
|
[Attention] FA4 integration (#32974)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-03-01 23:44:57 +00:00 |
|
zhanqiuhu
|
57a96e26c9
|
Revert "[Bugfix] Disable TRTLLM attention with KV transfer enabled (#33192)" (#34832)
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
|
2026-03-01 22:32:37 +00:00 |
|
Richard Zou
|
e82fbeec7b
|
[torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 (#35475)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-01 21:44:22 +00:00 |
|
haosdent
|
6290470843
|
[Bugfix] Fix dtype mismatch in RMSNormGated.forward_native() during torch.compile (#35256)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-01 15:14:46 -05:00 |
|
Woosuk Kwon
|
72f4d16262
|
[Model Runner V2] Use block table apis for capture inputs (#35671)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-01 10:31:13 -08:00 |
|
Seungho Yoon
|
5a435507d8
|
fix(mxfp4): return is_monolithic=False when LoRA is enabled for Triton backend (#35382)
Signed-off-by: Seungho Yoon <yoonsnowdev@gmail.com>
|
2026-03-01 09:59:30 -05:00 |
|
Taneem Ibrahim
|
59d7af9c6c
|
[MISC] Fixing a null reference by removing parallel_utils from mypy EXCLUDE (#35630)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
|
2026-03-01 09:26:44 -05:00 |
|
Asaf Gardin
|
bbf81f9a92
|
[Mamba1] - Kernel Level Chunk Alignment for Prefix Caching (#34798)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-03-01 20:40:23 +08:00 |
|
Woosuk Kwon
|
da543d1abe
|
[Model Runner V2] Minor refactoring for EncoderRunner (#35628)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-01 00:15:39 -08:00 |
|
Ryan Rock
|
87d319c52f
|
[AMD][CI] Support Triton attention with ExampleConnector (#34931)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2026-03-01 09:58:07 +02:00 |
|
lin-shh
|
a9ec392c86
|
Fix typo: implictly -> implicitly in isaac.py docstring (#35646)
|
2026-02-28 23:34:37 -08:00 |
|
lailoo
|
afd089f231
|
[Bugfix][Model] Fix Qwen3.5/Qwen3Next ignoring --dtype flag on older GPUs (#35617)
|
2026-03-01 03:27:37 +00:00 |
|
gnovack
|
3ecd0bf9fc
|
Add TMA support to fused_moe_lora kernel (#32195)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-01 10:55:25 +08:00 |
|
Woosuk Kwon
|
e3eb146f7a
|
[Model Runner V2] Add ModelStateInterface [4/N] (#35621)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-28 13:19:45 -08:00 |
|
Martin Vit
|
95a395dbec
|
[Bugfix] Fix Anthropic API base64 image handling in Messages endpoint (#35557)
Signed-off-by: Martin Vit <martin@voipmonitor.org>
|
2026-02-28 20:57:08 +00:00 |
|
Isotr0py
|
e94b263bd6
|
[Chore] Cleanup BNB utilization dead code (#35620)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-28 19:22:41 +00:00 |
|
Wentao Ye
|
e113a30113
|
[Deprecation] Deprecate code in 0.17 as scheduled (#35441)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-28 17:32:37 +00:00 |
|
Cyrus Leung
|
1dafb29f91
|
[Benchmark] Avoid unnecessary video download in MMVU (#35618)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-28 09:07:02 -08:00 |
|
emricksini-h
|
49b9ae32e9
|
[Fix] Avoid sending image input to other PP ranks (#35405)
Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-03-01 00:14:29 +08:00 |
|
cwazai
|
63d7972f13
|
Fix Qwen3_5MTP packed_modules_mapping for gate_up_proj (#35581)
|
2026-02-28 14:50:55 +00:00 |
|
flutist
|
c68e69f144
|
custom dataset img support base64 (#35280)
Signed-off-by: xjx <493337577@qq.com>
|
2026-02-28 11:49:52 +00:00 |
|
Chauncey
|
7e08c22b8c
|
[Feat] Add CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function (#35271)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-02-28 10:12:00 +00:00 |
|