Michael Goin
|
a8ffc4f0f2
|
[Bugfix] Lower gpt-oss max cudagraph size to 992 to be compatible with FA3 (#25508)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-23 12:49:55 -07:00 |
|
jiahanc
|
d5944d5146
|
[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue (#25406)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
|
2025-09-23 15:44:35 -04:00 |
|
Michael Goin
|
24fab45d96
|
[Perf] Change default CUDAGraphMode from PIECEWISE to FULL_AND_PIECEWISE (#25444)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-23 15:29:26 -04:00 |
|
ElizaWszola
|
63400259d0
|
[Performance] Move apply_w8a8_block_fp8_linear to an op class (#24666)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
|
2025-09-23 12:03:10 -07:00 |
|
Amir Samani
|
8c1c81a3de
|
[core] add nccl symmetric memory for all reduce (#24532)
Signed-off-by: Amir Samani <asamani@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-23 14:33:06 -04:00 |
|
Hashem Hashemi
|
a3a7828010
|
[ROCm] Add skinny gemm bias support for dtypes fp16,bf16,fp8 (#24988)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
Signed-off-by: Hashem Hashemi <159079214+amd-hhashemi@users.noreply.github.com>
|
2025-09-23 14:31:45 -04:00 |
|
Jee Jee Li
|
5abb117901
|
[Core] Ensure LoRA linear respect the base_layer's tp_size and tp_rank (#25487)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-23 18:19:25 +00:00 |
|
Ekagra Ranjan
|
867ecdd1c8
|
[Spec Decode][CI] Add e2e test for examples/spec_decode.py and prevent breaking Acceptance Length (#24531)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-23 10:46:40 -07:00 |
|
Weida Hong
|
24e8222745
|
[Misc] Reduce initialization time of auto_tune (#23682)
Signed-off-by: Weida Hong <wdhongtw@google.com>
|
2025-09-23 17:34:58 +00:00 |
|
Burkhard Ringlein
|
100b630a60
|
[V1][Kernel] Add triton implementation for reshape_and_cache_flash (#24503)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-23 12:52:40 -04:00 |
|
Ming Yang
|
527821d191
|
Use macro guard CUDA functions for back compatibility in grouped_topk_kernel.cu (#25346)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-09-23 09:45:39 -07:00 |
|
Wentao Ye
|
846197f505
|
[Log] Optimize kv cache memory log from Bytes to GiB (#25204)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-23 12:44:37 -04:00 |
|
rivos-shreeasish
|
2357480b1a
|
[BugFix] Fix UB in per_token_group_quant.cu (#24913)
Signed-off-by: Shreeasish Kumar <shreeasish@rivosinc.com>
|
2025-09-23 09:14:22 -07:00 |
|
bnellnm
|
f11e3c516b
|
[Kernels] Support blocked fp8 quantization for compressed tensors MoE (#25219)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-23 16:11:34 +00:00 |
|
Harry Mellor
|
875d6def90
|
Add backward compatibility for GuidedDecodingParams (#25422)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-23 17:07:30 +01:00 |
|
Lucas Wilkinson
|
cc1dc7ed6d
|
[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-09-23 16:02:10 +00:00 |
|
Thomas Parnell
|
a903669e10
|
[V1] Remove V0 code paths for Hybrid models (#25400)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-09-23 08:26:13 -07:00 |
|
Michael Goin
|
2c58742dff
|
[UX] Change kv-cache-memory log level to debug (#25479)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-23 08:01:24 -07:00 |
|
Fanli Lin
|
4c966e440e
|
[XPU] Fix MOE DP accuracy issue on XPU (#25465)
|
2025-09-23 14:32:57 +00:00 |
|
Peter Pan
|
da5e7e4329
|
[Docs] NixlConnector quickstart guide (#24249)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2025-09-23 14:23:22 +00:00 |
|
Chauncey
|
f05a4f0e34
|
[P/D] Support NIXL connector to disconnect during a clean shutdown (#24423)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
|
2025-09-23 16:08:02 +02:00 |
|
Joel
|
61d1b35561
|
[BugFix] Register expert_map as named buffer for wake_up and sleep (#25458)
Signed-off-by: wuxibin <wuxibin@bytedance.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-09-23 21:49:13 +08:00 |
|
Isotr0py
|
b6a136b58c
|
[CI/Build] Fix disabled v1 attention backend selection test (#25471)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-23 13:05:46 +00:00 |
|
vllmellm
|
0d9fe260dd
|
[docs] Benchmark Serving Incorrect Arg (#25474)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-09-23 06:05:11 -07:00 |
|
Jee Jee Li
|
273690a50a
|
[Core] Optimize LoRA weight loading (#25403)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-23 18:19:45 +08:00 |
|
Isotr0py
|
231c2c63e4
|
[Bugfix] Fix idefics3 tie_word_embeddings (#25454)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-23 10:06:48 +00:00 |
|
Andreas Hartel
|
4322c553a6
|
[Test]: Hermes tool parser stream output error in Qwen3 case (#25203)
Signed-off-by: Andreas Hartel <andreas.hartel@aleph-alpha.com>
|
2025-09-23 17:56:31 +08:00 |
|
Cyrus Leung
|
babad6e5dd
|
[Misc] Move DP for ViT code inside model executor dir (#25459)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-23 09:20:52 +00:00 |
|
Zhikaiiii
|
9383cd6f10
|
[Frontend] Add a new xml-based tool parser for qwen3-coder (#25028)
Signed-off-by: Zhikaiiii <1658973216@qq.com>
|
2025-09-23 16:07:27 +08:00 |
|
Ming Yang
|
ba8d2165b6
|
Handle triton kernel import exception (#25319)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-09-23 00:56:00 -07:00 |
|
Cyrus Leung
|
c98be0a232
|
[Model] Enable DP for ViT in Qwen2-VL (#25445)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-23 05:17:10 +00:00 |
|
Chendi.Xue
|
5774b0a1da
|
[NIXL][OOT platform] support nixl_connector with oot platform and other nixl_backend (#25121)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
|
2025-09-23 04:17:42 +00:00 |
|
Varun Sundar Rabindranath
|
e8db44f883
|
[DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP (#24588)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-09-22 21:01:09 -07:00 |
|
Michael Yao
|
fafbe11af4
|
[Docs] Fix griffe warnings in vllm/lora/ops (#25369)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-09-23 03:42:58 +00:00 |
|
Michael Goin
|
78237e43bf
|
[Bugfix] Remove contiguous output req for context parallel MLA (#25414)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-22 20:26:32 -07:00 |
|
Lucia Fang
|
eea1783989
|
[benchmarks] allow skip ready check for bench serve (#25420)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-09-23 03:21:48 +00:00 |
|
Kunshang Ji
|
f225ea7dd9
|
[XPU] Fix compile_size is None case. (#25433)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-09-23 03:09:00 +00:00 |
|
JJJYmmm
|
fc97733da8
|
[feat] Support MRoPE + YaRN (#25384)
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com>
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com>
|
2025-09-23 03:04:47 +00:00 |
|
Wentao Ye
|
4741239db7
|
[Bug] Fix Long Context OOM Issue (#25290)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-22 22:04:15 -04:00 |
|
Isotr0py
|
c625f9043c
|
[V0 deprecation] Remove _set_default_args_v0 function (#25409)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-23 01:52:09 +00:00 |
|
Isotr0py
|
6fa78d8f23
|
[V0 deprecation] Remove platform v1 controlling interface (#25410)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-23 01:48:12 +00:00 |
|
Wentao Ye
|
9949aa2ef1
|
[Perf] Apply torch.compile for per_block_cast_to_fp8 (#24611)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-22 19:42:45 -06:00 |
|
Alexander Matveev
|
0b7bed9c38
|
[Performance] Remove input pads in cutlass_mla and optimize v_proj output handling (#25184)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-09-22 19:20:53 -06:00 |
|
Matthew Bonanni
|
ac0048c0ae
|
[BugFix] [DP/EP] Fix slow execution when BS <= DP (#25407)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Chris Bamford <chrisbam4d@gmail.com>
|
2025-09-22 17:26:17 -07:00 |
|
Nicolò Lucchesi
|
090197034f
|
[Bugfix] Fix missing clear_connector_metadata (#25397)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-23 08:10:59 +08:00 |
|
Russell Bryant
|
f31ff87460
|
[Core] Drop overly aggressive whisper assertion (#25408)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-22 17:09:52 -07:00 |
|
Luka Govedič
|
d588cd2406
|
[Bugfix] fix custom op test (#25429)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2025-09-23 00:07:43 +00:00 |
|
Alec S
|
45d7d852d3
|
[Frontend] Responses API MCP tools for built in tools and to pass through headers (#24628)
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-09-22 23:38:19 +00:00 |
|
Johnny Yang
|
8bed179109
|
[TPU] update torch_xla dependency for PyPI compatibility (#25278)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
|
2025-09-22 16:14:44 -07:00 |
|
Cyrus Leung
|
f552d5e578
|
[CI/Build] Skip Qwen3-VL initialization tests until models are actually released (#25394)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-22 13:18:24 -07:00 |
|