Zhewen Li
|
0b8e871e5e
|
[CI/Build] Fix test_defaults_with_usage_context in AMD CI (#27926)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-11-05 15:40:24 -08:00 |
|
Zhewen Li
|
5ee93a5956
|
[CI/Build] Update checking logic in cutlass_group_gemm_supported (#27948)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-11-05 15:40:10 -08:00 |
|
Snehlata
|
e15601789b
|
[Feature]: Add corrupted request metric to V1 metrics system. (#27306)
Signed-off-by: atalhens <sneh.lata@nutanix.com>
|
2025-11-05 13:45:29 -08:00 |
|
Richard Zou
|
65ac8d8dc4
|
[Docs] Add guide to debugging vLLM-torch.compile integration (#28094)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-11-05 21:31:46 +00:00 |
|
Isotr0py
|
ffb08379d8
|
[Chore] Remove Nemotron-Nano-VL config copy (#28126)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-05 20:06:45 +00:00 |
|
R3hankhan
|
e04492449e
|
[Hardware][IBM Z] Optimize s390x Dockerfile (#28023)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
2025-11-05 11:25:44 -08:00 |
|
Michael Yao
|
518ec6b722
|
[Docs] Clean up README_TUNING.md (#28088)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-11-05 19:01:34 +00:00 |
|
wang.yuqi
|
802748bddb
|
[Bugfix] Fix Qwen3-Reranker-8B load (#28117)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-11-05 18:33:50 +00:00 |
|
Paul Zhang
|
faedbb4d4f
|
[Feature] Extend batch invariant torch.compile to B200 (#27856)
Signed-off-by: PaulZhang12 <paulzhan@fb.com>
|
2025-11-05 10:04:49 -08:00 |
|
Samuel Shen
|
40db194446
|
[CI]: Add LMCacheConnector Unit Tests (#27852)
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
|
2025-11-05 09:45:57 -08:00 |
|
Chen Zhang
|
c765f0b443
|
[FlashInfer] Avoid FlashInfer block_size 16 + head_size 256 on blackwell (#27994)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-11-05 09:25:32 -08:00 |
|
gmagogsfm
|
002b07c4b2
|
[Bugfix] vLLM should check Inductor config for compile cache enablement status (#27637)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-05 12:22:44 -05:00 |
|
Walter Beller-Morales
|
752ddeacaa
|
[Core] add support for reasoning parser plugins (#28075)
Signed-off-by: walter beller-morales <walter.beller.morales@gmail.com>
|
2025-11-06 01:15:06 +08:00 |
|
Jiangyun Zhu
|
c18f88c6ca
|
[Kernel] Fuse computation of g and beta for Gated Delta Net (#28095)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-11-05 09:14:55 -08:00 |
|
Jiaju Zhang
|
6fd0df8132
|
[misc] add vLLM Beijing Meetup (#28127)
Signed-off-by: Jiaju Zhang <jjzhang@redhat.com>
|
2025-11-05 17:12:59 +00:00 |
|
Isotr0py
|
3f5a4b6473
|
[Bugfix] Validate custom logits processor xargs for online serving (#27560)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-05 16:53:33 +00:00 |
|
Pleaplusone
|
6cae1e5332
|
[ROCm][MLA] Support block-size > 1 for AITER MLA backend (#27224)
Signed-off-by: ganyi <ygan@amd.com>
Co-authored-by: wuhuikx <hattie.wu@amd.com>
|
2025-11-05 10:43:02 -05:00 |
|
Alexei-V-Ivanov-AMD
|
80c9275348
|
Enabling cooperative multi-gpu tests on multi-gpu nodes (#27986)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-11-05 10:35:49 -05:00 |
|
Ilya Markov
|
e50c454672
|
[BugFix] Support EP/DP + EPLB with MTP (#25311)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2025-11-05 15:22:17 +00:00 |
|
Chen Zhang
|
5d16d0fa62
|
[DCP] check return_lse for all layers in dcp (#27929)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-11-05 22:27:25 +08:00 |
|
bigmoyan
|
0606bea2b6
|
add kimi reasoning parser (#28128)
Signed-off-by: wangzhengtao <wangzhengtao@msh.team>
Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
|
2025-11-05 21:48:33 +08:00 |
|
Frost Mitchell
|
6e97eccf5d
|
[XPU] Enable custom routing functions in IPEX for Llama4 (#28004)
Signed-off-by: frost-intel <frost.mitchell@intel.com>
|
2025-11-05 13:39:57 +00:00 |
|
Boyuan Feng
|
6ab183813c
|
[Graph Partition][Cache] Use inductor partition ops config (#27702)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-11-05 13:04:48 +00:00 |
|
amirkl94
|
6b7a81185d
|
Bugfix: Cutlass FP8 FusedMoE bad scaling factors (#27255)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-05 06:06:06 -05:00 |
|
Eric Yue
|
b57789b62b
|
Fix excessive logging noise by reducing the log level of the MinimaxM2ToolParser import success message (#27635)
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>
|
2025-11-05 19:03:51 +08:00 |
|
Chauncey
|
377061d481
|
[Misc] fix import error for DeepSeekR1ReasoningParser (#28114)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-11-05 19:02:32 +08:00 |
|
Kuntai Du
|
86dca07d9b
|
[Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator (#28011)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
|
2025-11-05 10:36:31 +00:00 |
|
Qiu
|
16b37f3119
|
[bugfix] fix wrong dcp_local_seq_lens calc (#27518)
Signed-off-by: Qiu <qiuchunshuo@huawei.com>
|
2025-11-05 17:58:13 +08:00 |
|
Chauncey
|
0976711f3b
|
[Refactor] to simplify and extract the shared logic between chat completion and responses (#27961)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-11-05 15:46:39 +08:00 |
|
Chauncey
|
e261d37c9a
|
[Refactor] Lazy-loaded reasoning_parser (#28092)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-11-05 15:37:02 +08:00 |
|
Alex Brooks
|
b7cbc25416
|
[Model, Core] Support Granite Speech & LoRA for STT (#24455)
|
2025-11-05 08:33:48 +01:00 |
|
Lucas Wilkinson
|
d43ad5a757
|
[BugFix] Fix DCP Assert (AssertionError: DCP not support reorder_batch_threshold > 1 now.) (#28100)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-05 14:54:43 +08:00 |
|
Isotr0py
|
0ff05e3770
|
[Bugfix] Fix encoder-only model support for transformers backend (#28021)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-04 22:24:41 -08:00 |
|
wangxiyuan
|
428bc7bf1c
|
[V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-11-04 20:51:16 -08:00 |
|
Zhewen Li
|
878fd5a16f
|
[CI/Build] Enable some fixed tests in AMD CI (#28078)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-11-05 03:15:59 +00:00 |
|
Kunshang Ji
|
18b39828d9
|
[XPU] Add gpt-oss model support for Intel GPU (#27786)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-05 02:17:23 +00:00 |
|
tou
|
4ea62b77f5
|
[Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 (#27740)
|
2025-11-05 09:25:09 +08:00 |
|
Vadim Gimpelson
|
d4e547bb7e
|
Revert "[PERF] Decouple projections from GDN custom op" (#28080)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-04 15:58:23 -08:00 |
|
Aleksandr Malyshev
|
2d977a7a9e
|
[ROCm] gemm_a16w16 upstreaming (#26969)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2025-11-04 16:01:00 -05:00 |
|
Chenheli Hua
|
1fb4217a05
|
[Multimodal] Make MediaConnector extensible. (#27759)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-11-04 18:28:01 +00:00 |
|
nadavkluger
|
611c86ea3c
|
Added disable rule to track files under benchmarks/lib (#28048)
Signed-off-by: Nadav Kluger <nadav.k@fmr.ai>
|
2025-11-04 18:18:43 +00:00 |
|
Pleaplusone
|
dc937175d4
|
[ROCm][Perf] New design on ROCm AITER MHA backend Implementation (#25763)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-04 18:05:33 +00:00 |
|
Harry Mellor
|
2f1cc8cef1
|
Remove deprecated --rope-scaling and --rope-theta (#28006)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-04 18:01:56 +00:00 |
|
Nick Hill
|
938a81692e
|
[AsyncScheduling] Don't schedule past request max_tokens (#27922)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-04 17:06:28 +00:00 |
|
Nick Hill
|
c9f66da8fd
|
[PerfFix] Avoid separate thread for MP executor shm spin (#28012)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-04 08:33:55 -08:00 |
|
yt0428
|
05cae69f0f
|
[model] Add support for openPangu_Ultra_MoE (#27521)
Signed-off-by: yuantao <2422264527@qq.com>
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-04 08:17:20 -08:00 |
|
Vadim Gimpelson
|
5fd8f02ea9
|
[PERF] Decouple projections from GDN custom op (#27512)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-04 08:11:41 -08:00 |
|
lyrisz
|
97e3dda84b
|
[Perf] SM100 - add swap AB optimization to CUTLASS FP8 GEMM (#27284)
Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com>
Co-authored-by: Faqin Zhong <zhofaqin@amazon.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-04 07:49:25 -08:00 |
|
Nick Hill
|
5a0a6dfd55
|
[BugFix] Fix incorrect preallocated sampled_token_ids tensor size (#28025)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-04 07:38:16 -08:00 |
|
bnellnm
|
938772af03
|
[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. (#27123)
|
2025-11-04 21:59:45 +08:00 |
|