haosdent
|
d6e04f4c43
|
[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (#34094) (#34571)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-03-04 11:56:22 +01:00 |
|
Kunshang Ji
|
16d2ad1d38
|
[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache (#30681)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 09:49:47 +00:00 |
|
Joe Runde
|
6f0dd93801
|
[Core] Remove busy loop from idle buffer readers (#28053)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-04 07:44:20 +00:00 |
|
Cyrus Leung
|
e379396167
|
[Refactor] Clean up processor kwargs extraction (#35872)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-03 19:53:53 -08:00 |
|
AllenDou
|
c1d963403c
|
[model] support FireRedASR2 (#35727)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-03 19:41:30 -08:00 |
|
William Zhang
|
70c73df69e
|
[Bugfix] Fix EVS implementation for Qwen3 VL (#33607)
Signed-off-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
|
2026-03-04 02:18:11 +00:00 |
|
Micah Williamson
|
e7213003cb
|
[ROCm][CI] Fix TP size issue for test_gpt_oss (#35887)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-03 20:57:34 +00:00 |
|
Robert Shaw
|
97995f6376
|
[MoE Refactor] Create MK for TRTLLM Kernels (#32564)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
|
2026-03-03 10:39:50 -08:00 |
|
Robert Shaw
|
881a6b011b
|
[CI] Temporarily Disable Llama4 MoE Refactor Test (#35870)
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-03-03 10:36:15 -08:00 |
|
Matthew Bonanni
|
8e1fd5baf0
|
[CI] Bump num_speculative_tokens to 3 in nightly DeepSeek tests (#35882)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-03 09:26:44 -08:00 |
|
JasonCohere
|
ae88468bcc
|
fix: Ensure invalid audio files return 400 error (#34715)
Signed-off-by: Jason Ozuzu <jasonozuzu@cohere.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-03 08:47:39 -08:00 |
|
ojhaanshika
|
e05cb3b93e
|
TRTLLM gen-full attn Test Coverage (#34986)
Signed-off-by: Anshika Ojha <anshikao@nvidia.com>
Co-authored-by: Anshika Ojha <anshikao@gb-nvl-059-compute09.nvidia.com>
|
2026-03-03 11:35:34 -05:00 |
|
Lucas Wilkinson
|
28ef9ba399
|
[BugFix] Add support for MTP num_speculative_tokens > 1 with sparse MLA (#34552)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-03 07:21:57 -08:00 |
|
TJian
|
fb7fdc49c4
|
[ROCm] [CI] Add new fusion test cases that are relevant to vLLM IR Ops (#34307)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-03-03 06:24:21 -08:00 |
|
wang.yuqi
|
fd4a90f337
|
[CI] And PPL test for Qwen3.5. (#35853)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-03 13:15:51 +00:00 |
|
hallerite
|
b8401cde0e
|
add regression test (#35834)
Signed-off-by: hallerite <git@hallerite.com>
|
2026-03-03 07:32:15 +00:00 |
|
Micah Williamson
|
8b9e8b7454
|
[ROCm][CI] Fix Assertion Logic For test_gpt_oss (#35806)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-03 05:08:04 +00:00 |
|
Isotr0py
|
7d8bbe6f42
|
[CI/Build] Automatically patch video metadata for multimodal processor test (#35822)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-03 04:27:45 +00:00 |
|
aykoppol
|
25e02647c2
|
[Core] Add optional flags to check for repetitive token patterns in engine output (#35451)
Signed-off-by: aykoppol <aykoppol+git@gmail.com>
|
2026-03-03 12:23:25 +08:00 |
|
Robert Shaw
|
6521ccf286
|
[CI] Temporarily Disable Nightly Failures (#35770)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-03 01:49:13 +00:00 |
|
Jakub Zakrzewski
|
c8b678e53e
|
[Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 (#35735)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
|
2026-03-03 08:32:14 +08:00 |
|
Roger Wang
|
1b82b433fc
|
[Bugfix] Fix MM processor test for Qwen3.5 (#35797)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-03-02 23:05:08 +00:00 |
|
Fynn Schmitt-Ulms
|
9433acb8df
|
[Spec Decode] Add hidden states extraction system (#33736)
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
|
2026-03-02 14:29:09 -05:00 |
|
Richard Zou
|
d1a6e96d9e
|
[torch.compile] Improve cold and warm start compile tests (#35709)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-02 19:27:06 +00:00 |
|
Isotr0py
|
cc0d565f40
|
[CI/Build] Enable Qwen3.5 tests on CI (#35763)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-02 17:43:53 +00:00 |
|
Turner Jabbour
|
4034c3d32e
|
[Core] Move test utility to test file (#35672)
Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com>
|
2026-03-02 10:56:03 -05:00 |
|
Runkai Tao
|
ada4f4fadd
|
[Fix Bug]num_active_loras always equals to zero (#34119)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-02 23:17:46 +08:00 |
|
Andreas Karatzas
|
ec27b36b4b
|
[CI] Defining extended V1 e2e + engine tests (#35580)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-02 08:10:54 +00:00 |
|
EdalatiAli
|
cb21972a97
|
[Kernel] Integrate SM100 MXFP8 blockscaled grouped MM and quant kernels (#34448)
Signed-off-by: EdalatiAli <aliedalati@cohere.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-03-01 23:31:19 -08:00 |
|
Andreas Karatzas
|
c34963f138
|
[ROCm][CI] Disable skinny GEMMs in language model standard tests to fix non-determinism (#35152)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-02 15:04:18 +08:00 |
|
Hongxia Yang
|
f26650d649
|
[ROCm] add amd-quark package in requirements for rocm to use quantized models (#35658)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-03-02 06:02:43 +00:00 |
|
Jesse Cai
|
a60985b07e
|
Fix deprecated v1 config tests (#35327)
Signed-off-by: Jesse Cai <jessecai@fb.com>
|
2026-03-01 20:32:03 -05:00 |
|
haosdent
|
6290470843
|
[Bugfix] Fix dtype mismatch in RMSNormGated.forward_native() during torch.compile (#35256)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-01 15:14:46 -05:00 |
|
Asaf Gardin
|
bbf81f9a92
|
[Mamba1] - Kernel Level Chunk Alignment for Prefix Caching (#34798)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-03-01 20:40:23 +08:00 |
|
Ryan Rock
|
87d319c52f
|
[AMD][CI] Support Triton attention with ExampleConnector (#34931)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2026-03-01 09:58:07 +02:00 |
|
gnovack
|
3ecd0bf9fc
|
Add TMA support to fused_moe_lora kernel (#32195)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-01 10:55:25 +08:00 |
|
Martin Vit
|
95a395dbec
|
[Bugfix] Fix Anthropic API base64 image handling in Messages endpoint (#35557)
Signed-off-by: Martin Vit <martin@voipmonitor.org>
|
2026-02-28 20:57:08 +00:00 |
|
Wentao Ye
|
e113a30113
|
[Deprecation] Deprecate code in 0.17 as scheduled (#35441)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-28 17:32:37 +00:00 |
|
Augusto Yao
|
8e75d88554
|
add io_process_plugin for sparse embedding (#34214)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Signed-off-by: Augusto Yao <augusto.yjh@antgroup.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-28 09:16:37 +00:00 |
|
Hashem Hashemi
|
7600642eae
|
Add padding support to wvSplitK solution for skinny GEMMs (#33762)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-02-28 09:02:05 +00:00 |
|
Andreas Karatzas
|
1e69c04887
|
[ROCm][CI] Parametrize vision score tests across attention backends with per-backend tolerances (#35571)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-28 08:59:26 +00:00 |
|
Chauncey
|
06254d4cbb
|
[CI] add trainer_send_weights for MockWeightTransferEngine (#35589)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-02-28 06:47:43 +00:00 |
|
Andreas Karatzas
|
f5d1281c9d
|
[ROCm][CI] Expose tests to AMD production CI and fix amdsmi heap corruption (#35071)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-28 13:57:31 +08:00 |
|
Umut Polat
|
1d5ab5d603
|
[Bugfix] Move chat completion response_format validation to Pydantic model_validator (#35510)
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
|
2026-02-27 21:26:19 -08:00 |
|
Itay Alroy
|
dea268336f
|
[1/N] Elastic EP Milestone 2 (#34861)
Signed-off-by: Yongji Wu <wuyongji317@gmail.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
|
2026-02-28 04:46:42 +00:00 |
|
Aaron Hao
|
2ce6f3cf67
|
[Feat][RL][2/2] Native Weight Syncing API: IPC (#34171)
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-02-27 13:45:21 -07:00 |
|
Lucas Wilkinson
|
1d532f9d8f
|
[DP] Only use DP padding when cudagraphs are actually used (#34102)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-27 15:14:31 -05:00 |
|
Zhengxu Chen
|
29b35477b0
|
[compile] Fix caching error over pytree slice node. (#35308)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-02-27 19:34:16 +00:00 |
|
Huamin Li
|
157722da75
|
[perf] Use pinned memory for async H2D transfer in do_mamba_copy_block (#35480)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2026-02-28 01:50:37 +08:00 |
|
fort726
|
905d76b51d
|
[Model] Add huggingface skt/A.X-K1 model (#32407)
Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com>
Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com>
Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-02-27 09:26:02 -08:00 |
|