Chendi.Xue
|
3b1dbaad4e
|
[HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-30 16:47:30 +00:00 |
|
Johnny
|
b4a2f3ac36
|
[NVIDIA] Bugfix NVFP4 DGX Spark and RTX50 (#38423)
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com>
|
2026-03-30 09:36:18 -07:00 |
|
roikoren755
|
8e6293e838
|
[Mamba] Add stochastic rounding support (#35753)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-03-30 12:33:49 -04:00 |
|
Hongxia Yang
|
dbdd9ae067
|
[ROCm][Bugfix] fix exception related to trust_remote_code for MiniMax-M2.1-MXFP4 (#37698)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-03-30 15:49:23 +00:00 |
|
Matthias Gehre
|
e8b055a5ac
|
[Bugfix] Handle ParallelLMHead in compressed-tensors get_quant_method (#37291)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-03-30 07:30:52 -07:00 |
|
tomeras91
|
246dc7d864
|
[Misc] Add @tomeras91 as a maintainer of Nemotron related code + mamba block (#38547)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2026-03-30 21:12:17 +08:00 |
|
Thomas Parnell
|
7c3f88b2a8
|
[Bugfix] Remove false-positive format mismatch warnings in FLA ops (#38255)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2026-03-30 12:32:26 +00:00 |
|
Li, Jiang
|
6557f4937f
|
[Bugfix][CPU] Skip set_num_threads after thread binding (#38535)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-30 20:13:00 +08:00 |
|
Andreas Karatzas
|
677424c7ac
|
[Core][CI] Add opt-in media URL caching via VLLM_MEDIA_CACHE (#37123)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 04:58:53 -07:00 |
|
Collin McCarthy
|
1031c84c36
|
Fix ambiguous num_blocks for hybrid attn mamba (#37236)
Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-03-30 11:09:45 +00:00 |
|
aliialsaeedii
|
7e76af14fa
|
[Bugfix][Frontend] Return 400 for corrupt/truncated image inputs instead of 500 (#38253)
Signed-off-by: aliialsaeedii <ali.al-saeedi@nscale.com>
|
2026-03-30 10:26:46 +00:00 |
|
yzong-rh
|
3683fe6c06
|
[Bugfix] Fix shared-object aliasing in n>1 streaming with tool calls (#38158)
Signed-off-by: Yifan Zong <yzong@redhat.com>
Signed-off-by: Yifan <yzong@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-03-30 10:12:13 +00:00 |
|
Nicolò Lucchesi
|
cc06b4e86b
|
[Mamba][Bugfix] Raise on insufficient cache blocks instead of silently capping cudagraph sizes (#38270)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-30 09:41:50 +00:00 |
|
TJian
|
03ac6ca895
|
[ROCm] [DOC] Update the Documentation to include ROCm Nightly Wheel support (#38457)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-03-30 02:25:46 -07:00 |
|
haosdent
|
a08b7733fd
|
[CI] Fix SPLADE pooler test broken by #38139 (#38495)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-30 07:48:33 +00:00 |
|
Tan Pin Siang
|
85c0950b1f
|
[ROCm] Enable MORI EP for unquantized MoE with AITER backend (#37529)
Signed-off-by: Tan Pin Siang <pinsiang.tan@amd.com>
|
2026-03-30 15:19:33 +08:00 |
|
Juan Pérez de Algaba
|
57861ae48d
|
(security) Fix SSRF in batch runner download_bytes_from_url (#38482)
Signed-off-by: jperezde <jperezde@redhat.com>
|
2026-03-30 07:10:01 +00:00 |
|
Jee Jee Li
|
ac30a8311e
|
[Bugfix][Model] Fix PixtralForConditionalGeneration LoRA (#36963)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-03-29 23:59:42 -07:00 |
|
PikaPikachu
|
63babd17f1
|
[Model][Quantization] Add GGUF support for MiniMax-M2.1 (#36965)
Signed-off-by: kangletian <Letian.Kang@amd.com>
|
2026-03-30 14:24:06 +08:00 |
|
Kevin H. Luu
|
fec5aeca12
|
[ci] Soft fail and disable retry for AMD build image job (#38505)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-03-29 23:05:26 -07:00 |
|
Jaewon
|
d816834c1a
|
[MoE] Add RoutingMethodType.Simulated to TRT-LLM FP8/NVFP4 kernel allowlists (#38329)
Signed-off-by: Jaewon Lee <jaewon@meta.com>
|
2026-03-29 22:53:43 -07:00 |
|
Roger Wang
|
92f0db57a8
|
[Misc] Always use forward_mulmat for Conv3d on newer versions of torch. (#38487)
|
2026-03-30 05:39:41 +00:00 |
|
Andreas Karatzas
|
bea23536f6
|
[CI] Add temperature=0.0, reduce max_tokens, and add debug prints to audio_in_video tests (#38492)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 05:36:45 +00:00 |
|
Jiangyun Zhu
|
c133f33746
|
Add @ZJY0516 to CODEOWNERS (#38497)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-03-29 21:10:00 -07:00 |
|
Stanislav Kirillov
|
a6db99ba02
|
[Bugfix] Support multi-type params parsing for DeepSeek v3.2 (#33703)
Signed-off-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-03-30 04:07:28 +00:00 |
|
Andreas Karatzas
|
4f2ed5fddb
|
[ROCm][CI] Enable hybrid chunked prefill test (#38317)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 10:30:26 +08:00 |
|
Kyle Sayers
|
d28d86e8a3
|
[QeRL] Fix online quantized reloading (#38442)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-29 14:56:41 -06:00 |
|
Wentao Ye
|
995dea1354
|
[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-29 18:12:50 +00:00 |
|
allgather
|
8c0b6267d7
|
[Transformers v5] fix missing pixtral/voxtral multimodal dispatch (#38410)
Signed-off-by: allgather <all2allops@gmail.com>
|
2026-03-29 09:59:06 +00:00 |
|
Andreas Karatzas
|
43cc5138e5
|
[ROCm][CI] Fix cross-attention dispatch for encoder-decoder models (#38450)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-28 22:08:03 -07:00 |
|
Shubhra Pandit
|
5b8c30d62b
|
[Spec Decode, BugFix] Propagate norm_before_fc from Eagle3 speculator (#38111)
Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>
|
2026-03-29 00:42:06 +00:00 |
|
haosdent
|
d39b8daf5f
|
[Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-29 00:27:52 +00:00 |
|
Walter Beller-Morales
|
fafca38adc
|
[BugFix][Frontend] apply task instruction as system prompt in cohere v2/embed (#38362)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-03-28 18:30:54 +00:00 |
|
Kunshang Ji
|
aa4eb0db78
|
[CI]revert initialize_model context manager (#38426)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-28 16:56:50 +00:00 |
|
Andreas Karatzas
|
af89140efc
|
[ROCm][CI] Fix UV install in Dockerfile.rocm to detect curl failures and retry (#38415)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-29 00:47:42 +08:00 |
|
haosdent
|
b2bc736b12
|
[CI] Fix Ernie4.5-VL initialization test (#38429)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-28 22:43:24 +08:00 |
|
whyiug
|
58c959a767
|
[Misc]: clean up non-core lint issues (#37049)
Signed-off-by: whyiug <whyiug@hotmail.com>
|
2026-03-28 10:28:16 -04:00 |
|
Bvicii
|
bda3eda82d
|
[Bugfix] Disallow renderer_num_workers > 1 with mm processor cache (#38418)
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
|
2026-03-28 06:32:52 -07:00 |
|
Michael Goin
|
2bf5b70ae8
|
[CI Bugfix] Pre-download missing FlashInfer headers in Docker build (#38391)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-03-28 06:09:00 -07:00 |
|
yzong-rh
|
6dad4c5722
|
[Test] Fix flaky race condition in test_abort_final_step (#38414)
Signed-off-by: Yifan <yzong@redhat.com>
|
2026-03-28 09:06:56 +00:00 |
|
Liwen
|
171775f306
|
Fix Device Index for ROCm Ray Workers in MoE Benchmark (#38108)
Signed-off-by: Liwen <53441624+li-liwen@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-28 08:27:11 +00:00 |
|
TJian
|
58a249bc61
|
[ROCm] [Release] Update ROCm variant from rocm700 to rocm721 (#38413)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-03-28 06:07:03 +00:00 |
|
IriKa
|
148a5c1226
|
[Bugfix]fix output Nan/Inf in marlin if dtype=float16 (#33972)
Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>
|
2026-03-27 16:36:08 -07:00 |
|
Wei Zhao
|
b69bf2f0b1
|
[Perf] Use torch compile to fuse pack topk in trtllm moe (#37695)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
|
2026-03-27 17:30:46 -06:00 |
|
rongfu.leng
|
88149b635e
|
Add nvidia h800 moe config (#31201)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2026-03-27 16:28:48 -07:00 |
|
Hongxia Yang
|
83a4df049d
|
[ROCm][Documentation] update quickstart and installation to include rocm nightly docker tips (#38367)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-03-27 23:20:19 +00:00 |
|
Gregory Shtrasberg
|
731285c939
|
[ROCm][CI/Build] ROCm 7.2.1 release version; torch 2.10; triton 3.6 (#38252)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-03-27 18:03:12 -05:00 |
|
Johnny
|
97d19197bc
|
[NVIDIA] Fix DGX Spark logic (#38126)
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Signed-off-by: Sathish Sanjeevi <sathish.krishnan.p.s@gmail.com>
Signed-off-by: guillaume_guy <guillaume.guy@airbnb.com>
Signed-off-by: Guillaume Guy <guillaume.c.guy@gmail.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Sathish Sanjeevi <SKPsanjeevi@users.noreply.github.com>
Co-authored-by: Guillaume Guy <guillaume.c.guy@gmail.com>
Co-authored-by: guillaume_guy <guillaume.guy@airbnb.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-03-27 15:26:07 -07:00 |
|
Giancarlo Delfin
|
384e4d5f48
|
[Model Runner V2] Rebuild attention metadata before eagle decode full… (#38311)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-27 13:46:42 -07:00 |
|
Nicolò Lucchesi
|
44a6528028
|
[CI] Skip failing test (#38369)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-27 13:25:19 -07:00 |
|