Andreas Karatzas
|
f6220f9877
|
[ROCm][Test] Fix beam search determinism failures from batch-size-dependent FP divergence and removed wrong marker (#34878)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-19 08:25:26 +00:00 |
|
Andreas Karatzas
|
2df2bb27b0
|
[ROCm][CI] Removing all blocking labels from MI355 until stable infra (#34879)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-19 07:53:08 +00:00 |
|
Tal Nir
|
f75b61a9e9
|
[Voxtral Realtime] Fix engine crash on empty multimodal embeddings (#34862)
Signed-off-by: Tal Nir <tal@nervexneurotech.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-18 23:21:47 -08:00 |
|
Wei Zhao
|
7f51e93864
|
[Bug] Fix DeepSeek V3 weight loading caused by incorrect prefix (#34876)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-02-18 23:20:30 -08:00 |
|
Alex Brooks
|
4611af1663
|
[Bugfix] Add Quant Config to Llava Next Projector (#34847)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
|
2026-02-18 23:18:23 -08:00 |
|
Manrique Vargas
|
ad5aa6bd9f
|
fix(docs): fix typos in comments and docstrings (#34836)
Signed-off-by: machov <mv1742@nyu.edu>
|
2026-02-18 23:17:41 -08:00 |
|
Jaeyeon Kim(김재연)
|
9681068cf9
|
[Frontend] Fix reasoning_tokens for text-based parsers in Responses API (#33513)
Signed-off-by: Jaeyeon Kim <anencore94@gmail.com>
|
2026-02-18 23:16:41 -08:00 |
|
Kevin H. Luu
|
b6101d384d
|
Deprecate test-pipeline.yaml (#34864)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-02-19 02:15:27 +00:00 |
|
Woosuk Kwon
|
5fcb0cdd68
|
[Model Runner V2] Use FP32 for Gumbel Noise (#34854)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-18 17:07:37 -08:00 |
|
Woosuk Kwon
|
c878b43b64
|
[Model Runner V2] Remove unnecessary copies in PW CUDA graph capture (#34849)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-18 15:52:50 -08:00 |
|
rasmith
|
2b84ac669c
|
[CI][AMD][BugFix] Use torch.testing.assert_close instead of assert torch.allclose in test_rocm_skinny_gemms.py (#34181)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-02-18 23:10:19 +00:00 |
|
zhrrr
|
11d3976b88
|
[Model Runner V2] support piecewise & mixed cudagraph (#32771)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
|
2026-02-18 15:03:17 -08:00 |
|
Yongye Zhu
|
40da9625a1
|
[MoE Refactor] Convert mxfp4 marlin into modular kernel format (#34588)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-18 14:37:14 -08:00 |
|
Flora Feng
|
8d9babd4de
|
Fix empty tool_call_id in Anthropic messages API tool result conversion (#34745)
Signed-off-by: <>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: Flora Feng <sfeng33@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-02-18 14:31:59 -08:00 |
|
Aaron Hao
|
e99ba957ec
|
[BUG] Fixing Weight Sync unit test (#34841)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-02-18 17:20:10 -05:00 |
|
Kyle Sayers
|
64ac1395e8
|
[Docs] Clean up speculators docs (#34065)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-02-18 13:48:11 -08:00 |
|
Cyrus Leung
|
61cf087680
|
[Bugfix] Fix lora tests (#34834)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-18 13:22:31 -08:00 |
|
Wenlong Wang
|
847a57cd12
|
[Bugfix][MoE Kernel] Fix incorrect routing selection for models without expert groups (e.g., MiniMax-M2.1) (#34673)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-18 13:03:24 -08:00 |
|
rasmith
|
fcd6ac97ed
|
[CI][AMD][BugFix] Skip tests in test_unquantized_backend_selection that should not run on ROCm (#34655)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-02-18 15:00:40 -05:00 |
|
Woosuk Kwon
|
95be2a7f22
|
[Model Runner V2] Minor simplification for DCP (#34786)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-18 11:04:53 -08:00 |
|
Jaden Mathias
|
0e60c925cf
|
[Bugfix] Remove assert causing hipErrorStreamCaptureUnsupported (#34455)
Signed-off-by: Jaden Mathias <jaden.mathias@amd.com>
|
2026-02-18 18:54:54 +00:00 |
|
Teng Ma
|
d7ff22204a
|
[Misc] Add mooncake-transfer-engine to kv_connectors requirements (#34826)
Signed-off-by: Teng Ma <teng-ma@linux.alibaba.com>
|
2026-02-18 18:26:24 +00:00 |
|
Isotr0py
|
c0bd8b13da
|
[Bugfix] Redo Qwen3.5/Qwen3-Next GDN projector fusion (#34697)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
|
2026-02-18 09:46:53 -08:00 |
|
Michael Goin
|
caeb887bf6
|
[Bugfix] Fix NVFP4 TRTLLM MoE non-gated support; add gsm8k for Nemotron-3-Nano FP8+NVFP4 (#34725)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-18 09:39:22 -08:00 |
|
Ilya Markov
|
6b3166a7c7
|
[CI][Bugfix] Fix multinode test script (#34820)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2026-02-18 11:45:10 -05:00 |
|
Robert Shaw
|
25e2e136ef
|
[CI] temporarily disable multi-node tests (#34825)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-02-18 11:32:44 -05:00 |
|
Robert Shaw
|
6874638bc4
|
[Model Bash] DeepSeek R1 BF16 Min Latency QKV A GEMM (0.5% E2E Speedup) (#34758)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-02-18 07:42:36 -08:00 |
|
Burkhard Ringlein
|
e24663c5a9
|
Add unit tests for fp8 output fusion of triton_attn (#34228)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-18 06:22:49 -05:00 |
|
Nick Hill
|
c50e105a88
|
[Model Runner V2] Avoid prepare prefill kernel launch overhead (#34780)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-18 00:49:21 -08:00 |
|
Cyrus Leung
|
a766b30349
|
[Renderer] Deprecate code paths for old input processing (#34775)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-18 00:35:04 -08:00 |
|
Asaf Joseph Gardin
|
1faa8cb73c
|
[Quantization] - Added uses_meta_device_weights to quant config (#34645)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-02-17 23:43:44 -08:00 |
|
Marek Michalowski
|
e89a91d927
|
[Bugfix] fix activation in cpu_fused_moe_torch call (#34696)
Signed-off-by: Marek Michalowski <marek.michalowski@arm.com>
|
2026-02-17 23:39:46 -08:00 |
|
Michael Goin
|
909b147197
|
[Bugfix] Fix prefix creation for Qwen3.5 (#34723)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-17 23:39:15 -08:00 |
|
ElizaWszola
|
a88b3be7c4
|
[Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales (#33255)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-17 23:35:04 -08:00 |
|
Nick Hill
|
a49ea5a58f
|
[Model Runner V2] A bit more PP simplification (#34766)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-17 21:39:07 -08:00 |
|
Cyrus Leung
|
30ebe0dc3c
|
[CI/Build] Remove use of skip_v1 (#34699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-18 12:19:11 +08:00 |
|
Andreas Karatzas
|
cef65f0715
|
[ROCm][CI] Removed hard-coded attn backend requirement for Qwen VL (#34753)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-18 03:59:53 +00:00 |
|
Russell Bryant
|
6f3b2047ab
|
[Core] Fix SSRF bypass via backslash-@ URL parsing inconsistency (#34743)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: isotr0py <2037008807@qq.com>
|
2026-02-18 03:53:35 +00:00 |
|
Luka Govedič
|
02e8f26cea
|
[torch.compile] Turn on silu+fp4 quant fusion by default for O1+ (#34718)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2026-02-18 03:29:15 +00:00 |
|
Hongxia Yang
|
4a00a511bb
|
[BugFix] [Build] fix string literals comparison in indexer_k_quant_and_cache calling site (#34653)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-02-17 19:19:41 -08:00 |
|
Cyrus Leung
|
a0d8d944e2
|
[Renderer] Move MM Hash parsing into Renderer (#34711)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-17 19:18:55 -08:00 |
|
Amr Mahdi
|
df3f537a66
|
[CI] Remove unused precompiled wheel args from image build (#34767)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2026-02-17 18:58:18 -08:00 |
|
Matthew Bonanni
|
7743152957
|
[Attention] Refactor check_and_update_config (#33600)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-02-17 17:06:54 -08:00 |
|
Wentao Ye
|
ab33d2a629
|
[Feature] Decode Context Parallel support for GPU model runner v2 (#34179)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-17 16:27:15 -08:00 |
|
Woosuk Kwon
|
be3af2d29e
|
[Model Runner V2] Further simplification for PP (#34724)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-17 15:18:18 -08:00 |
|
Jongseok Park
|
c656ba3b4d
|
[Kernel] Triton-based Top-k and Top-p sampler kernels (#33538)
Signed-off-by: js_park <cakeng@naver.com>
Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com>
Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-17 23:14:30 +00:00 |
|
Matthew Bonanni
|
dc5fa77a4e
|
[Bugfix][MTP][Sparse MLA] Allow sparse MLA with MTP to run with FULL cudagraphs (#34457)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-02-17 14:01:27 -05:00 |
|
Flora Feng
|
1e4a084c8e
|
[CI] Fix flaky test_parsable_context (#34717)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-02-17 18:42:52 +00:00 |
|
Richard Zou
|
7967e854da
|
[BugFix] Fix sp tests (#34716)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-17 17:07:56 +00:00 |
|
almayne
|
6bd6d0c3c1
|
Fixed whisper CPU test that does not spawn properly. (#34324)
Signed-off-by: Anna Mayne <anna.mayne@arm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-17 06:46:23 -08:00 |
|