14386 Commits

Author SHA1 Message Date
Andreas Karatzas
f6220f9877 [ROCm][Test] Fix beam search determinism failures from batch-size-dependent FP divergence and removed wrong marker (#34878)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-19 08:25:26 +00:00
Andreas Karatzas
2df2bb27b0 [ROCm][CI] Removing all blocking labels from MI355 until stable infra (#34879)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-19 07:53:08 +00:00
Tal Nir
f75b61a9e9 [Voxtral Realtime] Fix engine crash on empty multimodal embeddings (#34862)
Signed-off-by: Tal Nir <tal@nervexneurotech.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 23:21:47 -08:00
Wei Zhao
7f51e93864 [Bug] Fix DeepSeek V3 weight loading caused by incorrect prefix (#34876)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
2026-02-18 23:20:30 -08:00
Alex Brooks
4611af1663 [Bugfix] Add Quant Config to Llava Next Projector (#34847)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
2026-02-18 23:18:23 -08:00
Manrique Vargas
ad5aa6bd9f fix(docs): fix typos in comments and docstrings (#34836)
Signed-off-by: machov <mv1742@nyu.edu>
2026-02-18 23:17:41 -08:00
Jaeyeon Kim(김재연)
9681068cf9 [Frontend] Fix reasoning_tokens for text-based parsers in Responses API (#33513)
Signed-off-by: Jaeyeon Kim <anencore94@gmail.com>
2026-02-18 23:16:41 -08:00
Kevin H. Luu
b6101d384d Deprecate test-pipeline.yaml (#34864)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2026-02-19 02:15:27 +00:00
Woosuk Kwon
5fcb0cdd68 [Model Runner V2] Use FP32 for Gumbel Noise (#34854)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-18 17:07:37 -08:00
Woosuk Kwon
c878b43b64 [Model Runner V2] Remove unnecessary copies in PW CUDA graph capture (#34849)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-18 15:52:50 -08:00
rasmith
2b84ac669c [CI][AMD][BugFix] Use torch.testing.assert_close instead of assert torch.allclose in test_rocm_skinny_gemms.py (#34181)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-02-18 23:10:19 +00:00
zhrrr
11d3976b88 [Model Runner V2] support piecewise & mixed cudagraph (#32771)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
2026-02-18 15:03:17 -08:00
Yongye Zhu
40da9625a1 [MoE Refactor] Convert mxfp4 marlin into modular kernel format (#34588)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-02-18 14:37:14 -08:00
Flora Feng
8d9babd4de Fix empty tool_call_id in Anthropic messages API tool result conversion (#34745)
Signed-off-by: <>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: Flora Feng <sfeng33@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-02-18 14:31:59 -08:00
Aaron Hao
e99ba957ec [BUG] Fixing Weight Sync unit test (#34841)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2026-02-18 17:20:10 -05:00
Kyle Sayers
64ac1395e8 [Docs] Clean up speculators docs (#34065)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-02-18 13:48:11 -08:00
Cyrus Leung
61cf087680 [Bugfix] Fix lora tests (#34834)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-18 13:22:31 -08:00
Wenlong Wang
847a57cd12 [Bugfix][MoE Kernel] Fix incorrect routing selection for models without expert groups (e.g., MiniMax-M2.1) (#34673)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-02-18 13:03:24 -08:00
rasmith
fcd6ac97ed [CI][AMD][BugFix] Skip tests in test_unquantized_backend_selection that should not run on ROCm (#34655)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-02-18 15:00:40 -05:00
Woosuk Kwon
95be2a7f22 [Model Runner V2] Minor simplification for DCP (#34786)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-18 11:04:53 -08:00
Jaden Mathias
0e60c925cf [Bugfix] Remove assert causing hipErrorStreamCaptureUnsupported (#34455)
Signed-off-by: Jaden Mathias <jaden.mathias@amd.com>
2026-02-18 18:54:54 +00:00
Teng Ma
d7ff22204a [Misc] Add mooncake-transfer-engine to kv_connectors requirements (#34826)
Signed-off-by: Teng Ma <teng-ma@linux.alibaba.com>
2026-02-18 18:26:24 +00:00
Isotr0py
c0bd8b13da [Bugfix] Redo Qwen3.5/Qwen3-Next GDN projector fusion (#34697)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
2026-02-18 09:46:53 -08:00
Michael Goin
caeb887bf6 [Bugfix] Fix NVFP4 TRTLLM MoE non-gated support; add gsm8k for Nemotron-3-Nano FP8+NVFP4 (#34725)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-18 09:39:22 -08:00
Ilya Markov
6b3166a7c7 [CI][Bugfix] Fix multinode test script (#34820)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2026-02-18 11:45:10 -05:00
Robert Shaw
25e2e136ef [CI] temporarily disable multi-node tests (#34825)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-02-18 11:32:44 -05:00
Robert Shaw
6874638bc4 [Model Bash] DeepSeek R1 BF16 Min Latency QKV A GEMM (0.5% E2E Speedup) (#34758)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-02-18 07:42:36 -08:00
Burkhard Ringlein
e24663c5a9 Add unit tests for fp8 output fusion of triton_attn (#34228)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-02-18 06:22:49 -05:00
Nick Hill
c50e105a88 [Model Runner V2] Avoid prepare prefill kernel launch overhead (#34780)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-18 00:49:21 -08:00
Cyrus Leung
a766b30349 [Renderer] Deprecate code paths for old input processing (#34775)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-18 00:35:04 -08:00
Asaf Joseph Gardin
1faa8cb73c [Quantization] - Added uses_meta_device_weights to quant config (#34645)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
2026-02-17 23:43:44 -08:00
Marek Michalowski
e89a91d927 [Bugfix] fix activation in cpu_fused_moe_torch call (#34696)
Signed-off-by: Marek Michalowski <marek.michalowski@arm.com>
2026-02-17 23:39:46 -08:00
Michael Goin
909b147197 [Bugfix] Fix prefix creation for Qwen3.5 (#34723)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-17 23:39:15 -08:00
ElizaWszola
a88b3be7c4 [Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales (#33255)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-02-17 23:35:04 -08:00
Nick Hill
a49ea5a58f [Model Runner V2] A bit more PP simplification (#34766)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-17 21:39:07 -08:00
Cyrus Leung
30ebe0dc3c [CI/Build] Remove use of skip_v1 (#34699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-18 12:19:11 +08:00
Andreas Karatzas
cef65f0715 [ROCm][CI] Removed hard-coded attn backend requirement for Qwen VL (#34753)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-18 03:59:53 +00:00
Russell Bryant
6f3b2047ab [Core] Fix SSRF bypass via backslash-@ URL parsing inconsistency (#34743)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: isotr0py <2037008807@qq.com>
2026-02-18 03:53:35 +00:00
Luka Govedič
02e8f26cea [torch.compile] Turn on silu+fp4 quant fusion by default for O1+ (#34718)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
2026-02-18 03:29:15 +00:00
Hongxia Yang
4a00a511bb [BugFix] [Build] fix string literals comparison in indexer_k_quant_and_cache calling site (#34653)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
2026-02-17 19:19:41 -08:00
Cyrus Leung
a0d8d944e2 [Renderer] Move MM Hash parsing into Renderer (#34711)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-17 19:18:55 -08:00
Amr Mahdi
df3f537a66 [CI] Remove unused precompiled wheel args from image build (#34767)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
2026-02-17 18:58:18 -08:00
Matthew Bonanni
7743152957 [Attention] Refactor check_and_update_config (#33600)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-02-17 17:06:54 -08:00
Wentao Ye
ab33d2a629 [Feature] Decode Context Parallel support for GPU model runner v2 (#34179)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-02-17 16:27:15 -08:00
Woosuk Kwon
be3af2d29e [Model Runner V2] Further simplification for PP (#34724)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-17 15:18:18 -08:00
Jongseok Park
c656ba3b4d [Kernel] Triton-based Top-k and Top-p sampler kernels (#33538)
Signed-off-by: js_park <cakeng@naver.com>
Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com>
Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-02-17 23:14:30 +00:00
Matthew Bonanni
dc5fa77a4e [Bugfix][MTP][Sparse MLA] Allow sparse MLA with MTP to run with FULL cudagraphs (#34457)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-02-17 14:01:27 -05:00
Flora Feng
1e4a084c8e [CI] Fix flaky test_parsable_context (#34717)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-02-17 18:42:52 +00:00
Richard Zou
7967e854da [BugFix] Fix sp tests (#34716)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-02-17 17:07:56 +00:00
almayne
6bd6d0c3c1 Fixed whisper CPU test that does not spawn properly. (#34324)
Signed-off-by: Anna Mayne <anna.mayne@arm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-17 06:46:23 -08:00