Yongye Zhu
|
40da9625a1
|
[MoE Refactor] Convert mxfp4 marlin into modular kernel format (#34588)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-18 14:37:14 -08:00 |
|
Flora Feng
|
8d9babd4de
|
Fix empty tool_call_id in Anthropic messages API tool result conversion (#34745)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: Flora Feng <sfeng33@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-02-18 14:31:59 -08:00 |
|
Aaron Hao
|
e99ba957ec
|
[BUG] Fixing Weight Sync unit test (#34841)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-02-18 17:20:10 -05:00 |
|
Kyle Sayers
|
64ac1395e8
|
[Docs] Clean up speculators docs (#34065)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-02-18 13:48:11 -08:00 |
|
Cyrus Leung
|
61cf087680
|
[Bugfix] Fix lora tests (#34834)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-18 13:22:31 -08:00 |
|
Wenlong Wang
|
847a57cd12
|
[Bugfix][MoE Kernel] Fix incorrect routing selection for models without expert groups (e.g., MiniMax-M2.1) (#34673)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-18 13:03:24 -08:00 |
|
rasmith
|
fcd6ac97ed
|
[CI][AMD][BugFix] Skip tests in test_unquantized_backend_selection that should not run on ROCm (#34655)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-02-18 15:00:40 -05:00 |
|
Woosuk Kwon
|
95be2a7f22
|
[Model Runner V2] Minor simplification for DCP (#34786)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-18 11:04:53 -08:00 |
|
Jaden Mathias
|
0e60c925cf
|
[Bugfix] Remove assert causing hipErrorStreamCaptureUnsupported (#34455)
Signed-off-by: Jaden Mathias <jaden.mathias@amd.com>
|
2026-02-18 18:54:54 +00:00 |
|
Teng Ma
|
d7ff22204a
|
[Misc] Add mooncake-transfer-engine to kv_connectors requirements (#34826)
Signed-off-by: Teng Ma <teng-ma@linux.alibaba.com>
|
2026-02-18 18:26:24 +00:00 |
|
Isotr0py
|
c0bd8b13da
|
[Bugfix] Redo Qwen3.5/Qwen3-Next GDN projector fusion (#34697)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
|
2026-02-18 09:46:53 -08:00 |
|
Michael Goin
|
caeb887bf6
|
[Bugfix] Fix NVFP4 TRTLLM MoE non-gated support; add gsm8k for Nemotron-3-Nano FP8+NVFP4 (#34725)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-18 09:39:22 -08:00 |
|
Ilya Markov
|
6b3166a7c7
|
[CI][Bugfix] Fix multinode test script (#34820)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2026-02-18 11:45:10 -05:00 |
|
Robert Shaw
|
25e2e136ef
|
[CI] temporarily disable multi-node tests (#34825)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-02-18 11:32:44 -05:00 |
|
Robert Shaw
|
6874638bc4
|
[Model Bash] DeepSeek R1 BF16 Min Latency QKV A GEMM (0.5% E2E Speedup) (#34758)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-02-18 07:42:36 -08:00 |
|
Burkhard Ringlein
|
e24663c5a9
|
Add unit tests for fp8 output fusion of triton_attn (#34228)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-18 06:22:49 -05:00 |
|
Nick Hill
|
c50e105a88
|
[Model Runner V2] Avoid prepare prefill kernel launch overhead (#34780)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-18 00:49:21 -08:00 |
|
Cyrus Leung
|
a766b30349
|
[Renderer] Deprecate code paths for old input processing (#34775)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-18 00:35:04 -08:00 |
|
Asaf Joseph Gardin
|
1faa8cb73c
|
[Quantization] - Added uses_meta_device_weights to quant config (#34645)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-02-17 23:43:44 -08:00 |
|
Marek Michalowski
|
e89a91d927
|
[Bugfix] fix activation in cpu_fused_moe_torch call (#34696)
Signed-off-by: Marek Michalowski <marek.michalowski@arm.com>
|
2026-02-17 23:39:46 -08:00 |
|
Michael Goin
|
909b147197
|
[Bugfix] Fix prefix creation for Qwen3.5 (#34723)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-17 23:39:15 -08:00 |
|
ElizaWszola
|
a88b3be7c4
|
[Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales (#33255)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-17 23:35:04 -08:00 |
|
Nick Hill
|
a49ea5a58f
|
[Model Runner V2] A bit more PP simplification (#34766)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-17 21:39:07 -08:00 |
|
Cyrus Leung
|
30ebe0dc3c
|
[CI/Build] Remove use of skip_v1 (#34699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-18 12:19:11 +08:00 |
|
Andreas Karatzas
|
cef65f0715
|
[ROCm][CI] Removed hard-coded attn backend requirement for Qwen VL (#34753)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-18 03:59:53 +00:00 |
|
Russell Bryant
|
6f3b2047ab
|
[Core] Fix SSRF bypass via backslash-@ URL parsing inconsistency (#34743)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: isotr0py <2037008807@qq.com>
|
2026-02-18 03:53:35 +00:00 |
|
Luka Govedič
|
02e8f26cea
|
[torch.compile] Turn on silu+fp4 quant fusion by default for O1+ (#34718)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2026-02-18 03:29:15 +00:00 |
|
Hongxia Yang
|
4a00a511bb
|
[BugFix] [Build] fix string literals comparison in indexer_k_quant_and_cache calling site (#34653)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-02-17 19:19:41 -08:00 |
|
Cyrus Leung
|
a0d8d944e2
|
[Renderer] Move MM Hash parsing into Renderer (#34711)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-17 19:18:55 -08:00 |
|
Amr Mahdi
|
df3f537a66
|
[CI] Remove unused precompiled wheel args from image build (#34767)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2026-02-17 18:58:18 -08:00 |
|
Matthew Bonanni
|
7743152957
|
[Attention] Refactor check_and_update_config (#33600)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-02-17 17:06:54 -08:00 |
|
Wentao Ye
|
ab33d2a629
|
[Feature] Decode Context Parallel support for GPU model runner v2 (#34179)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-17 16:27:15 -08:00 |
|
Woosuk Kwon
|
be3af2d29e
|
[Model Runner V2] Further simplification for PP (#34724)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-17 15:18:18 -08:00 |
|
Jongseok Park
|
c656ba3b4d
|
[Kernel] Triton-based Top-k and Top-p sampler kernels (#33538)
Signed-off-by: js_park <cakeng@naver.com>
Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com>
Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-17 23:14:30 +00:00 |
|
Matthew Bonanni
|
dc5fa77a4e
|
[Bugfix][MTP][Sparse MLA] Allow sparse MLA with MTP to run with FULL cudagraphs (#34457)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-02-17 14:01:27 -05:00 |
|
Flora Feng
|
1e4a084c8e
|
[CI] Fix flaky test_parsable_context (#34717)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-02-17 18:42:52 +00:00 |
|
Richard Zou
|
7967e854da
|
[BugFix] Fix sp tests (#34716)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-17 17:07:56 +00:00 |
|
almayne
|
6bd6d0c3c1
|
Fixed whisper CPU test that does not spawn properly. (#34324)
Signed-off-by: Anna Mayne <anna.mayne@arm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-17 06:46:23 -08:00 |
|
Nicolò Lucchesi
|
8e962fef5f
|
[CI][Nixl] Add CrossLayer KV layout tests (#34615)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-17 21:35:40 +08:00 |
|
Cyrus Leung
|
574fe75245
|
[Renderer] Move InputPreprocessor into Renderer (2/2) (#34560)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-17 05:29:01 -08:00 |
|
junuxyz
|
c61a98f529
|
[CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
|
2026-02-17 12:22:56 +00:00 |
|
Harry Mellor
|
28bffe9466
|
Fix docs build warning (#34686)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-17 02:31:40 -08:00 |
|
ChenqianCao
|
ad65177a19
|
[Bugfix] Fix 'remove_instance_endpoint' method logic in disagg_proxy_demo (#32922)
Signed-off-by: ChenqianCao <39755070+ChenqianCao@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-17 10:06:53 +00:00 |
|
Tim Dettmers
|
d44a5b6c47
|
Remove dead bitsandbytes CxB code from 8-bit inference path (#34633)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-17 01:49:14 -08:00 |
|
Jiangyun Zhu
|
1d65283e95
|
Revert "[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj" (#34683)
|
2026-02-17 01:29:27 -08:00 |
|
kourosh hakhamaneshi
|
c464b57374
|
[Ray] Propagate third-party env vars to Ray workers via prefix matching (#34383)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-02-17 01:08:42 -08:00 |
|
Amr Mahdi
|
c5c38e152a
|
[CI] Fix bake config artifact path for AMI rebuild pipeline (#34656)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2026-02-17 06:39:44 +00:00 |
|
Woosuk Kwon
|
d00df624f3
|
[Model Runner V2] Minor refactoring for penalties (#34662)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-16 21:43:00 -08:00 |
|
Woosuk Kwon
|
9752da9d9c
|
[Model Runner V2] Minor simplification for BadWordsState (#34669)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-16 21:27:24 -08:00 |
|
Woosuk Kwon
|
04925b2202
|
[Model Runner V2] Minor cleanup for PP (#34666)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-16 19:15:31 -08:00 |
|