Commit Graph

15615 Commits

Author SHA1 Message Date
Kevin H. Luu
f0d3ad9f3e [ci] Remove soft fail for AMD image build job (#38941)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2026-04-03 20:42:33 +00:00
Divin Honnappa
121ea5a21f Removed GPU state confirmation and cleanup steps. (#38238)
Signed-off-by: Divin Honnappa <divin.honnappa@amd.com>
2026-04-03 13:11:08 -07:00
Jeffrey Wang
ab79863e6c Remove MQ multi-node tests (#38934)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
2026-04-03 20:00:08 +00:00
Nick Hill
5f1de2b14b [Model Runner V2] Add config validation for not-yet-supported features (#38758)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-04-03 12:08:08 -07:00
yzong-rh
a5a623d961 [Bugfix] Re-enable Renormalize routing for TRT-LLM MoE experts (#38859)
Signed-off-by: Yifan Zong <yzong@redhat.com>
2026-04-04 01:48:17 +08:00
Xiaoshuang Wang
f8c3af2d85 [vLLM IR] add import_ir_kernels() to support OOT platforms (#38807)
Signed-off-by: Icey <1790571317@qq.com>
2026-04-03 17:25:19 +00:00
danisereb
50cd5674b3 Fix invalid logprobs with MTP enabled and sync scheduling (#38711)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-04-03 12:24:37 -04:00
Vasiliy Kuznetsov
7b1a7423be [Frontend] new online quantization frontend (#38138)
Signed-off-by: Vasiliy Kuznetsov <vasiliy@meta.com>
2026-04-03 11:58:39 -04:00
Nicolò Lucchesi
97f92c6b47 [KVConnector] Skip register_kv_caches on profiling (#38558)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-04-03 15:40:16 +00:00
Yusuf Mohammad
46f02e00f2 [Bugfix] Fix AWQ models batch invariance issues (#38670)
Signed-off-by: yusuf <yusuf@deeplearningmachine.mynet>
Signed-off-by: <>
Co-authored-by: yusuf <yusuf@deeplearningmachine.mynet>
2026-04-03 14:54:15 +00:00
Qiming Zhang
6b4872240f [XPU] bump up xpu-kernel v0.1.5, transpose moe weights (#38342)
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Signed-off-by: Qiming Zhang <qiming1.zhang@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-04-03 14:10:02 +00:00
Necofish
580090db6b [Kernel] Add swapAB support for SM120 CUTLASS blockwise FP8 GEMM (#38325) 2026-04-03 15:49:59 +02:00
Artem Perevedentsev
cb10b7e80b [GDN] Eliminate GPU->CPU sync in prepare_chunk_indices during prefill (#38361)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
2026-04-03 13:38:02 +00:00
Mieszko Dziadowiec
bf8b022e60 [Intel][Triton] Support round_int8 for Intel backend (#38825)
Signed-off-by: Mieszko Dziadowiec <mdziadowiec@habana.ai>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-04-03 20:47:35 +08:00
xiangdong
40ee64c00e [XPU][CI] Skip test_topp_only and test_topk_and_topp cases on Intel GPU in CI (#38904)
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
2026-04-03 20:44:52 +08:00
wufann
1b117cb0ac [ROCm] Fix aiter persistent mode mla with q/o nhead<16 for kimi-k2.5 tp8 (#38615)
Signed-off-by: wufann <36477220+wufann@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-03 03:54:00 -07:00
Anton Ivanov
abebd9323d [CPU] Replace OMP initialization (#36487)
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
2026-04-03 18:42:43 +08:00
Hyeonki Hong
25f2b55319 [Frontend] feat: add streaming support for token generation endpoint (#37171)
Signed-off-by: Hyeonki Hong <hyeonki.hong@moreh.io>
2026-04-03 10:20:32 +00:00
xiangdong
cb4ff07f8b [XPU][CI] Skip test_topk_only cases on Intel GPU in CI (#38899)
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
2026-04-03 09:50:41 +00:00
Gregory Shtrasberg
a7d79fa133 [ROCm][CI/Build] Fix the pytest hook to properly print out the summary (#38585)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2026-04-03 17:24:26 +08:00
Netanel Haber
fa9e68022d Fix Nano Nemotron VL regressions (#38655)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2026-04-03 15:22:06 +08:00
Isotr0py
5506435419 [Misc] Clean up Gemma4 implementation (#38872)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
v0.19.1rc0
2026-04-03 05:47:02 +00:00
Yifan Qiao
311c981647 [MRV2][KVConnector] Fix missing build_connector_worker_meta (#38698)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
2026-04-03 08:42:52 +03:00
Li, Jiang
21d7ecc5b0 [CI/Build] Add audio deps in Dockerfile.cpu (#38876)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-04-03 05:05:14 +00:00
Aaron Hao
4729b90838 [Bug] Add e_score_correction_bias to SKIP_TENSORS (#38746)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2026-04-02 21:15:05 -07:00
shunting314
8b141ed8c3 full cudagraph for flex-attn (#36298)
Signed-off-by: shunting314 <shunting@meta.com>
2026-04-02 21:15:01 -07:00
Varun Sundar Rabindranath
2ad7c0335f [Model] Add Phi4ForCausalLMV for microsoft/Phi-4-reasoning-vision-15B (#38306)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2026-04-02 21:14:57 -07:00
Bowen Bao
201d2ea5bf [CI][ROCm] Add Qwen3.5-35B-A3B-MXFP4 model eval into CI (#38664)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
2026-04-03 04:05:45 +00:00
Bowen Bao
103f0de565 [ROCm][Quantization][1/N] Refactor quark_moe w_mxfp4 w/ oracle (#38774)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-04-03 03:29:57 +00:00
wliao2
32e0c0bfa2 refactor hard coded device string in test files under tests/v1 and tests/lora (#37566)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
2026-04-03 11:21:47 +08:00
Itay Etelis
4a06e1246e [Perf] Batch KV cache swap copies via cuMemcpyBatchAsync (#38460)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
2026-04-03 03:13:23 +00:00
Carl Y
3bc2734dd0 [Kernel] Fuse FP8 output quantization into merge_attn_states (#36518)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
2026-04-03 01:47:04 +00:00
Carl Y
1f5ec2889c [mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
Signed-off-by: Carl Y <4531192+carlyou@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 21:16:11 -04:00
Yan Ma
ee3cf45739 [XPU] Initial support for GDN attention on Qwen3-next/Qwen3.5 (#33657)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-04-03 08:59:11 +08:00
Matthew Bonanni
05e68e1f81 [CI] Fix test_nixl_connector (#38838) 2026-04-02 17:52:13 -07:00
Vadim Gimpelson
771913e4a0 [Bugfix] Fix NVFP4+MTP crash: force unquantized mtp.fc for Qwen3.5 (#38832)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-04-03 04:45:57 +04:00
1096125073
71a9125c67 [New Model]: add support for telechat3 (#38510)
Signed-off-by: xiayongqiang <xiayq1@chinatelecom.cn>
Co-authored-by: xiayongqiang <xiayq1@chinatelecom.cn>
2026-04-03 08:26:22 +08:00
Nicolò Lucchesi
66e86f1dbd [Kernel] Mamba support different layout for Conv state (#37416) 2026-04-03 01:50:09 +02:00
Michael
bb39382b2b [Bugfix]: Fix Gemma4ToolParser.__init__() missing tools parameter (#38847)
Signed-off-by: Michael Hospedales <hospedales@me.com>
2026-04-02 14:35:19 -07:00
zhanqiuhu
7b743ba953 [CI] Fix: pass string cache_dtype in test_register_kv_caches (#38836) 2026-04-02 19:42:09 +00:00
Stefano Castagnetta
188defbd0b [CI] Add flashinfer.py to attention test source deps (#38792)
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-04-02 19:24:29 +00:00
Luciano Martins
08ed2b9688 feat(models): implement Google Gemma 4 architecture support (MoE, Multimodal, Reasoning, Tool-Use) (#38826)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Luciano Martins <lucianomartins@google.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2026-04-02 11:13:28 -07:00
Yanan Cao
ecd5443dbc Bump helion dependency from 0.3.2 to 0.3.3 (#38062)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 10:59:33 -07:00
Stefano Castagnetta
58262dec6e [Bugfix] Fix test mocks after SM100 restriction in #38730 (#38791)
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-04-02 13:12:58 -04:00
Lucas Wilkinson
cb3935a8fc [FA4] Update flash-attention to latest upstream FA4 (#38690)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-04-02 17:02:37 +00:00
Bowen Bao
82a006beeb [CI][ROCm] Add gpt-oss w4a8 in CI (#38292)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
2026-04-03 00:06:01 +08:00
wang.yuqi
a9b4f07ba2 [Frontend] Re-enable running MaxSim on GPU (#38620)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-03 00:03:13 +08:00
Koushik Dutta
d9408ffba3 Triton MLA perf fixes (#33529)
Signed-off-by: Koushik Dutta <koushd@gmail.com>
Co-authored-by: root <root@ubuntu-nvidia.localdomain>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-04-02 09:40:01 -04:00
Yusuf Mohammad
16a65e4173 [Bugfix] Enable batch-invariant Triton matmul on all Ampere GPUs (SM 8x) (#38427)
Signed-off-by: yusuf <yusufmohammad@live.com>
Signed-off-by: yusuf <yusuf@deeplearningmachine.mynet>
Signed-off-by: Yusuf Mohammad <79484377+YM2132@users.noreply.github.com>
Signed-off-by: <>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: yusuf <yusuf@deeplearningmachine.mynet>
2026-04-02 09:29:58 -04:00
bsliu
c0817e4d39 [Model] Add support for Cheers multimodal model (#38788)
Signed-off-by: bsliu <1187291748@qq.com>
Signed-off-by: 吴炳贤 <wubingxian24@mails.ucas.ac.cn>
2026-04-02 21:01:40 +08:00