Wentao Ye
|
cab4064cd5
|
[Bug] Fix workspace manager _current_workspaces size (#38853)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-04 01:29:45 +00:00 |
|
elenalil-aws
|
81994e1d0e
|
[Bugfix][LoRA] Fix missing in_proj_z in Qwen3_5ForConditionalGenerati… (#38927)
Signed-off-by: elenalil-aws <elenalil@amazon.com>
|
2026-04-03 23:30:09 +00:00 |
|
Nick Hill
|
5f1de2b14b
|
[Model Runner V2] Add config validation for not-yet-supported features (#38758)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-04-03 12:08:08 -07:00 |
|
yzong-rh
|
a5a623d961
|
[Bugfix] Re-enable Renormalize routing for TRT-LLM MoE experts (#38859)
Signed-off-by: Yifan Zong <yzong@redhat.com>
|
2026-04-04 01:48:17 +08:00 |
|
Xiaoshuang Wang
|
f8c3af2d85
|
[vLLM IR] add import_ir_kernels() to support OOT platforms (#38807)
Signed-off-by: Icey <1790571317@qq.com>
|
2026-04-03 17:25:19 +00:00 |
|
danisereb
|
50cd5674b3
|
Fix invalid logprobs with MTP enabled and sync scheduling (#38711)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-04-03 12:24:37 -04:00 |
|
Vasiliy Kuznetsov
|
7b1a7423be
|
[Frontend] new online quantization frontend (#38138)
Signed-off-by: Vasiliy Kuznetsov <vasiliy@meta.com>
|
2026-04-03 11:58:39 -04:00 |
|
Nicolò Lucchesi
|
97f92c6b47
|
[KVConnector] Skip register_kv_caches on profiling (#38558)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-04-03 15:40:16 +00:00 |
|
Yusuf Mohammad
|
46f02e00f2
|
[Bugfix] Fix AWQ models batch invariance issues (#38670)
Signed-off-by: yusuf <yusuf@deeplearningmachine.mynet>
Signed-off-by: <>
Co-authored-by: yusuf <yusuf@deeplearningmachine.mynet>
|
2026-04-03 14:54:15 +00:00 |
|
Qiming Zhang
|
6b4872240f
|
[XPU] bump up xpu-kernel v0.1.5, transpose moe weights (#38342)
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Signed-off-by: Qiming Zhang <qiming1.zhang@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-04-03 14:10:02 +00:00 |
|
Artem Perevedentsev
|
cb10b7e80b
|
[GDN] Eliminate GPU->CPU sync in prepare_chunk_indices during prefill (#38361)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
|
2026-04-03 13:38:02 +00:00 |
|
Mieszko Dziadowiec
|
bf8b022e60
|
[Intel][Triton] Support round_int8 for Intel backend (#38825)
Signed-off-by: Mieszko Dziadowiec <mdziadowiec@habana.ai>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-04-03 20:47:35 +08:00 |
|
wufann
|
1b117cb0ac
|
[ROCm] Fix aiter persistent mode mla with q/o nhead<16 for kimi-k2.5 tp8 (#38615)
Signed-off-by: wufann <36477220+wufann@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-03 03:54:00 -07:00 |
|
Anton Ivanov
|
abebd9323d
|
[CPU] Replace OMP initialization (#36487)
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
|
2026-04-03 18:42:43 +08:00 |
|
Hyeonki Hong
|
25f2b55319
|
[Frontend] feat: add streaming support for token generation endpoint (#37171)
Signed-off-by: Hyeonki Hong <hyeonki.hong@moreh.io>
|
2026-04-03 10:20:32 +00:00 |
|
Netanel Haber
|
fa9e68022d
|
Fix Nano Nemotron VL regressions (#38655)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-04-03 15:22:06 +08:00 |
|
Isotr0py
|
5506435419
|
[Misc] Clean up Gemma4 implementation (#38872)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-04-03 05:47:02 +00:00 |
|
Yifan Qiao
|
311c981647
|
[MRV2][KVConnector] Fix missing build_connector_worker_meta (#38698)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
|
2026-04-03 08:42:52 +03:00 |
|
Aaron Hao
|
4729b90838
|
[Bug] Add e_score_correction_bias to SKIP_TENSORS (#38746)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-04-02 21:15:05 -07:00 |
|
shunting314
|
8b141ed8c3
|
full cudagraph for flex-attn (#36298)
Signed-off-by: shunting314 <shunting@meta.com>
|
2026-04-02 21:15:01 -07:00 |
|
Varun Sundar Rabindranath
|
2ad7c0335f
|
[Model] Add Phi4ForCausalLMV for microsoft/Phi-4-reasoning-vision-15B (#38306)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2026-04-02 21:14:57 -07:00 |
|
Bowen Bao
|
103f0de565
|
[ROCm][Quantization][1/N] Refactor quark_moe w_mxfp4 w/ oracle (#38774)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-03 03:29:57 +00:00 |
|
Itay Etelis
|
4a06e1246e
|
[Perf] Batch KV cache swap copies via cuMemcpyBatchAsync (#38460)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-04-03 03:13:23 +00:00 |
|
Carl Y
|
3bc2734dd0
|
[Kernel] Fuse FP8 output quantization into merge_attn_states (#36518)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
|
2026-04-03 01:47:04 +00:00 |
|
Carl Y
|
1f5ec2889c
|
[mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
Signed-off-by: Carl Y <4531192+carlyou@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-02 21:16:11 -04:00 |
|
Yan Ma
|
ee3cf45739
|
[XPU] Initial support for GDN attention on Qwen3-next/Qwen3.5 (#33657)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-04-03 08:59:11 +08:00 |
|
Vadim Gimpelson
|
771913e4a0
|
[Bugfix] Fix NVFP4+MTP crash: force unquantized mtp.fc for Qwen3.5 (#38832)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-04-03 04:45:57 +04:00 |
|
1096125073
|
71a9125c67
|
[New Model]: add support for telechat3 (#38510)
Signed-off-by: xiayongqiang <xiayq1@chinatelecom.cn>
Co-authored-by: xiayongqiang <xiayq1@chinatelecom.cn>
|
2026-04-03 08:26:22 +08:00 |
|
Nicolò Lucchesi
|
66e86f1dbd
|
[Kernel] Mamba support different layout for Conv state (#37416)
|
2026-04-03 01:50:09 +02:00 |
|
Michael
|
bb39382b2b
|
[Bugfix]: Fix Gemma4ToolParser.__init__() missing tools parameter (#38847)
Signed-off-by: Michael Hospedales <hospedales@me.com>
|
2026-04-02 14:35:19 -07:00 |
|
Luciano Martins
|
08ed2b9688
|
feat(models): implement Google Gemma 4 architecture support (MoE, Multimodal, Reasoning, Tool-Use) (#38826)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Luciano Martins <lucianomartins@google.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2026-04-02 11:13:28 -07:00 |
|
wang.yuqi
|
a9b4f07ba2
|
[Frontend] Re-enable running MaxSim on GPU (#38620)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-03 00:03:13 +08:00 |
|
Koushik Dutta
|
d9408ffba3
|
Triton MLA perf fixes (#33529)
Signed-off-by: Koushik Dutta <koushd@gmail.com>
Co-authored-by: root <root@ubuntu-nvidia.localdomain>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-04-02 09:40:01 -04:00 |
|
Yusuf Mohammad
|
16a65e4173
|
[Bugfix] Enable batch-invariant Triton matmul on all Ampere GPUs (SM 8x) (#38427)
Signed-off-by: yusuf <yusufmohammad@live.com>
Signed-off-by: yusuf <yusuf@deeplearningmachine.mynet>
Signed-off-by: Yusuf Mohammad <79484377+YM2132@users.noreply.github.com>
Signed-off-by: <>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: yusuf <yusuf@deeplearningmachine.mynet>
|
2026-04-02 09:29:58 -04:00 |
|
bsliu
|
c0817e4d39
|
[Model] Add support for Cheers multimodal model (#38788)
Signed-off-by: bsliu <1187291748@qq.com>
Signed-off-by: 吴炳贤 <wubingxian24@mails.ucas.ac.cn>
|
2026-04-02 21:01:40 +08:00 |
|
Harry Mellor
|
dfe5e31689
|
Don't compile vision encoder for Transformers backend (#30518)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-04-02 12:42:29 +00:00 |
|
JartX
|
2ce3d0ce36
|
[Feature] KV cache per-token-head INT8/FP8 quantization (#38378)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: yangyang4991 <yangyang4991@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2026-04-02 08:13:26 -04:00 |
|
Jiangyun Zhu
|
4eefbf9609
|
[Perf] fuse kernels in gdn (#37813)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-04-02 11:52:18 +00:00 |
|
vllmellm
|
551b3fb39f
|
[ROCm] Enable VLLM triton FP8 moe for gfx1201, tuned for Qwen3-30B-A3B-FP8 tp=2 and Qwen/Qwen3.5-35B-A3B-FP8 tp=2 (#38086)
Signed-off-by: big-yellow-duck <jeffaw99@hotmail.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-04-02 08:13:42 +00:00 |
|
Li, Jiang
|
c6f722b93e
|
[CPU] Support gelu act in cpu_fused_moe (#38770)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-04-02 14:14:32 +08:00 |
|
Xin Yang
|
9bd7231106
|
Revert "[Kernel] Add gpt-oss Router GEMM kernel (#37205)" (#38778)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-04-01 22:02:32 -07:00 |
|
Yanan Cao
|
73f48ce559
|
[Kernel] [Helion] Use warning_once in get_gpu_name to prevent log spam (#38743)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>
|
2026-04-01 21:30:31 -07:00 |
|
Sergey Zinchenko
|
5a2d420c17
|
[Bugfix] Use dedicated MM processor cache in /tokenize to prevent sender-cache pollution (#38545)
Signed-off-by: Sergey Zinchenko <sergey.zinchenko.rnd@gmail.com>
|
2026-04-01 21:14:49 -07:00 |
|
Benjamin Chislett
|
5f96f9aff1
|
[Perf] DSV3.2 Indexer Fused Weights Projection (#38684)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-04-02 03:34:49 +00:00 |
|
Luka Govedič
|
694449050f
|
Fix multiline-format string for python 3.10 (#38739)
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
|
2026-04-02 03:19:35 +00:00 |
|
Nick Hill
|
6241521dd2
|
[BugFix] Fix precommit breakage due to conflicting in-flight merges (#38759)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-04-01 15:35:06 -07:00 |
|
Kevin H. Luu
|
1785dc5501
|
Revert "[Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params (#37831)" (#38751)
|
2026-04-02 06:34:28 +08:00 |
|
Chang Su
|
54500546ac
|
[Bugfix] Preserve original ImportError in gRPC server entrypoint (#38673)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
|
2026-04-01 22:16:44 +00:00 |
|
Jeffrey Wang
|
de5e6c44c6
|
[Feat][Executor] Introduce RayExecutorV2 (#36836)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-04-01 14:34:29 -07:00 |
|
yzong-rh
|
cb268e4e55
|
[Refactor] Simplify FutureWrapper in MultiprocExecutor (#38644)
Signed-off-by: Yifan <yzong@redhat.com>
Signed-off-by: Yifan Zong <yzong@redhat.com>
|
2026-04-01 21:28:26 +00:00 |
|