bhargav-patel-29
|
c5e3454e5a
|
[Model] Add support for BharatGen's Param2MoE model (#38000)
Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-06 16:19:56 +08:00 |
|
liuchenbing2026
|
f6983f01de
|
MiniMax-M2: add Eagle3 speculative decoding support (#37512)
Signed-off-by: liuchenbing <chenliumail@163.com>
Signed-off-by: liucb <liuchengbao_work@163.com>
Co-authored-by: liuchenbing <chenliumail@163.com>
|
2026-04-05 19:50:18 -07:00 |
|
Micah Williamson
|
9570654c6d
|
[ROCm][CI] Run Kernels Core Operation Test On MI325 and mitigate flakiness (#38184)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-04-06 09:42:02 +08:00 |
|
Greg Pereira
|
4dd49b06f8
|
[Bug] Fix Import paths for encoder_cudagraph modules (#38997)
Signed-off-by: greg pereira <grpereir@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-05 19:11:58 +00:00 |
|
Greg Pereira
|
f53fa26e05
|
[Bugfix] Fix invalid JSON in Gemma 4 streaming tool calls by stripping partial delimiters (#38992)
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-05 17:11:18 +00:00 |
|
Aaron Batilo
|
9a528260ef
|
[Bugfix][Spec Decode] Fix extract_hidden_states for VLM models (#38987)
Signed-off-by: Aaron Batilo <abatilo@coreweave.com>
|
2026-04-05 02:41:54 -07:00 |
|
Jeffrey Wang
|
ab79863e6c
|
Remove MQ multi-node tests (#38934)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-04-03 20:00:08 +00:00 |
|
Vasiliy Kuznetsov
|
7b1a7423be
|
[Frontend] new online quantization frontend (#38138)
Signed-off-by: Vasiliy Kuznetsov <vasiliy@meta.com>
|
2026-04-03 11:58:39 -04:00 |
|
Yusuf Mohammad
|
46f02e00f2
|
[Bugfix] Fix AWQ models batch invariance issues (#38670)
Signed-off-by: yusuf <yusuf@deeplearningmachine.mynet>
Signed-off-by: <>
Co-authored-by: yusuf <yusuf@deeplearningmachine.mynet>
|
2026-04-03 14:54:15 +00:00 |
|
Hyeonki Hong
|
25f2b55319
|
[Frontend] feat: add streaming support for token generation endpoint (#37171)
Signed-off-by: Hyeonki Hong <hyeonki.hong@moreh.io>
|
2026-04-03 10:20:32 +00:00 |
|
Netanel Haber
|
fa9e68022d
|
Fix Nano Nemotron VL regressions (#38655)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-04-03 15:22:06 +08:00 |
|
shunting314
|
8b141ed8c3
|
full cudagraph for flex-attn (#36298)
Signed-off-by: shunting314 <shunting@meta.com>
|
2026-04-02 21:15:01 -07:00 |
|
Varun Sundar Rabindranath
|
2ad7c0335f
|
[Model] Add Phi4ForCausalLMV for microsoft/Phi-4-reasoning-vision-15B (#38306)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2026-04-02 21:14:57 -07:00 |
|
Bowen Bao
|
201d2ea5bf
|
[CI][ROCm] Add Qwen3.5-35B-A3B-MXFP4 model eval into CI (#38664)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
|
2026-04-03 04:05:45 +00:00 |
|
Bowen Bao
|
103f0de565
|
[ROCm][Quantization][1/N] Refactor quark_moe w_mxfp4 w/ oracle (#38774)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-03 03:29:57 +00:00 |
|
wliao2
|
32e0c0bfa2
|
refactor hard coded device string in test files under tests/v1 and tests/lora (#37566)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
|
2026-04-03 11:21:47 +08:00 |
|
Carl Y
|
3bc2734dd0
|
[Kernel] Fuse FP8 output quantization into merge_attn_states (#36518)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
|
2026-04-03 01:47:04 +00:00 |
|
Carl Y
|
1f5ec2889c
|
[mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
Signed-off-by: Carl Y <4531192+carlyou@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-02 21:16:11 -04:00 |
|
1096125073
|
71a9125c67
|
[New Model]: add support for telechat3 (#38510)
Signed-off-by: xiayongqiang <xiayq1@chinatelecom.cn>
Co-authored-by: xiayongqiang <xiayq1@chinatelecom.cn>
|
2026-04-03 08:26:22 +08:00 |
|
Nicolò Lucchesi
|
66e86f1dbd
|
[Kernel] Mamba support different layout for Conv state (#37416)
|
2026-04-03 01:50:09 +02:00 |
|
zhanqiuhu
|
7b743ba953
|
[CI] Fix: pass string cache_dtype in test_register_kv_caches (#38836)
|
2026-04-02 19:42:09 +00:00 |
|
Luciano Martins
|
08ed2b9688
|
feat(models): implement Google Gemma 4 architecture support (MoE, Multimodal, Reasoning, Tool-Use) (#38826)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Luciano Martins <lucianomartins@google.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2026-04-02 11:13:28 -07:00 |
|
Stefano Castagnetta
|
58262dec6e
|
[Bugfix] Fix test mocks after SM100 restriction in #38730 (#38791)
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-04-02 13:12:58 -04:00 |
|
Bowen Bao
|
82a006beeb
|
[CI][ROCm] Add gpt-oss w4a8 in CI (#38292)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
|
2026-04-03 00:06:01 +08:00 |
|
wang.yuqi
|
a9b4f07ba2
|
[Frontend] Re-enable running MaxSim on GPU (#38620)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-03 00:03:13 +08:00 |
|
bsliu
|
c0817e4d39
|
[Model] Add support for Cheers multimodal model (#38788)
Signed-off-by: bsliu <1187291748@qq.com>
Signed-off-by: 吴炳贤 <wubingxian24@mails.ucas.ac.cn>
|
2026-04-02 21:01:40 +08:00 |
|
JartX
|
2ce3d0ce36
|
[Feature] KV cache per-token-head INT8/FP8 quantization (#38378)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: yangyang4991 <yangyang4991@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2026-04-02 08:13:26 -04:00 |
|
Jiangyun Zhu
|
4eefbf9609
|
[Perf] fuse kernels in gdn (#37813)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-04-02 11:52:18 +00:00 |
|
Li, Jiang
|
c6f722b93e
|
[CPU] Support gelu act in cpu_fused_moe (#38770)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-04-02 14:14:32 +08:00 |
|
Xin Yang
|
9bd7231106
|
Revert "[Kernel] Add gpt-oss Router GEMM kernel (#37205)" (#38778)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-04-01 22:02:32 -07:00 |
|
Sergey Zinchenko
|
5a2d420c17
|
[Bugfix] Use dedicated MM processor cache in /tokenize to prevent sender-cache pollution (#38545)
Signed-off-by: Sergey Zinchenko <sergey.zinchenko.rnd@gmail.com>
|
2026-04-01 21:14:49 -07:00 |
|
Kevin H. Luu
|
1785dc5501
|
Revert "[Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params (#37831)" (#38751)
|
2026-04-02 06:34:28 +08:00 |
|
Jeffrey Wang
|
de5e6c44c6
|
[Feat][Executor] Introduce RayExecutorV2 (#36836)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-04-01 14:34:29 -07:00 |
|
Monishver
|
c09ad767cd
|
Feature/silu block quant fusion v1 (#32996)
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
|
2026-04-01 18:50:43 +00:00 |
|
Chauncey
|
cbe7d18096
|
[Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-04-01 09:56:45 -07:00 |
|
Michael Goin
|
db5d0719e1
|
[Kernel] Add MXFP8 to Marlin GEMM/MoE and refactor Mxfp8LinearOp (#34664)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-04-01 09:41:42 -07:00 |
|
yzong-rh
|
dc0428ebb8
|
[NIXL][BUG] Fix Triton heterogeneous TP (#37940)
Signed-off-by: Yifan <yzong@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-04-01 17:23:15 +02:00 |
|
bnellnm
|
7cf56a59a2
|
[MoE Refactor] Make SharedExperts class for use with DefaultMoERunner (#35153)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-04-01 09:44:08 -04:00 |
|
손세정
|
582340f273
|
[Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params (#37831)
Signed-off-by: AAISSJ <maze0717@g.skku.edu>
Signed-off-by: <>
Co-authored-by: 세덩 <saison@sedeong-ui-MacBookAir.local>
|
2026-04-01 20:22:29 +08:00 |
|
Juan Pérez de Algaba
|
58ee614221
|
(security) Enforce frame limit in VideoMediaIO (#38636)
Signed-off-by: jperezde <jperezde@redhat.com>
|
2026-04-01 10:23:45 +00:00 |
|
Zhanda Zhu
|
c75a313824
|
[Perf] triton bilinear_pos_embed kernel for ViT (#37948)
Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com>
|
2026-04-01 01:52:02 -07:00 |
|
Lukas Geiger
|
4f6eed3bd4
|
[Core] Simplify multimodal masking (#34246)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2026-04-01 01:18:22 -07:00 |
|
Li, Jiang
|
36d7f19897
|
[CPU] Support head_size 512 in cpu_attn (#38676)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-04-01 05:42:27 +00:00 |
|
Augusto Yao
|
ef53395e2c
|
[bugfix] do not add extra linebreak for score/rerank with chat template (#38617)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-04-01 04:50:07 +00:00 |
|
Lucas Wilkinson
|
eb47454987
|
[Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking (#36178)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-04-01 00:15:53 -04:00 |
|
HarshRathva
|
17b72fd1c8
|
Fix priority preemption regression test in scheduler (#37051)
Signed-off-by: HarshRathva <harshrathvaai@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-04-01 06:36:12 +03:00 |
|
Ben Browning
|
cb0b443274
|
[Misc] Add 20 regression tests for 11 tool parser bug fixes (#38172)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-04-01 03:00:31 +00:00 |
|
Luka Govedič
|
40bb175027
|
[vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
|
2026-03-31 22:15:05 -04:00 |
|
Yifan Qiao
|
91e4521f9f
|
[Feat][v1] Simple yet General CPU KV Cache Offloading (#37160)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
|
2026-03-31 17:58:37 -07:00 |
|
Vedant V Jhaveri
|
2e56975657
|
Generative Scoring (#34539)
Signed-off-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-31 16:02:11 -07:00 |
|