JartX
|
2ce3d0ce36
|
[Feature] KV cache per-token-head INT8/FP8 quantization (#38378)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: yangyang4991 <yangyang4991@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2026-04-02 08:13:26 -04:00 |
|
Jiangyun Zhu
|
4eefbf9609
|
[Perf] fuse kernels in gdn (#37813)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-04-02 11:52:18 +00:00 |
|
vllmellm
|
551b3fb39f
|
[ROCm] Enable VLLM triton FP8 moe for gfx1201, tuned for Qwen3-30B-A3B-FP8 tp=2 and Qwen/Qwen3.5-35B-A3B-FP8 tp=2 (#38086)
Signed-off-by: big-yellow-duck <jeffaw99@hotmail.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-04-02 08:13:42 +00:00 |
|
Li, Jiang
|
c6f722b93e
|
[CPU] Support gelu act in cpu_fused_moe (#38770)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-04-02 14:14:32 +08:00 |
|
Xin Yang
|
9bd7231106
|
Revert "[Kernel] Add gpt-oss Router GEMM kernel (#37205)" (#38778)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-04-01 22:02:32 -07:00 |
|
Yanan Cao
|
73f48ce559
|
[Kernel] [Helion] Use warning_once in get_gpu_name to prevent log spam (#38743)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>
|
2026-04-01 21:30:31 -07:00 |
|
Gregory Shtrasberg
|
3aab680e3e
|
[ROCm][Bugfix] Fix ROCm runtime failure due to missing symbol (#38750)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>
|
2026-04-01 21:30:11 -07:00 |
|
Sergey Zinchenko
|
5a2d420c17
|
[Bugfix] Use dedicated MM processor cache in /tokenize to prevent sender-cache pollution (#38545)
Signed-off-by: Sergey Zinchenko <sergey.zinchenko.rnd@gmail.com>
|
2026-04-01 21:14:49 -07:00 |
|
Benjamin Chislett
|
5f96f9aff1
|
[Perf] DSV3.2 Indexer Fused Weights Projection (#38684)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-04-02 03:34:49 +00:00 |
|
Luka Govedič
|
694449050f
|
Fix multiline-format string for python 3.10 (#38739)
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
|
2026-04-02 03:19:35 +00:00 |
|
Nick Hill
|
6241521dd2
|
[BugFix] Fix precommit breakage due to conflicting in-flight merges (#38759)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-04-01 15:35:06 -07:00 |
|
Kevin H. Luu
|
1785dc5501
|
Revert "[Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params (#37831)" (#38751)
|
2026-04-02 06:34:28 +08:00 |
|
Chang Su
|
54500546ac
|
[Bugfix] Preserve original ImportError in gRPC server entrypoint (#38673)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
|
2026-04-01 22:16:44 +00:00 |
|
Jeffrey Wang
|
de5e6c44c6
|
[Feat][Executor] Introduce RayExecutorV2 (#36836)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-04-01 14:34:29 -07:00 |
|
yzong-rh
|
cb268e4e55
|
[Refactor] Simplify FutureWrapper in MultiprocExecutor (#38644)
Signed-off-by: Yifan <yzong@redhat.com>
Signed-off-by: Yifan Zong <yzong@redhat.com>
|
2026-04-01 21:28:26 +00:00 |
|
Stefano Castagnetta
|
6183cae1bd
|
[Bugfix] Restrict TRTLLM attention to SM100, fixing GB300 (SM103) hang (#38730)
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
|
2026-04-01 12:08:40 -07:00 |
|
Monishver
|
c09ad767cd
|
Feature/silu block quant fusion v1 (#32996)
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
|
2026-04-01 18:50:43 +00:00 |
|
Wentao Ye
|
c9a9db0e02
|
[Compile] Fix nvfp4 compile warning (#38573)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-01 18:28:57 +00:00 |
|
Chauncey
|
cbe7d18096
|
[Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-04-01 09:56:45 -07:00 |
|
Michael Goin
|
db5d0719e1
|
[Kernel] Add MXFP8 to Marlin GEMM/MoE and refactor Mxfp8LinearOp (#34664)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-04-01 09:41:42 -07:00 |
|
yzong-rh
|
dc0428ebb8
|
[NIXL][BUG] Fix Triton heterogeneous TP (#37940)
Signed-off-by: Yifan <yzong@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-04-01 17:23:15 +02:00 |
|
Jesus Talavera
|
148c2072ec
|
Add ibm-granite/granite-vision-3.3-2b to supported models documentation (#38714)
Signed-off-by: Jesus Talavera <jesus.talavera@ibm.com>
|
2026-04-01 08:22:25 -07:00 |
|
majianhan
|
2f5c3c1ec0
|
[Misc] Fix docstring typo: buildin -> builtin (#38722)
Co-authored-by: majianhan <majianhan@kylinos.cn>
|
2026-04-01 07:39:46 -07:00 |
|
Fynn Schmitt-Ulms
|
fa246d5231
|
Fix shape comment in extract_hidden_states example (#38723)
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
|
2026-04-01 07:29:33 -07:00 |
|
bnellnm
|
7cf56a59a2
|
[MoE Refactor] Make SharedExperts class for use with DefaultMoERunner (#35153)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-04-01 09:44:08 -04:00 |
|
Elvir Crnčević
|
5e30e9b9a9
|
[Bugfix] Revert "Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding" (#38359)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-04-01 09:11:10 -04:00 |
|
손세정
|
582340f273
|
[Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params (#37831)
Signed-off-by: AAISSJ <maze0717@g.skku.edu>
Signed-off-by: <>
Co-authored-by: 세덩 <saison@sedeong-ui-MacBookAir.local>
|
2026-04-01 20:22:29 +08:00 |
|
yjz
|
992368522f
|
[KVTransfer] Fix TpKVTopology.is_kv_replicated equality case (#38179)
Signed-off-by: JianDan0212 <zhangyj0212@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-04-01 12:41:49 +02:00 |
|
Juan Pérez de Algaba
|
58ee614221
|
(security) Enforce frame limit in VideoMediaIO (#38636)
Signed-off-by: jperezde <jperezde@redhat.com>
|
2026-04-01 10:23:45 +00:00 |
|
Harry Mellor
|
f9f6a9097a
|
Add verified label to trigger pre-commit (#38708)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-04-01 02:31:02 -07:00 |
|
Zhanda Zhu
|
c75a313824
|
[Perf] triton bilinear_pos_embed kernel for ViT (#37948)
Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com>
|
2026-04-01 01:52:02 -07:00 |
|
Lukas Geiger
|
4f6eed3bd4
|
[Core] Simplify multimodal masking (#34246)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2026-04-01 01:18:22 -07:00 |
|
Li, Jiang
|
36d7f19897
|
[CPU] Support head_size 512 in cpu_attn (#38676)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-04-01 05:42:27 +00:00 |
|
Jeffrey Wang
|
2d725b89c5
|
[Bugfix] Lazy import diskcache to avoid sqlite3/libstdc++ ImportError at startup (#38649)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-04-01 05:31:20 +00:00 |
|
Augusto Yao
|
ef53395e2c
|
[bugfix] do not add extra linebreak for score/rerank with chat template (#38617)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-04-01 04:50:07 +00:00 |
|
Lucas Wilkinson
|
eb47454987
|
[Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking (#36178)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-04-01 00:15:53 -04:00 |
|
Matthew Bonanni
|
116f4be405
|
[1/N][Cleanup] Standardize on use of is_quantized_kv_cache (#38659)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-04-01 04:08:01 +00:00 |
|
Wentao Ye
|
7b01d97a22
|
[Perf] Optimize mean pooling using chunks and index_add, 5.9% E2E throughput improvement (#38559)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-01 03:54:58 +00:00 |
|
HarshRathva
|
17b72fd1c8
|
Fix priority preemption regression test in scheduler (#37051)
Signed-off-by: HarshRathva <harshrathvaai@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-04-01 06:36:12 +03:00 |
|
Samu Tamminen
|
c49497726b
|
[ROCm][perf] Shuffle KV cache to use paged_attention_common (#32914)
Signed-off-by: Samu Tamminen <stammine@amd.com>
Co-authored-by: Tuukka Sarvi <tuukka.sarvi@amd.com>
|
2026-04-01 03:30:19 +00:00 |
|
Ben Browning
|
cb0b443274
|
[Misc] Add 20 regression tests for 11 tool parser bug fixes (#38172)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-04-01 03:00:31 +00:00 |
|
Luka Govedič
|
40bb175027
|
[vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
|
2026-03-31 22:15:05 -04:00 |
|
Elvir Crnčević
|
0fab52f0aa
|
Fix NaN from stale FP4 scale padding in create_fp4_scale_tensor (#38148)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-03-31 19:14:59 -07:00 |
|
Yifan Qiao
|
91e4521f9f
|
[Feat][v1] Simple yet General CPU KV Cache Offloading (#37160)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
|
2026-03-31 17:58:37 -07:00 |
|
Stig-Arne Grönroos
|
31a719bcd3
|
[ROCm][perf] fix Aiter sparse MLA with MTP>1 (#37887)
Signed-off-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com>
Signed-off-by: Stig-Arne Grönroos <sgronroo@amd.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-31 19:22:23 -04:00 |
|
Vedant V Jhaveri
|
2e56975657
|
Generative Scoring (#34539)
Signed-off-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-31 16:02:11 -07:00 |
|
Chang Su
|
36f1dc19ae
|
feat(grpc): add periodic stats logging and servicer log forwarding (#38333)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
|
2026-03-31 15:50:07 -07:00 |
|
Asaf Gardin
|
3dc01ef352
|
[Quantization] Consolidate dummy format logic into DummyModelLoader (#38637)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-03-31 22:20:45 +00:00 |
|
Yanan Cao
|
cc671cb110
|
[Kernel] [Helion] [17/N] Add Helion kernel torch.compile support (#38592)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>
|
2026-03-31 17:06:42 -04:00 |
|
Wentao Ye
|
856589ed9a
|
[Refactor] Remove dead code in kv connector and model runner (#38383)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-31 17:05:23 -04:00 |
|