Lukas Geiger
|
4f6eed3bd4
|
[Core] Simplify multimodal masking (#34246)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2026-04-01 01:18:22 -07:00 |
|
Li, Jiang
|
36d7f19897
|
[CPU] Support head_size 512 in cpu_attn (#38676)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-04-01 05:42:27 +00:00 |
|
Jeffrey Wang
|
2d725b89c5
|
[Bugfix] Lazy import diskcache to avoid sqlite3/libstdc++ ImportError at startup (#38649)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-04-01 05:31:20 +00:00 |
|
Augusto Yao
|
ef53395e2c
|
[bugfix] do not add extra linebreak for score/rerank with chat template (#38617)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-04-01 04:50:07 +00:00 |
|
Lucas Wilkinson
|
eb47454987
|
[Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking (#36178)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-04-01 00:15:53 -04:00 |
|
Matthew Bonanni
|
116f4be405
|
[1/N][Cleanup] Standardize on use of is_quantized_kv_cache (#38659)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-04-01 04:08:01 +00:00 |
|
Wentao Ye
|
7b01d97a22
|
[Perf] Optimize mean pooling using chunks and index_add, 5.9% E2E throughput improvement (#38559)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-01 03:54:58 +00:00 |
|
HarshRathva
|
17b72fd1c8
|
Fix priority preemption regression test in scheduler (#37051)
Signed-off-by: HarshRathva <harshrathvaai@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-04-01 06:36:12 +03:00 |
|
Samu Tamminen
|
c49497726b
|
[ROCm][perf] Shuffle KV cache to use paged_attention_common (#32914)
Signed-off-by: Samu Tamminen <stammine@amd.com>
Co-authored-by: Tuukka Sarvi <tuukka.sarvi@amd.com>
|
2026-04-01 03:30:19 +00:00 |
|
Ben Browning
|
cb0b443274
|
[Misc] Add 20 regression tests for 11 tool parser bug fixes (#38172)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-04-01 03:00:31 +00:00 |
|
Luka Govedič
|
40bb175027
|
[vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
|
2026-03-31 22:15:05 -04:00 |
|
Elvir Crnčević
|
0fab52f0aa
|
Fix NaN from stale FP4 scale padding in create_fp4_scale_tensor (#38148)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-03-31 19:14:59 -07:00 |
|
Yifan Qiao
|
91e4521f9f
|
[Feat][v1] Simple yet General CPU KV Cache Offloading (#37160)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
|
2026-03-31 17:58:37 -07:00 |
|
Stig-Arne Grönroos
|
31a719bcd3
|
[ROCm][perf] fix Aiter sparse MLA with MTP>1 (#37887)
Signed-off-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com>
Signed-off-by: Stig-Arne Grönroos <sgronroo@amd.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-31 19:22:23 -04:00 |
|
Vedant V Jhaveri
|
2e56975657
|
Generative Scoring (#34539)
Signed-off-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-31 16:02:11 -07:00 |
|
Chang Su
|
36f1dc19ae
|
feat(grpc): add periodic stats logging and servicer log forwarding (#38333)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
|
2026-03-31 15:50:07 -07:00 |
|
Asaf Gardin
|
3dc01ef352
|
[Quantization] Consolidate dummy format logic into DummyModelLoader (#38637)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-03-31 22:20:45 +00:00 |
|
Yanan Cao
|
cc671cb110
|
[Kernel] [Helion] [17/N] Add Helion kernel torch.compile support (#38592)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>
|
2026-03-31 17:06:42 -04:00 |
|
Wentao Ye
|
856589ed9a
|
[Refactor] Remove dead code in kv connector and model runner (#38383)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-31 17:05:23 -04:00 |
|
czhu-cohere
|
517b769b58
|
[Perf] Fix DBO overlap: capture DeepEP event before yield (#38451)
Signed-off-by: root <conway.zhu@cohere.com>
|
2026-03-31 20:38:59 +00:00 |
|
yzong-rh
|
d9b90a07ac
|
[MoE Refactor] Migrate Unquantized to Full Oracle Flow (#36286)
Signed-off-by: Yifan Zong <yzong@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: yzong-rh <yzong@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-31 15:43:33 -04:00 |
|
Olya Kozlova
|
598190aac3
|
[fix] Remove trtllm ragged mla prefills (#36540)
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
|
2026-03-31 12:30:27 -07:00 |
|
Xu Jinyang
|
b779eb3363
|
[Model] Sync upstream BT=chunk_size fix for GDN chunk_fwd_kernel_o, simplify warmup to single pass (#38343)
Signed-off-by: AuYang <459461160@qq.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
|
2026-03-31 23:03:24 +04:00 |
|
BadrBasowid
|
077a9a8e37
|
[torch.compile] Refactor Attention Quant Fusion Pass and Remove Boilerplate (#37373)
Signed-off-by: BadrBasowid <badr.basowid@gmail.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-03-31 14:15:50 -04:00 |
|
Run Yu
|
07edd551cc
|
[CI/Build] Resolve a dependency deadlock when installing the test dependencies used in CI (#37766)
Signed-off-by: Run Yu <yurun00@gmail.com>
|
2026-03-31 18:05:14 +00:00 |
|
mikaylagawarecki
|
7c080dd3c5
|
[4/n] Migrate FP4/W4A8 CUTLASS kernels to torch stable ABI (#37503)
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
|
2026-03-31 10:21:13 -07:00 |
|
Yi Liu
|
0dd25a44ea
|
[Quantization][Autoround][XPU] Add W4A16 Support (#37986)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
|
2026-03-31 16:48:24 +00:00 |
|
SandishKumarHN
|
3896e021a0
|
[Bugfix] Fix FusedMoE weight loading with padded hidden dimensions (#37010)
Signed-off-by: SandishKumarHN <sandish@fb.com>
|
2026-03-31 12:22:26 -04:00 |
|
zhang-prog
|
b6e636c12c
|
[Fix] handle PaddleOCR-VL image processor max_pixels across Transformers v4/v5 (#38629)
Signed-off-by: zhangyue66 <zhangyue66@baidu.com>
v0.18.2rc0
|
2026-03-31 15:50:41 +00:00 |
|
Jingu Kang
|
f1ff50c86c
|
[Bugfix] clamp dA_cumsum differences to prevent Inf in Mamba2 SSD kernels (#37501)
Signed-off-by: Jingu Kang <jg.k@navercorp.com>
|
2026-03-31 17:35:51 +02:00 |
|
Matthew Bonanni
|
757068dc65
|
[Bugfix][Async] Fix async spec decoding with hybrid models (#38556)
Signed-off-by: SandishKumarHN <sandishkumarhn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: SandishKumarHN <sandishkumarhn@gmail.com>
|
2026-03-31 11:08:54 -04:00 |
|
Nicolò Lucchesi
|
7337ff7f03
|
[Docs] PD with Nixl compat matrix (#38628)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-31 15:01:21 +00:00 |
|
Kyle Sayers
|
5869f69c5f
|
[Online Quant] [QeRL] Minor code cleanup (#38574)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-31 14:56:43 +00:00 |
|
wliao2
|
4dfad17ed1
|
replace cuda_device_count_stateless() to current_platform.device_count() (#37841)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-31 22:32:54 +08:00 |
|
wenjun liu
|
e8057c00bc
|
[CI] Avoid concurrent docker pull in intel XPU CI runners to prevent rate limit issues (#38594)
Signed-off-by: wendyliu235 <wenjun.liu@intel.com>
|
2026-03-31 22:23:18 +08:00 |
|
Nicolò Lucchesi
|
7430389669
|
[Bugfix][CI] Skip flaky test_eagle test (#38566)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-31 09:42:37 -04:00 |
|
ElizaWszola
|
202f147cf2
|
Fix MLA runs when use_inductor_graph_partition=True (#38631)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
|
2026-03-31 13:37:43 +00:00 |
|
Jiangyun Zhu
|
ea7bfde6e4
|
[CI] fix LM Eval Qwen3.5 Models (B200) (#38632)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-03-31 13:20:08 +00:00 |
|
sihao_li
|
d71a15041f
|
[XPU]move testing dependencies from Dockerfile to xpu-test.in (#38596)
Signed-off-by: sihao.li <sihao.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-31 12:49:43 +00:00 |
|
Ilya Markov
|
abdbb68386
|
[EPLB] Add alternative communication for EPLB weight exchange (#33176)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
|
2026-03-31 08:17:12 -04:00 |
|
liuzhenwei
|
0c63739135
|
[EPD] update EPD script arguments (#36742)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
|
2026-03-31 12:02:09 +00:00 |
|
wang.yuqi
|
719735d6c5
|
[CI Failure] pin colmodernvbert revision (#38612)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-31 10:54:54 +00:00 |
|
Maosheng Liao
|
aae3e688f8
|
Fix document of torchrun_example.py (#31113)
|
2026-03-31 10:54:23 +00:00 |
|
Matthew Bonanni
|
7d65463528
|
[WIP][CI][Bugfix] Fix test_run_eagle_dp (#38584)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-31 12:30:25 +02:00 |
|
Mateusz Sokół
|
8278825b57
|
DOC: TPU mention fix (#38129)
Signed-off-by: Mateusz Sokół <mat646@gmail.com>
|
2026-03-31 03:27:56 -07:00 |
|
Chang Su
|
acf7292bf2
|
[Misc] Move --grpc CLI argument into make_arg_parser (#38570)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
|
2026-03-31 03:24:05 -07:00 |
|
Chauncey
|
ce884756f0
|
[Feature]: add presence_penalty and frequency_penalty fields to Responses API (#38613)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-31 08:45:57 +00:00 |
|
wang.yuqi
|
d9d21eb8e3
|
[Frontend][3/n] Improve pooling entrypoints | scoring. (#28631)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-31 07:52:00 +00:00 |
|
Yintong Lu
|
f09daea261
|
[CPU] Support int8 compute mode in CPU AWQ (#35697)
Signed-off-by: Yintong Lu <yintong.lu@intel.com>
|
2026-03-31 15:27:37 +08:00 |
|
Kevin H. Luu
|
42318c840b
|
[ci] Remove benchmarks job (#38611)
|
2026-03-31 06:46:21 +00:00 |
|