Commit Graph

15433 Commits

Author SHA1 Message Date
Lukas Geiger
4f6eed3bd4 [Core] Simplify multimodal masking (#34246)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2026-04-01 01:18:22 -07:00
Li, Jiang
36d7f19897 [CPU] Support head_size 512 in cpu_attn (#38676)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-04-01 05:42:27 +00:00
Jeffrey Wang
2d725b89c5 [Bugfix] Lazy import diskcache to avoid sqlite3/libstdc++ ImportError at startup (#38649)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
2026-04-01 05:31:20 +00:00
Augusto Yao
ef53395e2c [bugfix] do not add extra linebreak for score/rerank with chat template (#38617)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-04-01 04:50:07 +00:00
Lucas Wilkinson
eb47454987 [Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking (#36178)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-04-01 00:15:53 -04:00
Matthew Bonanni
116f4be405 [1/N][Cleanup] Standardize on use of is_quantized_kv_cache (#38659)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-04-01 04:08:01 +00:00
Wentao Ye
7b01d97a22 [Perf] Optimize mean pooling using chunks and index_add, 5.9% E2E throughput improvement (#38559)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-04-01 03:54:58 +00:00
HarshRathva
17b72fd1c8 Fix priority preemption regression test in scheduler (#37051)
Signed-off-by: HarshRathva <harshrathvaai@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
2026-04-01 06:36:12 +03:00
Samu Tamminen
c49497726b [ROCm][perf] Shuffle KV cache to use paged_attention_common (#32914)
Signed-off-by: Samu Tamminen <stammine@amd.com>
Co-authored-by: Tuukka Sarvi <tuukka.sarvi@amd.com>
2026-04-01 03:30:19 +00:00
Ben Browning
cb0b443274 [Misc] Add 20 regression tests for 11 tool parser bug fixes (#38172)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-04-01 03:00:31 +00:00
Luka Govedič
40bb175027 [vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
2026-03-31 22:15:05 -04:00
Elvir Crnčević
0fab52f0aa Fix NaN from stale FP4 scale padding in create_fp4_scale_tensor (#38148)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-03-31 19:14:59 -07:00
Yifan Qiao
91e4521f9f [Feat][v1] Simple yet General CPU KV Cache Offloading (#37160)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
2026-03-31 17:58:37 -07:00
Stig-Arne Grönroos
31a719bcd3 [ROCm][perf] fix Aiter sparse MLA with MTP>1 (#37887)
Signed-off-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com>
Signed-off-by: Stig-Arne Grönroos <sgronroo@amd.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-31 19:22:23 -04:00
Vedant V Jhaveri
2e56975657 Generative Scoring (#34539)
Signed-off-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-03-31 16:02:11 -07:00
Chang Su
36f1dc19ae feat(grpc): add periodic stats logging and servicer log forwarding (#38333)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
2026-03-31 15:50:07 -07:00
Asaf Gardin
3dc01ef352 [Quantization] Consolidate dummy format logic into DummyModelLoader (#38637)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
2026-03-31 22:20:45 +00:00
Yanan Cao
cc671cb110 [Kernel] [Helion] [17/N] Add Helion kernel torch.compile support (#38592)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>
2026-03-31 17:06:42 -04:00
Wentao Ye
856589ed9a [Refactor] Remove dead code in kv connector and model runner (#38383)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-31 17:05:23 -04:00
czhu-cohere
517b769b58 [Perf] Fix DBO overlap: capture DeepEP event before yield (#38451)
Signed-off-by: root <conway.zhu@cohere.com>
2026-03-31 20:38:59 +00:00
yzong-rh
d9b90a07ac [MoE Refactor] Migrate Unquantized to Full Oracle Flow (#36286)
Signed-off-by: Yifan Zong <yzong@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: yzong-rh <yzong@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-03-31 15:43:33 -04:00
Olya Kozlova
598190aac3 [fix] Remove trtllm ragged mla prefills (#36540)
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
2026-03-31 12:30:27 -07:00
Xu Jinyang
b779eb3363 [Model] Sync upstream BT=chunk_size fix for GDN chunk_fwd_kernel_o, simplify warmup to single pass (#38343)
Signed-off-by: AuYang <459461160@qq.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2026-03-31 23:03:24 +04:00
BadrBasowid
077a9a8e37 [torch.compile] Refactor Attention Quant Fusion Pass and Remove Boilerplate (#37373)
Signed-off-by: BadrBasowid <badr.basowid@gmail.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
2026-03-31 14:15:50 -04:00
Run Yu
07edd551cc [CI/Build] Resolve a dependency deadlock when installing the test dependencies used in CI (#37766)
Signed-off-by: Run Yu <yurun00@gmail.com>
2026-03-31 18:05:14 +00:00
mikaylagawarecki
7c080dd3c5 [4/n] Migrate FP4/W4A8 CUTLASS kernels to torch stable ABI (#37503)
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
2026-03-31 10:21:13 -07:00
Yi Liu
0dd25a44ea [Quantization][Autoround][XPU] Add W4A16 Support (#37986)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
2026-03-31 16:48:24 +00:00
SandishKumarHN
3896e021a0 [Bugfix] Fix FusedMoE weight loading with padded hidden dimensions (#37010)
Signed-off-by: SandishKumarHN <sandish@fb.com>
2026-03-31 12:22:26 -04:00
zhang-prog
b6e636c12c [Fix] handle PaddleOCR-VL image processor max_pixels across Transformers v4/v5 (#38629)
Signed-off-by: zhangyue66 <zhangyue66@baidu.com>
v0.18.2rc0
2026-03-31 15:50:41 +00:00
Jingu Kang
f1ff50c86c [Bugfix] clamp dA_cumsum differences to prevent Inf in Mamba2 SSD kernels (#37501)
Signed-off-by: Jingu Kang <jg.k@navercorp.com>
2026-03-31 17:35:51 +02:00
Matthew Bonanni
757068dc65 [Bugfix][Async] Fix async spec decoding with hybrid models (#38556)
Signed-off-by: SandishKumarHN <sandishkumarhn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: SandishKumarHN <sandishkumarhn@gmail.com>
2026-03-31 11:08:54 -04:00
Nicolò Lucchesi
7337ff7f03 [Docs] PD with Nixl compat matrix (#38628)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-31 15:01:21 +00:00
Kyle Sayers
5869f69c5f [Online Quant] [QeRL] Minor code cleanup (#38574)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-03-31 14:56:43 +00:00
wliao2
4dfad17ed1 replace cuda_device_count_stateless() to current_platform.device_count() (#37841)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-31 22:32:54 +08:00
wenjun liu
e8057c00bc [CI] Avoid concurrent docker pull in intel XPU CI runners to prevent rate limit issues (#38594)
Signed-off-by: wendyliu235 <wenjun.liu@intel.com>
2026-03-31 22:23:18 +08:00
Nicolò Lucchesi
7430389669 [Bugfix][CI] Skip flaky test_eagle test (#38566)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-31 09:42:37 -04:00
ElizaWszola
202f147cf2 Fix MLA runs when use_inductor_graph_partition=True (#38631)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
2026-03-31 13:37:43 +00:00
Jiangyun Zhu
ea7bfde6e4 [CI] fix LM Eval Qwen3.5 Models (B200) (#38632)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2026-03-31 13:20:08 +00:00
sihao_li
d71a15041f [XPU]move testing dependencies from Dockerfile to xpu-test.in (#38596)
Signed-off-by: sihao.li <sihao.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-31 12:49:43 +00:00
Ilya Markov
abdbb68386 [EPLB] Add alternative communication for EPLB weight exchange (#33176)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
2026-03-31 08:17:12 -04:00
liuzhenwei
0c63739135 [EPD] update EPD script arguments (#36742)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
2026-03-31 12:02:09 +00:00
wang.yuqi
719735d6c5 [CI Failure] pin colmodernvbert revision (#38612)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-31 10:54:54 +00:00
Maosheng Liao
aae3e688f8 Fix document of torchrun_example.py (#31113) 2026-03-31 10:54:23 +00:00
Matthew Bonanni
7d65463528 [WIP][CI][Bugfix] Fix test_run_eagle_dp (#38584)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-31 12:30:25 +02:00
Mateusz Sokół
8278825b57 DOC: TPU mention fix (#38129)
Signed-off-by: Mateusz Sokół <mat646@gmail.com>
2026-03-31 03:27:56 -07:00
Chang Su
acf7292bf2 [Misc] Move --grpc CLI argument into make_arg_parser (#38570)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
2026-03-31 03:24:05 -07:00
Chauncey
ce884756f0 [Feature]: add presence_penalty and frequency_penalty fields to Responses API (#38613)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-03-31 08:45:57 +00:00
wang.yuqi
d9d21eb8e3 [Frontend][3/n] Improve pooling entrypoints | scoring. (#28631)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-31 07:52:00 +00:00
Yintong Lu
f09daea261 [CPU] Support int8 compute mode in CPU AWQ (#35697)
Signed-off-by: Yintong Lu <yintong.lu@intel.com>
2026-03-31 15:27:37 +08:00
Kevin H. Luu
42318c840b [ci] Remove benchmarks job (#38611) 2026-03-31 06:46:21 +00:00