biondizzle/vllm

Author	SHA1	Message	Date
Lukas Geiger	4f6eed3bd4	[Core] Simplify multimodal masking (#34246 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2026-04-01 01:18:22 -07:00
Li, Jiang	36d7f19897	[CPU] Support head_size 512 in cpu_attn (#38676 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-04-01 05:42:27 +00:00
Jeffrey Wang	2d725b89c5	[Bugfix] Lazy import diskcache to avoid sqlite3/libstdc++ ImportError at startup (#38649 ) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>	2026-04-01 05:31:20 +00:00
Augusto Yao	ef53395e2c	[bugfix] do not add extra linebreak for score/rerank with chat template (#38617 ) Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-04-01 04:50:07 +00:00
Lucas Wilkinson	eb47454987	[Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking (#36178 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-04-01 00:15:53 -04:00
Matthew Bonanni	116f4be405	[1/N][Cleanup] Standardize on use of `is_quantized_kv_cache` (#38659 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-04-01 04:08:01 +00:00
Wentao Ye	7b01d97a22	[Perf] Optimize mean pooling using chunks and index_add, 5.9% E2E throughput improvement (#38559 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-04-01 03:54:58 +00:00
HarshRathva	17b72fd1c8	Fix priority preemption regression test in scheduler (#37051 ) Signed-off-by: HarshRathva <harshrathvaai@gmail.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-04-01 06:36:12 +03:00
Samu Tamminen	c49497726b	[ROCm][perf] Shuffle KV cache to use paged_attention_common (#32914 ) Signed-off-by: Samu Tamminen <stammine@amd.com> Co-authored-by: Tuukka Sarvi <tuukka.sarvi@amd.com>	2026-04-01 03:30:19 +00:00
Ben Browning	cb0b443274	[Misc] Add 20 regression tests for 11 tool parser bug fixes (#38172 ) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2026-04-01 03:00:31 +00:00
Luka Govedič	40bb175027	[vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com> Signed-off-by: chzhang <chaojun.zhang@intel.com> Signed-off-by: Luka Govedic <luka.govedic@gmail.com> Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com> Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com> Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com>	2026-03-31 22:15:05 -04:00
Elvir Crnčević	0fab52f0aa	Fix NaN from stale FP4 scale padding in create_fp4_scale_tensor (#38148 ) Signed-off-by: Elvir Crncevic <elvircrn@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-03-31 19:14:59 -07:00
Yifan Qiao	91e4521f9f	[Feat][v1] Simple yet General CPU KV Cache Offloading (#37160 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>	2026-03-31 17:58:37 -07:00
Stig-Arne Grönroos	31a719bcd3	[ROCm][perf] fix Aiter sparse MLA with MTP>1 (#37887 ) Signed-off-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com> Signed-off-by: Stig-Arne Grönroos <sgronroo@amd.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-31 19:22:23 -04:00
Vedant V Jhaveri	2e56975657	Generative Scoring (#34539 ) Signed-off-by: Vedant Jhaveri <vjhaveri@linkedin.com> Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-31 16:02:11 -07:00
Chang Su	36f1dc19ae	feat(grpc): add periodic stats logging and servicer log forwarding (#38333 ) Signed-off-by: Chang Su <chang.s.su@oracle.com>	2026-03-31 15:50:07 -07:00
Asaf Gardin	3dc01ef352	[Quantization] Consolidate dummy format logic into DummyModelLoader (#38637 ) Signed-off-by: Josephasafg <ajgard7@gmail.com>	2026-03-31 22:20:45 +00:00
Yanan Cao	cc671cb110	[Kernel] [Helion] [17/N] Add Helion kernel torch.compile support (#38592 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>	2026-03-31 17:06:42 -04:00
Wentao Ye	856589ed9a	[Refactor] Remove dead code in kv connector and model runner (#38383 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-31 17:05:23 -04:00
czhu-cohere	517b769b58	[Perf] Fix DBO overlap: capture DeepEP event before yield (#38451 ) Signed-off-by: root <conway.zhu@cohere.com>	2026-03-31 20:38:59 +00:00
yzong-rh	d9b90a07ac	[MoE Refactor] Migrate Unquantized to Full Oracle Flow (#36286 ) Signed-off-by: Yifan Zong <yzong@redhat.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: yzong-rh <yzong@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-03-31 15:43:33 -04:00
Olya Kozlova	598190aac3	[fix] Remove trtllm ragged mla prefills (#36540 ) Signed-off-by: Olya Kozlova <okozlova@nvidia.com>	2026-03-31 12:30:27 -07:00
Xu Jinyang	b779eb3363	[Model] Sync upstream BT=chunk_size fix for GDN chunk_fwd_kernel_o, simplify warmup to single pass (#38343 ) Signed-off-by: AuYang <459461160@qq.com> Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>	2026-03-31 23:03:24 +04:00
BadrBasowid	077a9a8e37	[torch.compile] Refactor Attention Quant Fusion Pass and Remove Boilerplate (#37373 ) Signed-off-by: BadrBasowid <badr.basowid@gmail.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-03-31 14:15:50 -04:00
Run Yu	07edd551cc	[CI/Build] Resolve a dependency deadlock when installing the test dependencies used in CI (#37766 ) Signed-off-by: Run Yu <yurun00@gmail.com>	2026-03-31 18:05:14 +00:00
mikaylagawarecki	7c080dd3c5	[4/n] Migrate FP4/W4A8 CUTLASS kernels to torch stable ABI (#37503 ) Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>	2026-03-31 10:21:13 -07:00
Yi Liu	0dd25a44ea	[Quantization][Autoround][XPU] Add `W4A16` Support (#37986 ) Signed-off-by: yiliu30 <yi4.liu@intel.com>	2026-03-31 16:48:24 +00:00
SandishKumarHN	3896e021a0	[Bugfix] Fix FusedMoE weight loading with padded hidden dimensions (#37010 ) Signed-off-by: SandishKumarHN <sandish@fb.com>	2026-03-31 12:22:26 -04:00
zhang-prog	b6e636c12c	[Fix] handle PaddleOCR-VL image processor max_pixels across Transformers v4/v5 (#38629 ) Signed-off-by: zhangyue66 <zhangyue66@baidu.com> (tag: v0.18.2rc0)	2026-03-31 15:50:41 +00:00
Jingu Kang	f1ff50c86c	[Bugfix] clamp dA_cumsum differences to prevent Inf in Mamba2 SSD kernels (#37501 ) Signed-off-by: Jingu Kang <jg.k@navercorp.com>	2026-03-31 17:35:51 +02:00
Matthew Bonanni	757068dc65	[Bugfix][Async] Fix async spec decoding with hybrid models (#38556 ) Signed-off-by: SandishKumarHN <sandishkumarhn@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: SandishKumarHN <sandishkumarhn@gmail.com>	2026-03-31 11:08:54 -04:00
Nicolò Lucchesi	7337ff7f03	[Docs] PD with Nixl compat matrix (#38628 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-31 15:01:21 +00:00
Kyle Sayers	5869f69c5f	[Online Quant] [QeRL] Minor code cleanup (#38574 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-03-31 14:56:43 +00:00
wliao2	4dfad17ed1	replace cuda_device_count_stateless() to current_platform.device_count() (#37841 ) Signed-off-by: Liao, Wei <wei.liao@intel.com> Signed-off-by: wliao2 <wei.liao@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-31 22:32:54 +08:00
wenjun liu	e8057c00bc	[CI] Avoid concurrent docker pull in intel XPU CI runners to prevent rate limit issues (#38594 ) Signed-off-by: wendyliu235 <wenjun.liu@intel.com>	2026-03-31 22:23:18 +08:00
Nicolò Lucchesi	7430389669	[Bugfix][CI] Skip flaky `test_eagle` test (#38566 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-31 09:42:37 -04:00
ElizaWszola	202f147cf2	Fix MLA runs when use_inductor_graph_partition=True (#38631 ) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2026-03-31 13:37:43 +00:00
Jiangyun Zhu	ea7bfde6e4	[CI] fix LM Eval Qwen3.5 Models (B200) (#38632 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-31 13:20:08 +00:00
sihao_li	d71a15041f	[XPU]move testing dependencies from Dockerfile to xpu-test.in (#38596 ) Signed-off-by: sihao.li <sihao.li@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-31 12:49:43 +00:00
Ilya Markov	abdbb68386	[EPLB] Add alternative communication for EPLB weight exchange (#33176 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Markov Ilya <markovilya19@gmail.com> Co-authored-by: Markov Ilya <markovilya19@gmail.com>	2026-03-31 08:17:12 -04:00
liuzhenwei	0c63739135	[EPD] update EPD script arguments (#36742 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>	2026-03-31 12:02:09 +00:00
wang.yuqi	719735d6c5	[CI Failure] pin colmodernvbert revision (#38612 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-31 10:54:54 +00:00
Maosheng Liao	aae3e688f8	Fix document of torchrun_example.py (#31113 )	2026-03-31 10:54:23 +00:00
Matthew Bonanni	7d65463528	[WIP][CI][Bugfix] Fix `test_run_eagle_dp` (#38584 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-31 12:30:25 +02:00
Mateusz Sokół	8278825b57	DOC: TPU mention fix (#38129 ) Signed-off-by: Mateusz Sokół <mat646@gmail.com>	2026-03-31 03:27:56 -07:00
Chang Su	acf7292bf2	[Misc] Move --grpc CLI argument into make_arg_parser (#38570 ) Signed-off-by: Chang Su <chang.s.su@oracle.com>	2026-03-31 03:24:05 -07:00
Chauncey	ce884756f0	[Feature]: add presence_penalty and frequency_penalty fields to Responses API (#38613 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-31 08:45:57 +00:00
wang.yuqi	d9d21eb8e3	[Frontend][3/n] Improve pooling entrypoints | scoring. (#28631 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-31 07:52:00 +00:00
Yintong Lu	f09daea261	[CPU] Support int8 compute mode in CPU AWQ (#35697 ) Signed-off-by: Yintong Lu <yintong.lu@intel.com>	2026-03-31 15:27:37 +08:00
Kevin H. Luu	42318c840b	[ci] Remove benchmarks job (#38611 )	2026-03-31 06:46:21 +00:00


15433 Commits