biondizzle/vllm - vllm

Author	SHA1	Message	Date
Sage	802f306cd1	[Tests] Skip model weight download for render-only test server (#36813) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-12 06:24:42 +00:00
Yanan Cao	584a3f56de	[Kernel][Helion][13/N] Force static_shapes=False in helion register (#36677) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 05:35:29 +00:00
wang.yuqi	6ecabe4936	[CI Failure] Fix Language Models Test (Extended Pooling) daily CI Failure (#36761) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-12 12:22:05 +08:00
Flora Feng	8647c6cf51	[Bugfix] Fix minimax_m2 tool parser when stream interval > 1 (#35895) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-12 10:25:14 +08:00
Nick Hill	262b76a09f	[Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829) Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-12 01:20:34 +00:00
Wentao Ye	c34ba6b961	[Perf] Optimize compute maxsim using batched version, 3.2% E2E throughput improvement (#36710) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-12 08:37:01 +08:00
Yanan Cao	cf632499ee	[Kernel] [Helion] [15/N] Split config files into per-platform files (#36698) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 17:25:29 -04:00
Or Ozeri	7ee5d5093b	[BugFix][kv_offload] Fix offloading decodes with async scheduling (#33881) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-11 20:43:40 +00:00
Harry Mellor	65986db6ba	Make Gemma and Gemma 2 accept `inputs_embeds` like Gemma 3 (#36787) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 18:12:43 +00:00
Luka Govedič	9556af87d5	[torch.compile] Add support for non-contiguous fused RMSNorm + group quant (#36551) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>	2026-03-11 10:56:55 -07:00
Or Ozeri	a1a3523a56	[KVConnector] Support worker -> scheduler metadata (#31964) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-11 17:36:37 +00:00
Julien Denize	a5d06dc557	Add 320 dimension size support to MLA (#36161) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2026-03-11 10:21:22 -07:00
Harry Mellor	5efa206a8c	Fix `ExaoneMoeMTP` test that never ran in Transformers v4 (#36792) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 17:10:23 +00:00
Cyrus Leung	196802dfa6	[Misc] Clean up renderers (#36770) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-11 16:39:29 +00:00
Isotr0py	c84b519cf3	[Bugfix] Fix negative max_tokens when input prompt is too long (#36789) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-11 16:30:51 +00:00
Richard Zou	822e250ab7	[torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation (#36093) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-11 16:07:09 +00:00
Jhao-Ting Chen	5573894737	Kimi k2.5 MLA based eagle3 (#36361) Signed-off-by: Izzy Putterman <iputterman@nvidia.com> Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Izzy Putterman <iputterman@nvidia.com>	2026-03-11 11:36:11 -04:00
Harry Mellor	d5816c8c2f	Fix tied weights in weight mapping test for Transformers v5 (#36788) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 15:10:26 +00:00
Wuxun Zhang	e584dce52b	Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230) Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>	2026-03-11 19:19:15 +08:00
Weiguang Li	724759684c	[Bugfix] Fix Qwen3-VL timestamp mismatch when using num_frames without fps (#36136) Signed-off-by: OiPunk <codingpunk@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 03:13:06 -07:00
Richard Zou	09b6f99852	[compile] aot_compile should respect VLLM_DISABLE_COMPILE_CACHE (#36358) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-11 03:12:03 -07:00
Angela Yi	13e79fc811	[ci] Update rtol for test_classification (#36556) Signed-off-by: angelayi <yiangela7@gmail.com> Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>	2026-03-11 03:08:16 -07:00
roikoren755	e661b9ee83	[NemotronH] Small fix reasoning parser (#36635) Signed-off-by: Roi Koren <roik@nvidia.com>	2026-03-11 02:44:41 -07:00
Nicolò Lucchesi	098d844731	[NIXL][1/N] Refactor `kernel_block_size` detection (#35752) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-11 01:11:23 -07:00
Sladyn	4aaaf8c8ce	feat(spec_decode): fuse EAGLE step slot mapping and metadata updates (#33503) Signed-off-by: sladynnunes <snunes@usc.edu>	2026-03-11 04:35:33 +00:00
Wentao Ye	a8ff2cca92	[Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-10 21:25:30 -07:00
tunglinwood	42fadebecb	[Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127) Signed-off-by: tunglinwood <tunglinwood@gmail.com> Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com> Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>	2026-03-10 21:24:48 -07:00
Ning Xie	fe714dd507	[openapi server] log exception in exception handler(2/N) (#36201) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-03-10 20:16:30 -07:00
Nick Hill	65b2f405dc	[Core] Simplify core kv-cache blocks initialization logic (#36521) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-10 20:20:02 +00:00
Nick Hill	2a68464c5b	[Test] `test_async_scheduling.py` improvements (#36340) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-10 11:17:26 -07:00
Harry Mellor	f83b933b84	[CI] Bump `mypy` version to 1.19.1 (#36104) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-10 09:18:28 -07:00
Hashem Hashemi	721ae79f50	Improvements to wvSplitKrc skinny GEMM solution (#34304) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-03-10 09:14:27 -07:00
Srinivasoo7	106ff69c4e	feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency (#35342) Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com> Signed-off-by: Sriusa4414@gmail.com Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com> Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com> Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-10 14:43:40 +00:00
Jiangyun Zhu	ca5fb4bbd8	[Bugfix] Avoid merging empty-only partitions into splitting-op subgraphs (#36595) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-10 07:39:01 -07:00
wang.yuqi	a3189a08b0	[Model] Consolidate score logic by introduce score_type (#36479) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-10 13:32:25 +00:00
Mark McLoughlin	234860399b	[Frontend][Core] Revert "Add shutdown timeout" (#34730 and #36270) (#36628) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-03-10 06:20:41 -07:00
Harry Mellor	c88510083b	Fix Qwen2.5-VL test for Transformers v5 (#36532) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-10 12:05:34 +00:00
Chang Su	507ddbe992	feat(grpc): extract gRPC servicer into smg-grpc-servicer package, add --grpc flag to vllm serve (#36169) Signed-off-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2026-03-10 03:29:59 -07:00
Harry Mellor	195c997203	Fix LFM2 MoE test for Transformers v5 (#36534) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-09 22:29:17 -07:00
Wentao Ye	7279374f91	[Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement (#36159) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-09 20:55:58 -07:00
Hojin Yang	0836be3b03	[Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support (#31471) Signed-off-by: effortprogrammer <yhjhoward7@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-10 10:59:19 +08:00
Andreas Karatzas	179547d62c	[ROCm][CI] Fix ROCm GPT-OSS Eval test group (#36179) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-09 17:55:20 -07:00
Shaun Kotek	203a7f27da	add nemotron v3 reasoning parser (#36393) Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com> Co-authored-by: root <root@gpu-259.slurm-workers-slurm.slurm.svc.cluster.local>	2026-03-09 15:11:41 -07:00
Micah Williamson	4ff9b045fe	[ROCm][CI] Prep Tests For Change To ROCM_ATTN As New Default Backend On ROCm (#36025) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-03-09 13:27:55 -05:00
Copilot	4b87ffbefb	[torch.compile] Rename `compile_ranges_split_points` to `compile_ranges_endpoints` (#36027) Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-09 18:04:40 +00:00
Andreas Karatzas	1e0f917b34	[ROCm][CI] Fix logprob divergence for TitanML/tiny-mixtral under AITER rms_norm (#36101) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-09 12:07:44 -05:00
Andreas Karatzas	c174d54f86	[ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks (#36292) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-09 12:02:41 -05:00
Roberto L. Castro	580864d81e	[Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2 (#34917) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>	2026-03-09 09:50:36 -07:00
Roberto L. Castro	2b28b9b269	[Attention][Perf] Optimize cp_gather_and_upconvert_fp8_kv_cache - DeepSeek-v3.2 (#35290) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-03-09 09:46:57 -07:00
Matthew Bonanni	77a73458e3	Reapply [Attention] Refactor `check_and_update_config` (#35122) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-09 07:17:14 -07:00

4859 Commits