biondizzle/vllm - vllm

Author	SHA1	Message	Date
maang	1f33e38e81	[Model] Cleanup: Remove redundant manual definition of `make_empty_intermediate_tensors` in GLM-4-MoE (#31869 ) Signed-off-by: maang <maang_h@163.com>	2026-01-07 08:18:28 +00:00
sihao_li	59fe6f298e	[XPU]fallback to TRITON_ATTN on xpu when use float32 dtype (#31762 ) Signed-off-by: sihao.li <sihao.li@intel.com>	2026-01-07 08:10:29 +00:00
weiyu	e7596371a4	[Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808 ) Signed-off-by: Wei-Yu Lin <weiyulin@google.com> Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>	2026-01-07 16:07:16 +08:00
xuebwang-amd	0dd5dee9b9	[Bugfix][Kernel] fix bias adding in triton kernel implemented fused moe (#31676 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com>	2026-01-07 07:36:13 +00:00
Kevin McKay	4614c5a539	[Bugfix][Hardware][AMD] Consolidate FP8 min/max values helper function (#31106 ) Signed-off-by: c0de128 <kevin.mckay@outlook.com> Signed-off-by: Kevin McKay <kevin@example.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-07 06:55:03 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	482914849c	[BugFix] LoRA: Support loading base_layer of experts (#31104 ) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2026-01-07 14:49:39 +08:00
tianshu-Michael-yu	efeaac92f2	[Bugfix] Fix race condition in async-scheduling for vlm model (#31841 ) Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>	2026-01-07 06:45:10 +00:00
tjp_zju	55caa6051d	refactor: find_loaded_library (#31866 ) Signed-off-by: tjp_zju <tanjianpingzju1990@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-01-07 06:42:20 +00:00
Lucas Wilkinson	c7a79d41a0	[Attention][3/n] Remove usage of deprecated `seq_lens_cpu` and `num_computed_tokens_cpu` CommonAttentionMetadata properties (#31850 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-07 13:31:34 +08:00
vllmellm	6409004b26	[ROCm][AITER] bugfix accuracy regression in ROCM_AITER_TRITON_MLA backend (#31816 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-07 05:04:53 +00:00
Cyrus Leung	aafd4d2354	[Chore] Try remove `init_cached_hf_modules` (#31786 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-07 12:34:04 +08:00
Jack Yang	0a2c2dc3f1	fixed mypy warnings for files vllm/v1/attention with TEMPORARY workaround (#31465 ) Signed-off-by: Zhuohao Yang <zy242@cornell.edu> Co-authored-by: Zhuohao Yang <zy242@cornell.edu> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-07 04:08:47 +00:00
Tyler Michael Smith	f09c5feb7c	Change warning in get_current_vllm_config to report caller's line number (#31855 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-01-07 03:48:13 +00:00
Cyrus Leung	1b8af957f6	[Doc] Update release docs (#31799 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-07 03:27:40 +00:00
Ce Zhao	a051525e07	[Model] Enable LoRA support for PaliGemma (#31656 ) Signed-off-by: 赵策 <alcor@mac.mynetworksettings.com> Signed-off-by: Alcor <alcor_zhao@outlook.com> Co-authored-by: 赵策 <alcor@mac.mynetworksettings.com>	2026-01-07 10:09:32 +08:00
Yihua Cheng	5b833be49e	[1/2][lmcache connector] clean up lmcache multi-process adapter (#31838 ) Signed-off-by: ApostaC <yihua98@uchicago.edu>	2026-01-07 02:02:42 +00:00
Lucas Kabela	873480d133	[Misc][BE] Type coverage for vllm/compilation [1/3] (#31554 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-06 20:37:51 -05:00
vSeamar	6f351548b2	[Frontend] Implement robust video frame recovery for corrupted videos (#29197 ) Signed-off-by: cmartinez <cmartinez@roblox.com> Signed-off-by: vSeamar <cmartinez@roblox.com>	2026-01-07 01:13:24 +00:00
Andreas Karatzas	364a8bc6dc	[ROCm][CI] Fix plugin tests (2 GPUs) failures on ROCm and removing `VLLM_FLOAT32_MATMUL_PRECISION` from all ROCm tests (#31829 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-07 01:12:23 +00:00
Angela Yi	9a1d20a89c	[CI] Add warmup run in test_fusion_attn (#31183 ) Signed-off-by: angelayi <yiangela7@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-07 00:31:52 +00:00
Cyrus Leung	309a8f66ee	[Bugfix] Handle mistral tokenizer in get_hf_processor (#31817 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-07 07:46:56 +08:00
Andreas Karatzas	e5d427e93a	[ROCm][CI] Pinning timm lib version to fix ImportError in Multi-Modal Tests (Nemotron) (#31835 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-06 23:23:11 +00:00
Andreas Karatzas	2a42ae790d	[ROCm][CI] Fix ModernBERT token classification test numerical accuracy on ROCm (#31820 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-06 23:21:15 +00:00
Matthew Bonanni	d49899732e	[Spec Decode][UX] Add acceptance stats to `vllm bench serve` report (#31739 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2026-01-06 21:21:42 +00:00
Elvir Crnčević	dba95378a6	Report error log after vllm bench serve (#31808 ) Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>	2026-01-06 20:24:19 +00:00
Nikhil G	ada6f91d56	Fix RecursionError in MediaWithBytes unpickling (#31191 ) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>	2026-01-06 20:11:26 +00:00
Li, Jiang	8becf146bd	[Quantization][Refactor] Move CPU GPTQ kernel into MP linear (#31801 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-06 19:10:18 +00:00
Charlie Fu	c07163663d	[ROCm][CI] Fix tests/compile unit tests (#28895 ) Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Signed-off-by: Charlie Fu <Charlie.Fu@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-06 18:50:43 +00:00
Benjamin Chislett	f7008ce1c4	[Perf] Async Scheduling + Speculative Decoding + Structured Outputs (#29821 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-06 18:50:37 +00:00
Yakine Tahtah	4e67a8f616	[Bugfix] Fix GLM-4 MoE router logits dtype for data parallel chunking (#31055 ) Signed-off-by: ReinforcedKnowledge <reinforced.knowledge@gmail.com>	2026-01-06 17:57:56 +00:00
Masataro Asai	142c4d1738	make 500: InternalServerError more informative (#20610 ) Signed-off-by: Masataro Asai <guicho2.71828@gmail.com>	2026-01-06 17:36:24 +00:00
Ning Xie	6f5e653383	[Log] add log about gpu worker init snapshot and requested memory (#29493 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-01-06 17:32:55 +00:00
Vadim Gimpelson	22dffca982	[PERF] Speed-up of GDN attention decode part (Qwen3-Next) (#31722 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-01-06 17:32:46 +00:00
Lucas Wilkinson	4c73be14e0	[Attention][2/n] Remove usage of deprecated `seq_lens_cpu` and `num_computed_tokens_cpu` CommonAttentionMetadata properties (#31774 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-01-06 17:32:14 +00:00
Jinzhen Lin	2f4bdee61e	[Quantization][MoE] remove unused ep logic from moe marlin (#31571 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-06 09:07:19 -08:00
roikoren755	28c94770ad	[NemotronH] Use ReplicatedLinear for fc1_latent_proj (#31807 ) Signed-off-by: Roi Koren <roik@nvidia.com>	2026-01-06 16:00:40 +00:00
Robert Shaw	af8fd73051	[MoE Refactor][14/N] Clean Up FI Quant Config Smuggling (#31593 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-06 15:47:04 +00:00
Robert Shaw	d3e477c013	[MoE Refactor] Add Temporary Integration Tests - H100/B200 (#31759 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-06 10:34:17 -05:00
Isotr0py	02809af1e7	[Bugfix]: Fix cross attention backend selection for Turing GPU (#31806 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-06 23:15:56 +08:00
Jee Jee Li	cbd4690a03	[LoRA]Disable linear LoRA kernel PDL (#31777 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-06 23:12:25 +08:00
wang.yuqi	96860af655	[Model] rename use_pad_token to use_sep_token (#31784 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-06 14:16:04 +00:00
Chauncey	0202971a48	[Frontend] Support GLM-4.5 / GLM-4.7 with enable_thinking: false (#31788 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-06 13:53:21 +00:00
Jzz1943	2c1a4f2488	[Bugfix]: avoid overriding audio/text kwargs (Qwen3-Omni) (#31790 ) Signed-off-by: Zhongze Jiang <jiangzhongze.jzz@ant-intl.com>	2026-01-06 12:59:17 +00:00
Cyrus Leung	6444824873	[Misc] Implement `TokenizerLike.convert_tokens_to_ids` (#31796 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-06 12:08:22 +00:00
kzwrime	bf0f3a4638	[Bugfix] Fix torch.compile error for DP + MoE on CPU Backend (#31650 ) Signed-off-by: kunzh <zhikun.wu@outlook.com>	2026-01-06 12:06:20 +00:00
Lucas Wilkinson	e0327c9db2	[Attention][1/n] Remove usage of deprecated `seq_lens_cpu` and `num_computed_tokens_cpu` CommonAttentionMetadata properties (#31773 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-06 04:05:17 -08:00
Cyrus Leung	14df02b4e1	[Chore] Cleanup `mem_utils.py` (#31793 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-06 19:55:59 +08:00
BlankR	6ebb66ccea	[Doc] Fix format of multimodal_inputs.md (#31800 ) Signed-off-by: BlankR <hjyblanche@gmail.com>	2026-01-06 03:30:24 -08:00
wang.yuqi	43d384bab4	[CI] Increase the MTEB_EMBED_TOL threshold to 5e-4. (#31797 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-06 19:30:05 +08:00
Cyrus Leung	db318326a5	[Misc] Use `deprecated` for `seed_everything` (#31780 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-06 11:29:55 +00:00

12732 Commits