biondizzle/vllm - vllm

Author	SHA1	Message	Date
DevByteAI	1f214290d6	fix(compile): apply partition wrapper when loading AOT cached functions (#31536 ) Signed-off-by: Devbyteai <abud6673@gmail.com> Signed-off-by: DevByteAI <161969603+devbyteai@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 17:27:26 +08:00
Ryan Rock	8cbdc7eb94	[CI/Build] Enable test_kv_cache_events_dp for AMD (#31834 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-01-08 09:00:24 +00:00
Lumosis	b634e619bb	Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635 ) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com> Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>	2026-01-08 09:00:07 +00:00
Isotr0py	eac3b96ec0	[Models] Allow converting Qwen3-VL into Reranker model (#31890 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-08 08:10:15 +00:00
Zhiwei	573a1d1119	[ROCm]Skip test_torchao.py::test_pre_quantized_model on CDNA3 arch (#31905 ) Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>	2026-01-08 15:47:44 +08:00
prashanth058	d3235cb503	[Fix] Enable mm_processor_cache with vision LoRA (#31927 ) Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>	2026-01-08 15:31:51 +08:00
Chang Su	791b2fc30a	[grpc] Support gRPC server entrypoint (#30190 ) Signed-off-by: Chang Su <chang.s.su@oracle.com> Signed-off-by: njhill <nickhill123@gmail.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: njhill <nickhill123@gmail.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2026-01-07 23:24:46 -08:00
Andreas Karatzas	5f2a473ff3	[ROCm][CI] v1 cpu offloading attention backend fix (#31833 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-08 14:37:50 +08:00
Andreas Karatzas	087a138963	[ROCm][CI] Fix attention backend test flakiness from uninitialized KV cache memory (#31928 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-08 04:35:25 +00:00
Richard Zou	a79079feef	[BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10 (#31915 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-08 04:04:58 +00:00
Robert Shaw	9f6dcb71ae	[MoE Refactor][16/N] Apply Refactor to NVFP4 (#31692 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Pavani Majety <pmajety@nvidia.com>	2026-01-08 03:46:27 +00:00
Andreas Karatzas	8dd2419fa9	[CI] Skip Qwen-VL in multimodal processing tests due to flaky external dependency (#31932 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-08 02:58:01 +00:00
Rabi Mishra	25eef3dc2e	feat(moe): Add is_act_and_mul=False support for Triton MoE kernels (#31645 ) Signed-off-by: rabi <ramishra@redhat.com>	2026-01-08 10:27:09 +08:00
Robert Shaw	5dcd7ef1f2	[MoE Refactor][15/N] Apply Refactor to Fp8 (#31415 )	2026-01-07 19:42:33 -05:00
Nick Hill	10ef65eded	[BugFix] Fix bad words with speculative decoding (#31908 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-07 15:46:42 -05:00
Ilya Markov	6170d47d22	[EPLB] Optimize EPLB with numpy (#29499 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-01-07 15:21:35 -05:00
Xin Yang	0ada960a20	[Kernel] Support bias type in grouped_topk kernel (#31781 ) Signed-off-by: Xin Yang <xyangx@amazon.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-07 12:16:32 -08:00
Ning Xie	c907d22158	[refactor] refactor memory constants usage (#31865 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-01-07 18:37:31 +00:00
Kfir Toledo	b89443b8d9	[KVConnector]: Enable Cross-layers KV cache layout for MultiConnector (#30761 ) Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>	2026-01-07 16:59:43 +00:00
Kate Cheng	cc6dafaef2	[Perf][Kernels] Enable FlashInfer DeepGEMM swapAB on SM90 (for W8A8 Linear Op) (#29213 ) Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com> Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2026-01-07 10:53:54 -05:00
Cyrus Leung	b665bbc2d4	[Chore] Migrate V0 attention utils (#31891 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-07 13:44:36 +00:00
weiyu	e7596371a4	[Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808 ) Signed-off-by: Wei-Yu Lin <weiyulin@google.com> Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>	2026-01-07 16:07:16 +08:00
Kevin McKay	4614c5a539	[Bugfix][Hardware][AMD] Consolidate FP8 min/max values helper function (#31106 ) Signed-off-by: c0de128 <kevin.mckay@outlook.com> Signed-off-by: Kevin McKay <kevin@example.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-07 06:55:03 +00:00
Cyrus Leung	aafd4d2354	[Chore] Try remove `init_cached_hf_modules` (#31786 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-07 12:34:04 +08:00
vSeamar	6f351548b2	[Frontend] Implement robust video frame recovery for corrupted videos (#29197 ) Signed-off-by: cmartinez <cmartinez@roblox.com> Signed-off-by: vSeamar <cmartinez@roblox.com>	2026-01-07 01:13:24 +00:00
Angela Yi	9a1d20a89c	[CI] Add warmup run in test_fusion_attn (#31183 ) Signed-off-by: angelayi <yiangela7@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-07 00:31:52 +00:00
Andreas Karatzas	2a42ae790d	[ROCm][CI] Fix ModernBERT token classification test numerical accuracy on ROCm (#31820 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-06 23:21:15 +00:00
Nikhil G	ada6f91d56	Fix RecursionError in MediaWithBytes unpickling (#31191 ) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>	2026-01-06 20:11:26 +00:00
Li, Jiang	8becf146bd	[Quantization][Refactor] Move CPU GPTQ kernel into MP linear (#31801 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-06 19:10:18 +00:00
Charlie Fu	c07163663d	[ROCm][CI] Fix tests/compile unit tests (#28895 ) Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Signed-off-by: Charlie Fu <Charlie.Fu@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-06 18:50:43 +00:00
Benjamin Chislett	f7008ce1c4	[Perf] Async Scheduling + Speculative Decoding + Structured Outputs (#29821 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-06 18:50:37 +00:00
Robert Shaw	af8fd73051	[MoE Refactor][14/N] Clean Up FI Quant Config Smuggling (#31593 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-06 15:47:04 +00:00
Robert Shaw	d3e477c013	[MoE Refactor] Add Temporary Integration Tests - H100/B200 (#31759 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-06 10:34:17 -05:00
wang.yuqi	96860af655	[Model] rename use_pad_token to use_sep_token (#31784 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-06 14:16:04 +00:00
Lucas Wilkinson	e0327c9db2	[Attention][1/n] Remove usage of deprecated `seq_lens_cpu` and `num_computed_tokens_cpu` CommonAttentionMetadata properties (#31773 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-06 04:05:17 -08:00
wang.yuqi	43d384bab4	[CI] Increase the MTEB_EMBED_TOL threshold to 5e-4. (#31797 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-06 19:30:05 +08:00
Isotr0py	ee2e69d6cd	[Bugfix][CI/Build] Fix failing pooling models test due to Triton kernel accuracy diff (#31776 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-06 00:44:22 -08:00
Kevin McKay	1fb0209bbc	[Bugfix][Hardware][AMD] Fix exception types in AITER MLA FP8 check (#31177 ) Signed-off-by: c0de128 <kevin.mckay@outlook.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-06 14:10:59 +08:00
John Calderon	2f4e6548ef	[Bugfix] vLLM produces invalid UTF-8 tokens and “�” (#28874 ) Signed-off-by: John Calderon <jcalderon@nvidia.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2026-01-06 00:23:00 +00:00
Wentao Ye	af9a7ec255	[Bug] Revert torch warning fix (#31585 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-05 22:31:21 +00:00
Matthew Bonanni	276e03b92c	[CI][DeepSeek] Add nightly DeepSeek R1 `lm_eval` tests on H200 (#30356 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-05 17:17:59 -05:00
Nick Hill	32f4e4db00	[Cleanup] Remove deprecated fields from CachedRequestData class (#31734 ) Signed-off-by: njhill <nickhill123@gmail.com>	2026-01-05 21:07:14 +00:00
amitz-nv	ee21291825	[Model] Nemotron Parse 1.1 Support (#30864 ) Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-05 13:00:14 -08:00
Isotr0py	51e38a8e30	[Misc] Enable Paligemma's PrefixLM attention mask computation (#31725 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-06 03:31:49 +08:00
Or Ozeri	d8e38d4939	Triton Attention: Support cross-layers blocks (#30687 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-05 19:29:16 +00:00
Isotr0py	6aa5b18e1d	[v1] Add encoder-only/cross attention support to Triton Attention backend (#31406 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-06 00:00:23 +08:00
wang.yuqi	911d38ed99	[Model] Let more models to support the score template. (#31335 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-01-05 11:54:26 +00:00
wangxiyuan	bb4337b34c	[Platform] Deprecate seed_everything (#31659 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-04 18:34:04 -08:00
Isotr0py	367856de14	[CI/Build] Revive skipped reward models e2e test (#31665 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-05 02:33:46 +00:00
Andreas Karatzas	f2b6dfd237	[ROCm][CI] Fix language generation test accuracy by disabling HF flash_sdp and mem_efficient_sdp (#31597 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-05 02:17:05 +00:00

4252 Commits