biondizzle/vllm - vllm

Author	SHA1	Message	Date
Jonas M. Kübler	98e7f223b9	enable skipping of SW attention layers when using FP8 KV cache (#33695) Signed-off-by: Jonas Kuebler <kuebj@amazon.com>	2026-03-27 07:25:02 -06:00
Juan Pérez de Algaba	b111f8a61f	fix(security): Add VLLM_MAX_N_SEQUENCES environment variable and enforce limit (#37952) Signed-off-by: jperezde <jperezde@redhat.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2026-03-27 09:02:10 -04:00
Sage Moore	497e234d38	[EPLB] Cleanup the transfer logic for the various eplb maps (#34520) Signed-off-by: Sage Moore <sagmoore@redhat.com> Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-27 10:18:46 +01:00
dtc	6287e7fa20	[P/D] Mooncake: Add unit tests and minor fixes for mooncake connector (#36946) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>	2026-03-27 09:26:40 +01:00
Flora Feng	aee4c14689	[Bugfix] Fix Hermes tool parser when stream interval > 1 (#38168) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-27 14:42:26 +08:00
Li, Jiang	becaed6ec8	[CPU] Support CT W4A16 on CPU MP kernel (#38219) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-03-27 14:15:28 +08:00
Or Ozeri	7cc302dd87	[kv_offload+HMA][7/N]: Support register_kv_caches for hybrid models (#37853) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-27 08:38:33 +03:00
Bvicii	999dfc1622	[Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop (#34789) Signed-off-by: Bvicii <yizhanhuang2002@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-26 22:17:00 -07:00
Giancarlo Delfin	c32e97602d	[Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-03-26 13:38:12 -07:00
Andreas Karatzas	9c3ae04bfe	[ROCm][CI] Add LM Eval Qwen3.5 Models test for MI355 (#38155) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-26 16:51:18 +00:00
Divakar Verma	b9dbc5c4ab	[Mamba][APC] Add test case to compare apc outputs (#34977) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2026-03-26 16:40:35 +00:00
Andreas Karatzas	bdc1719eb9	[ROCm][CI] Fix AITER state leak in shared_fused_moe_routed_transform test (#38137) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-26 09:26:46 -07:00
Zhewen Li	be1a85b7a2	Revert "[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration" (#38050) (#38169) Co-authored-by: Zhewen Li <zhewenli@inferact.ai>	2026-03-26 07:59:09 -07:00
Cyrus Leung	2e225f7bd2	[Renderer] Consolidate factory methods (#38218) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-26 12:19:22 +00:00
wang.yuqi	dcdc145893	[CI] Reorganize scoring tests (#38207) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-26 12:07:01 +00:00
Andreas Karatzas	f2d16207c7	[ROCm][CI] Fix flaky GPTQ compile correctness test (#38161) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-26 19:57:00 +08:00
Andreas Karatzas	37a83007fe	[ROCm][CI] Fix wvSplitKrc mock argument order in test_rocm_unquantized_gemm (#38167) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-26 19:54:59 +08:00
Wentao Ye	bf5eec638d	[Refactor] Remove unused utils (#38153) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-26 17:08:19 +08:00
Vadim Gimpelson	52069012fe	[Bugfix] Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-03-26 01:21:47 -07:00
Matej Rojec	2908094567	Add `/v1/chat/completions/batch` endpoint for batched chat completions (#38011) Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>	2026-03-26 12:13:33 +08:00
Woosuk Kwon	144030c84e	Relocate Encoder CUDA graph manager (#38116) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-03-25 20:52:12 -07:00
Harry Mellor	3c3c084240	Various Transformers v5 fixes (#38127) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-26 00:10:08 +00:00
Ekagra Ranjan	7b54f60db0	[Cohere] Enable Cohere-Transcribe (#38120) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2026-03-25 16:13:51 -07:00
Guillaume Guy	70a2152830	[MultiModal] add support for numpy array embeddings (#38119) Signed-off-by: guillaume_guy <guillaume.guy@airbnb.com> Signed-off-by: Guillaume Guy <guillaume.c.guy@gmail.com> Co-authored-by: guillaume_guy <guillaume.guy@airbnb.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-03-25 20:13:04 +00:00
Andreas Karatzas	7d6917bef5	[ROCm] Fix MoE kernel test failures on gfx950 (#37833) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-03-25 13:46:40 -05:00
Nick Hill	72cad44d3c	[Frontend] Move APIServerProcessManager target server fn (#38115) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-25 18:14:41 +00:00
Cyrus Leung	ba2f0acc2d	[Misc] Reorganize inputs (#35182) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-25 10:22:54 -07:00
Yongye Zhu	678b3c99e8	[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration (#38050)	2026-03-25 10:16:40 -07:00
Richard Zou	6e37c46b35	[compile] Add some more startup tests for top models (#38046) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-25 12:02:22 -04:00
Wentao Ye	1bf2ddd0ee	[Refactor] Rename `WAITING_FOR_FSM` to `WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR` (#38048) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-25 11:41:44 -04:00
Wentao Ye	d7e93e13fb	[Feature] EPLB Support for GPU Model Runner v2 (#37488) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-25 08:16:39 -07:00
Andrii Skliar	cd7643015e	[Feature] Support per-draft-model MoE backend via `--speculative-config` (#37880) Signed-off-by: Andrii Skliar <askliar@nvidia.com> Signed-off-by: [Andrii Skliar] <askliar@nvidia.com> Co-authored-by: Andrii Skliar <askliar@nvidia.com>	2026-03-25 14:31:52 +00:00
Harry Mellor	d215d1efca	[Mypy] Better fixes for the `mypy` issues in `vllm/config` (#37902) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 06:14:43 -07:00
Matthias Gehre	a889b7f584	[Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>	2026-03-25 11:42:58 +00:00
Harry Mellor	ba2910f73a	Fix offline mode test for Transformers v5 (#38095) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 11:39:48 +00:00
Andreas Karatzas	f262a62aa1	[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (#37616) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 10:55:51 +00:00
Andreas Karatzas	9ac2fcafbb	[CI] Fix realtime WebSocket timeout deadlock and unhandled model validation errors (#37483) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 11:24:33 +01:00
Andreas Karatzas	04cec4f927	[ROCm][CI] Increase OpenAPI schema test timeouts (#38088) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 18:06:58 +08:00
Gregory Shtrasberg	189ddefbfd	[ROCm] Attention selector reordering (#36702) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2026-03-25 17:42:56 +08:00
vllmellm	42e9547976	[ROCm][Test] Fix ROCM_AITER_UNIFIED_ATTN attn+quant fusion test (#37640) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-03-25 05:06:15 +00:00
Andreas Karatzas	679c6a3ecc	[Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 (#37787) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 08:17:33 +08:00
Terry Gao	82580b10ac	[Perf] Disable inductor runtime asserts by default for serving perfor… (#37485) Signed-off-by: tianrengao <terrygao87@gmail.com> Co-authored-by: Tianren Gao <tianren@fb.com>	2026-03-24 19:37:51 -04:00
liangel-02	8c47fdfdb1	[FlexAttention] allow custom mask mod (#37692) Signed-off-by: Angel Li <liangel@meta.com>	2026-03-24 16:03:24 -04:00
Richard Zou	89f572dbc0	[BugFix] fix VLLM_USE_STANDALONE_COMPILE=0 (#38015) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-24 19:08:26 +00:00
Nick Cao	935c46dd9b	[Model] Add Granite 4.0 1B speech to supported models (#38019) Signed-off-by: Nick Cao <ncao@redhat.com>	2026-03-24 18:23:41 +00:00
Dhruv Singal	4df5fa7439	[Bugfix] Force continuous usage stats when CLI override is enabled (#37923) Signed-off-by: Your Name <you@example.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: OpenCode <noreply@openai.com>	2026-03-24 10:29:50 -07:00
Sungjae Lee	4731884796	[Feature] limit thinking tokens (hard limit) (#20859) Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-24 09:53:07 -07:00
wang.yuqi	1b6cb920e6	[Deprecate] Deprecate pooling multi task support. (#37956) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-03-24 14:07:47 +00:00
Ronen Schaffer	e3c6c10cad	[KV Offload] Refactor CPU offloading: pluggable CachePolicy, remove Backend abstraction, restructure into `cpu/` package (#37874) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>	2026-03-24 07:02:51 +02:00
Wentao Ye	c59a132f96	[V0 Deprecation] Refactor kv cache from list to element (#37487) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-23 20:10:11 -07:00

4971 Commits