biondizzle/vllm - vllm

Author	SHA1	Message	Date
Or Ozeri	2a4dbe24ea	[BugFix] Wait for compute before offloading KV to CPU (#31341 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-10 22:25:08 +00:00
RickyChen / 陳昭儒	8020a60402	[Bugfix] Fix Qwen3-VL-Reranker model loading for sequence classification (#32089 ) Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-10 12:40:09 -08:00
Vadim Gimpelson	e15a5ff07b	[MISC] Add strict contiguity check for FlashInfer attention tensors (#32008 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com> Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>	2026-01-10 12:40:05 -08:00
Vensen	6ea001cfb7	[Bugfix][Quantization] Ensure input contiguity in per_token_quant_int8 (#31637 ) Signed-off-by: vensen <vensenmu@gmail.com>	2026-01-10 12:40:02 -08:00
shyeh25	1c46dea001	Revert "[Kernels][FI] Skip trtllm attention when num_kv_heads=1 (#308… (#31617 ) Signed-off-by: shyeh25 <206795756+shyeh25@users.noreply.github.com>	2026-01-10 12:39:59 -08:00
Or Ozeri	028599739d	[BugFix] scheduler: Fix resuming of preempted requests after async load (#31583 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-10 12:39:25 -08:00
gnovack	d1fd802fa3	fused_moe_kernel - cast accumulator after applying router weights (#32002 ) Signed-off-by: gnovack <gnovack@amazon.com>	2026-01-11 04:36:45 +08:00
Xin Yang	543c23be78	[LoRA][Perf] Improve FusedMoE LoRA performance for small rank (#32019 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-10 11:04:18 -08:00
jvlunteren	b8bf5c45bb	[Kernel] Optimize Sliding Window Attention in 3D Triton Kernel (#31984 ) Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>	2026-01-10 18:13:44 +00:00
Michael Goin	e6c6f2c79d	[Quant] Support MXFP4 W4A16 for compressed-tensors dense models (#31926 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>	2026-01-10 06:44:35 -08:00
Jeremy Teboul	07286ec5a6	[Bugfix] Fix integer overflow in Gemma3n audio processing (#31657 ) Signed-off-by: Jeremy Teboul <jeremyte@meta.com>	2026-01-10 17:52:53 +08:00
Ning Xie	14fc7a68c7	[Bugfix] fix offline chat output prompt (#32076 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-01-10 07:50:57 +00:00
Cyrus Leung	5f2385a4c8	[Benchmark][1/2] Generalize SLA criterion validation from binary flags to margins (#32075 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-10 07:11:03 +00:00
Frelam	a01a1c0d69	[Bugfix] fix encoder cache leak of waiting requests in scheduler to solve stuck in CPU scheduling (#31857 ) Signed-off-by: frelam <frelam112233@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-01-10 06:27:58 +00:00
Lucas Wilkinson	da6709c9fe	[Misc] Delay deprecation of CommonAttentionMetadata properties (#32074 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-09 21:06:44 -08:00
Andreas Karatzas	d83becd503	[ROCm][CI] Fix flaky `test_function_calling_with_stream` and reduce schema test examples (#32063 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-10 05:02:35 +00:00
roikoren755	0c9614876e	Update modelopt KV cache quantization resolution to new scheme (#31895 ) Signed-off-by: Roi Koren <roik@nvidia.com>	2026-01-10 04:54:13 +00:00
Cyrus Leung	583a90e005	[Refactor] Separate sequence and token pooling types (#32026 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-10 04:53:24 +00:00
maang	52d428295d	[Core] Refactor ColumnParallelLinear: remove unused parameter and optimize forward (#31939 ) Signed-off-by: maang <maang_h@163.com>	2026-01-10 04:19:49 +00:00
Kevin McKay	c60578de0a	[Bugfix][Hardware][AMD] Use dynamic WARP_SIZE in sampler vectorized_process (#31295 ) Signed-off-by: c0de128 <kevin.mckay@outlook.com>	2026-01-10 03:57:38 +00:00
PatrykSaffer	80fead8bf6	Fuse RoPE and MLA KV-cache write (#25774 ) Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com> Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai> Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-09 19:18:37 -08:00
Akshat Shrivastava	e45946bd91	feature/issac 0.2 (#31550 ) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-01-10 03:18:05 +00:00
Lucas Kabela	ea6d067a2a	[Misc][LLaMa4] Compile LLaMa Vision Encoder (#30709 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-09 22:01:38 -05:00
Ning Xie	abd9224280	resolve pydantic error in startup benchmark (#31348 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-01-10 02:41:27 +00:00
Kevin McKay	4dc0d606b7	[Bugfix] Narrow broad exceptions in compilation backends (#31616 ) Signed-off-by: c0de128 <kevin.mckay@outlook.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-01-09 21:39:22 -05:00
Micah Williamson	ac0675ff6b	[CI] Allow Deprecated Quantization For LM Eval Tests (#32065 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-09 19:10:47 -07:00
Wentao Ye	e18464a57d	[Perf] Optimize async scheduling placeholder using empty (#32056 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-10 00:46:11 +00:00
Russell Bryant	1963245ed1	[Core] Use weights_only=True with torch.load (#32045 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-01-10 00:28:57 +00:00
Matthew Bonanni	0308901975	[2/N][Attention] Fix pre-commit errors (#32052 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-10 00:27:15 +00:00
Lucas Kabela	aaf4b70aae	[Misc][BE] Type coverage for vllm/compilation [2/3] (#31744 )	2026-01-09 18:30:38 -05:00
Nick Hill	3adffd5b90	[Misc] Enable async scheduling by default with spec decoding (#31998 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 23:09:34 +00:00
zhrrr	97ba96fbe9	[perf][async] support non cpu sync get logprob tensors for spec (#31336 ) Signed-off-by: izhuhaoran <izhuhaoran@qq.com> Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2026-01-09 21:24:51 +00:00
Chendi.Xue	94578127a4	[NIXL] refine decoder side post process for heterogeneous BlockSize and kv_layout (#30275 )	2026-01-09 21:22:19 +00:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Andrew Xia	1f8b7c536b	[responsesAPI] fix incomplete_messages for simple/parsable context (#31836 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2026-01-09 21:00:57 +00:00
Lucas Wilkinson	0a0aa07747	[Quant] Make static quant support all group shapes (#30833 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-09 12:49:27 -08:00
jiahanc	f9e2a75a1e	[fix] add cutedsl to global sf (#32001 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2026-01-09 12:03:02 -08:00
Runkai Tao	a4d5d663e2	Add unpermute-aware fused MoE path and small-batch fallback (#29354 ) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-09 12:58:39 -07:00
Jeremy Teboul	657e9c0e18	[Fix] Introduce audio channels spec (#31595 ) Signed-off-by: Jeremy Teboul <jeremyte@meta.com>	2026-01-09 19:34:51 +00:00
Wentao Ye	308feab33f	[Perf] Optimize cutlass moe problem size calculation, 5.3% E2E Throughput improvement, 2.2% TTFT improvement (#31830 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-09 11:13:43 -08:00
Wentao Ye	28ae32a5d3	[Refactor] Remove numpy split in async scheduling (#32034 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-09 19:09:02 +00:00
Andrew Xia	f32c629eb4	[Frontend][gpt-oss] Allow system message to overwrite model identity (#31737 ) Signed-off-by: lacora <hyelacora@gmail.com> Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: lacora <hyelacora@gmail.com> Co-authored-by: Andrew Xia <axia@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-01-09 14:03:57 -05:00
Yifan Qiao	cd4a95e3aa	[Feat][Core] Support multiple KV cache groups in Hybrid KV Coordinator (#31707 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2026-01-09 10:53:20 -08:00
Michael Goin	d5ec6c056f	[UX] Add vLLM model inspection view (#29450 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-09 10:12:35 -07:00
Shanshan Shen	08d954f036	[Doc] Add developer guide for CustomOp (#30886 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-01-09 16:21:11 +00:00
Kevin Šuc	ac9f9330e6	Rename --exclude-log-deltas to --enable-log-deltas (#32020 ) Signed-off-by: Catacomba <kevinsuc16@gmail.com>	2026-01-09 15:30:40 +00:00
Isotr0py	2d0c5b630e	[Doc] Remove hardcoded Whisper in example openai translation client (#32027 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-09 14:44:52 +00:00
Michael Goin	34cd32fe30	[Perf][Kernel] Fused SiLU+Mul+Quant kernel for NVFP4 cutlass_moe (#31832 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>	2026-01-09 07:40:33 -07:00
R3hankhan	8e27663b6a	[CPU] Add head sizes 80 and 112 with vec16 fallback (#31968 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-01-09 22:14:46 +08:00
maang	7cdf7e2fe0	[Model] Remove redundant None check in DeepSeekOCR image input processing (#32016 ) Signed-off-by: maang <maang_h@163.com>	2026-01-09 06:12:44 -08:00

12880 Commits