biondizzle/vllm - vllm

Author	SHA1	Message	Date
Jialin Ouyang	a1d3866dda	[n-gen] DO NOT repeatedly return finished child requests (#28591 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-13 03:36:07 +00:00
wangxiyuan	2dacd57394	[platform] Move get_cu_count to utils (#27005 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-13 08:48:47 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	4ca5cd5740	[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#12695 ) Signed-off-by: Hollow Man <hollowman@opensuse.org> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>	2025-11-12 15:24:12 -08:00
Andy Lo	58ce8d12b7	[BugFix] Priority scheduling and spec tokens preemption (#28558 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-11-12 20:29:21 +00:00
Harry Mellor	a39dd7bb06	[CI] Skip "Multi-Modal Models Test (Extended) 3" test that's broken in current Transformers (#28559 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 19:38:13 +00:00
alberto	bac904565f	Implement ARC KV cache eviction policy for CPU offloader (#27039 ) Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Signed-off-by: alberto <aperdomo@redhat.com> Co-authored-by: Or Ozeri <or@ozery.com>	2025-11-12 09:51:39 -08:00
Harry Mellor	a742134cc5	Remove deprecated fields from `CompilationConfig` (#27593 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 16:10:28 +00:00
wangxiyuan	10138c92a5	[V0 deprecation] Deprecate use_v1 parameter (#28112 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-12 14:03:52 +00:00
TJian	edb59a9470	[ROCm] [Bugfix] Fix `fused_qknorm_rope_kernel` rocm compatibility (#28500 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-12 05:01:14 -08:00
Chenguang Zheng	4ccffe561f	[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233 ) Signed-off-by: n00909098 <nguyen.kha.long@huawei.com> Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: herotai214 <herotai214@gmail.com> Signed-off-by: Khuong Le <khuong.le.manh@huawei.com> Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com> Co-authored-by: n00909098 <nguyen.kha.long@huawei.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Khuong Le <khuong.le.manh@huawei.com> Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>	2025-11-11 18:58:33 -08:00
Andreas Karatzas	9f0247cfa4	`VLLM_USE_TRITON_FLASH_ATTN` V0 variable deprecation (#27611 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>	2025-11-11 18:34:36 -08:00
Li, Jiang	7f829be7d3	[CPU] Refactor CPU attention backend (#27954 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-12 09:43:06 +08:00
wangxiyuan	e1710393c4	[[V0 deprecation]]Remove VLLM_USE_V1 env (#28204 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-11 18:22:16 -07:00
Yanan Cao	48c879369f	[Frontend] Change CompilationMode to a proper Enum (#28165 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-11 19:46:18 -05:00
Adrian Abeyta	d23539549a	Use FLASHINFER MLA backend when testing fp8_kv_scale_compile (#28491 ) Signed-off-by: adabeyta <aabeyta@redhat.com>	2025-11-12 00:34:58 +00:00
Jialin Ouyang	4228be7959	[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead (#28245 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-11 10:28:47 -08:00
Zhewen Li	e553424919	[CI/Build] Refactor Attention backend for test_prefix_prefill from xformers to SDPA (#28424 ) Signed-off-by: zhewenli <zhewenli@meta.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-12 01:09:47 +08:00
xuebwang-amd	5a1271d83a	[Quantization] fix attention quantization of gpt_oss model (#27334 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com>	2025-11-11 12:06:00 -05:00
xuebwang-amd	05576df85c	[ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model (#24239 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com> Co-authored-by: fxmarty-amd <felmarty@amd.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-11 12:05:22 -05:00
zhrrr	68c09efc37	[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2025-11-11 12:00:31 -05:00
Nicolò Lucchesi	a7ef3eb0cd	[NIXL] Generalize block-first backend layouts (FlashInfer-like) (#28282 )	2025-11-11 16:57:43 +00:00
jvlunteren	533b018f72	[BugFix] Fix Failing Ruff Check (#28469 ) Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>	2025-11-11 06:41:43 -08:00
bnellnm	a1448b4b69	[Kernels] Split up fused_moe/layer.py, isolate more modular kernel code (#28064 )	2025-11-11 07:29:02 -07:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
Robert Shaw	e605e8e323	[Bugfix] Fix Stream Sync for Shared Expert Overlap (#28430 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-11 05:59:08 +00:00
Zuyi Zhao	bca74e32b7	[Frontend] Add sagemaker_standards dynamic lora adapter and stateful session management decorators to vLLM OpenAI API server (#27892 ) Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com> Signed-off-by: Shen Teng <sheteng@amazon.com> Co-authored-by: Shen Teng <sheteng@amazon.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-11 04:57:01 +00:00
Zhuohan Li	8d706cca90	[Misc] FlattenLogprobs -> FlatLogprobs (#28335 )	2025-11-11 03:41:23 +00:00
Lucas Wilkinson	39029d5192	[CI/Test Fix] Fix CP tests on Blackwell (#28404 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-11 01:36:29 +00:00
Matthew Bonanni	0bf29fadf5	[Test] Remove old non-varlen FA2 test (#28420 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-10 23:57:41 +00:00
Adrian Abeyta	a5a790eea6	[Bugfix] Ensure calculated KV scales are applied in attention. (#27232 ) Signed-off-by: adabeyta <aabeyta@redhat.com>	2025-11-10 23:42:37 +00:00
Ilya Markov	d17ecc6b19	[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-10 18:33:11 -05:00
Rémi Delacourt	6d54336ae5	[Bugfix] Fix llguidance backend, rollback when EOS was encountered (#25905 ) Signed-off-by: Rémi Delacourt <remi@mistral.ai> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-11-10 14:53:32 -05:00
Varun Sundar Rabindranath	b039bfda8f	[Bugfix] Fix persistent_masked_m_silu_mul_quant tests (#28366 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-10 09:21:52 -08:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
Mark McLoughlin	6f7de33bed	[Metrics] Refactor LoRA state tracking (#26801 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-10 16:34:36 +08:00
Shinichi Hemmi	a98cc35c34	Restore PlaMo2 unit test as `pfnet/plamo-2-1b` now supports `transformers >=4.56` (#28019 ) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>	2025-11-10 06:50:02 +00:00
Lucas Wilkinson	e8697faf03	[V0 deprecation] Remove no longer used `get_metadata_cls` (#28370 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-10 14:32:09 +08:00
Varun Sundar Rabindranath	6b2b9fd934	[CI] lora/test_mixtral.py : Add additional expected outputs due to flakiness (#28322 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-10 10:45:29 +08:00
Zhewen Li	a65a934ebe	[CI/Build] Temporary fix to LM Eval Small Models (#28324 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-09 21:08:38 +00:00
usberkeley	4a8d6bd168	Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method (#28214 ) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-11-09 19:11:46 +00:00
Nick Hill	289eb6c537	[Core] Simplify async KV output aggregation (#28327 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-09 09:44:13 -08:00
Nicolò Lucchesi	19d91ece4b	[CI] Fix flaky `test_eagle_correctness` test (#28364 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-09 16:04:59 +00:00
ElizaWszola	171133f929	[Bugfix] Fix test fused quant layernorm tests (#27865 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-08 14:31:33 -08:00
zhangsicheng5	2108a571d7	[DCP] Support dcp kv_cache interleave size > 1 (#26696 ) Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com> Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: Qiu <qiuchunshuo@huawei.com> Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-11-09 04:45:27 +09:00
Andy Lo	47604137a2	[Bugfix] Spec decode + structured output + spec model max len edge case (#28298 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-11-08 19:44:25 +00:00
Harry Mellor	d9ab1ad9d1	`reasoning_content` -> `reasoning` (#27752 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-08 12:15:08 +00:00
Isotr0py	934a9c3b79	[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 (#28101 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-08 05:01:27 +00:00
Xiaohong (Sean) Chen	d0c7792004	[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding (#21068 ) Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Danielle Robinson <dcmaddix@gmail.com> Co-authored-by: Haipeng Li <li2haipeng@gmail.com> Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>	2025-11-08 01:58:22 +00:00
Boyuan Feng	b158df2813	remove resolve_op_overloads and use splitting_ops directly (#28081 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-11-08 01:13:13 +00:00
Harry Mellor	811df41ee9	Update Flashinfer from `v0.4.1` to `v0.5.2` (#27952 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-07 16:24:42 -08:00

4252 Commits