biondizzle/vllm - vllm

Author	SHA1	Message	Date
Joshua Deng	91601ff478	[Feature] add session based streaming input support to v1 (#28973 ) Signed-off-by: Joshua Deng <joshuakdeng@gmail.com> Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-24 12:06:28 -08:00
Harry Huang	5206e5e28c	[V1][Hybrid] Mamba Prefix Caching with align mode (#30877 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2026-01-23 09:56:48 -08:00
Micah Williamson	019e2c3b7c	[ROCm][CI] Lower Acceptance Len Threshold For test_draft_model_quantization (#32731 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-22 05:47:33 +00:00
Tomas Ruiz	4a5299c93f	feat: spec decode with draft models (#24322 ) Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>	2026-01-19 16:05:46 -05:00
Matthew Bonanni	20228cb851	[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-12 09:13:56 -08:00
Andreas Karatzas	e02706d2d2	[ROCm][CI][V1] Fix `nixl_connector` test failure and achieve CUDA parity in `test_async_scheduling` (#32000 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-09 20:48:32 +08:00
zhrrr	8ff4a99566	[Async][Feat] support apply penalty or bad_words for async + spec (#30495 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Signed-off-by: izhuhaoran <izhuhaoran@qq.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 02:31:50 +00:00
Benjamin Chislett	f7008ce1c4	[Perf] Async Scheduling + Speculative Decoding + Structured Outputs (#29821 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-06 18:50:37 +00:00
Wentao Ye	af9a7ec255	[Bug] Revert torch warning fix (#31585 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-05 22:31:21 +00:00
Micah Williamson	fd8afdf38d	[ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 (#30811 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-18 10:27:37 +08:00
Matthew Bonanni	7eb6cb6c18	[Attention] Update tests to remove deprecated env vars (#30563 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-17 09:49:59 -08:00
Andreas Karatzas	783644e4ac	[ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available (#30527 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-12 03:54:56 +00:00
Harry Mellor	8781cd6b88	Add Eagle and Eagle3 support to Transformers modeling backend (#30340 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-11 17:02:10 +00:00
Wentao Ye	d6464f2679	[Chore] Fix torch precision warning (#30428 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-11 04:05:56 +00:00
Andreas Karatzas	ed7af3178a	[ROCm][CI] Attempt to fix the failures under a subgroup of the e2e the test group (#29358 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-12-10 05:33:13 +00:00
Micah Williamson	7d80c73d42	[CI] Reduce Flakiness For test_spec_decode.py::test_suffix_decoding_acceptance (#30367 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-10 02:35:49 +00:00
Lucas Wilkinson	abe93bce59	[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-12-09 17:18:10 -08:00
Wentao Ye	83319b44c2	[Compile] Fix torch warning `TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled` (#29897 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-09 10:40:37 -05:00
Charlie Fu	6af70e11a0	[ROCm][CI] Fix test_max_len.py for Rocm (#29916 ) Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>	2025-12-08 16:58:30 -05:00
Divakar Verma	962d703818	[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-05 19:57:26 +00:00
Divakar Verma	afb1e5b380	[CI][ROCm][tests/v1/e2e] Fix multiprocessing launch for the test (#29123 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-02 20:46:10 +00:00
Divakar Verma	e2fbfc955e	[CI][AMD] spec_decode:eagle skip FLASH_ATTN for deepseek on ROCm (#29827 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-02 05:27:46 +00:00
Divakar Verma	a690fb5bd6	[CI][ROCm] Fix test_correctness_sliding_window (#29243 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-02 04:53:27 +00:00
Nick Hill	44822d7ff2	[BugFix] Preserve spec decoding uniform decode when scheduling (#29759 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-01 17:15:52 -08:00
EanWang211123	37b15e97e8	[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl (#29594 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: EanWang211123 <wangyiheng@sangfor.com.cn> Co-authored-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-11-27 22:05:45 -08:00
Nick Hill	4e57c6587f	[Core] Support logprobs with spec decode + async scheduling (#29223 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-25 12:55:24 -08:00
WeiQing Chen	b34129bf8e	[Misc] remove useless v1 env (#29164 ) Signed-off-by: David Chen <530634352@qq.com>	2025-11-21 01:41:20 -08:00
Nick Hill	5bdd155277	[CI] Fix async scheduling + spec decoding test flake (#28902 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-18 05:26:32 +00:00
Ronald	d8874c61a5	[Core] Async Scheduling X Spec Decoding Compatibility (#24799 ) Signed-off-by: Ronald1995 <ronaldautomobile@163.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-11-17 12:16:20 -08:00
Nick Hill	80b6080ddc	[BugFix] Fix async scheduling + chunked prefill + preemption (#28787 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-17 06:46:46 +08:00
Nick Hill	58e61e56b7	[Test] Rework e2e async scheduling tests (#28744 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-14 16:01:09 -08:00
Laith Sakka	2e0ad629b0	Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-11-14 14:11:10 -08:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
Yong Hoon Shin	9324e10275	Fix KV sharing fast prefill with cudagraph enabled (#28537 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-14 11:53:42 +00:00
Yannick Schnider	119c4927b3	[Bugfix] Fix validate model input for decoder models (#27099 ) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com> Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-13 10:18:47 -08:00
Nicolò Lucchesi	19d91ece4b	[CI] Fix flaky `test_eagle_correctness` test (#28364 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-09 16:04:59 +00:00
Xiaohong (Sean) Chen	d0c7792004	[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding (#21068 ) Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Danielle Robinson <dcmaddix@gmail.com> Co-authored-by: Haipeng Li <li2haipeng@gmail.com> Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>	2025-11-08 01:58:22 +00:00
Nick Hill	938a81692e	[AsyncScheduling] Don't schedule past request max_tokens (#27922 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-04 17:06:28 +00:00
Aurick Qiao	2c19d96777	[Spec Decode] Integrate Suffix Decoding from Arctic Inference (#25784 ) Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>	2025-11-03 09:23:31 -08:00
Rémi Delacourt	cec7c28833	[Bugfix] Padded Eagle Specdec with Chunked Prefill (#26263 ) Signed-off-by: Rémi Delacourt <remi@mistral.ai> Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-03 02:22:46 -05:00
Nick Hill	0cdbe7b744	[Core] Async scheduling + structured outputs compatibility (#26866 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-01 00:35:04 +00:00
Dipika Sikka	413ef7a3b4	[Speculators] Move tests + fix integration (#27308 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com> Signed-off-by: rahul-tuli <rtuli@redhat.com> Co-authored-by: Rahul Tuli <rtuli@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-10-29 00:54:21 -07:00
Nick Hill	4fe5895361	[AsyncScheduling] Make async overlap work with logprobs (#27615 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-28 22:35:54 +00:00
Huy Do	becb7de40b	Update PyTorch to 2.9.0+cu129 (#24994 ) Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-21 17:20:18 -04:00
Bram Wasti	b2f78cbad4	[small][batch invariance] Rename the env and internal flags to simplify usage (#26855 ) Signed-off-by: Bram Wasti <bwasti@meta.com>	2025-10-16 21:40:25 +00:00
Morrison Turnansky	96b9aa5aa0	[Frontend][torch.compile] CompilationConfig Overhaul (#20283 ): name change compilation level to compilation mode, deprecation compilation level (#26355 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-15 02:51:16 +00:00
Maximilien de Bayser	d8bebb008a	Add tests for chunked prefill and prefix cache with causal pooling models (#26526 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Ayush Singh <ayush1009208@gmail.com>	2025-10-14 07:45:04 +08:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Nick Hill	5bc26c438d	[BugFix] Make penalties and bad_words work with async scheduling (#26467 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-10 23:27:04 +00:00
Nick Hill	949cb0170d	[BugFix] Fix async scheduling + request preemption (#26385 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-10 20:29:57 +00:00

97 Commits