biondizzle/vllm - vllm

Author	SHA1	Message	Date
Woosuk Kwon	c4ab9f3e71	[V1] Remove pre-allocation for KV cache (#16941) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-22 00:52:18 -07:00
Chauncey	acba33a0f1	[Bugfix] Fix the issue where llm.generate cannot be called repeatedly after setting GuidedDecodingParams (#16767) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-04-22 06:02:20 +00:00
Jeffrey Li	0e4254492f	[Bugfix]: fix issue with n>1 sampling on v1 requests overriding each other (#16863) Signed-off-by: Jeffrey Li <jeffrey.dot.li@gmail.com>	2025-04-22 11:40:19 +08:00
Nicolò Lucchesi	fa3bba2a53	[TPU][V1] Enable Top-P (#16843) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-22 00:46:07 +00:00
Michael Goin	986537f1c3	[V1] V1 FlashInfer Attention (#16684) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Aurick Qiao <qiao@aurick.net>	2025-04-22 00:38:41 +00:00
Nicolò Lucchesi	210207525e	[TPU][V1] Capture multimodal encoder during model compilation (#15051) Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Siyuan Liu <lsiyuan@google.com>	2025-04-21 18:36:59 -06:00
Chengji Yao	471fe65630	[TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-21 15:43:13 -06:00
Woosuk Kwon	3a0fba5cf4	[V1][Spec Decode] Handle draft tokens beyond max_model_len (#16087) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-21 12:38:50 -07:00
qizixi	bb3605db85	[Bugfix] Fix v1/spec_decode/test_ngram.py (#16895) Signed-off-by: qizixi <qizixi@meta.com>	2025-04-20 20:54:29 -07:00
Staszek Paśko	87aaadef73	Serialize tensors using int8 views (#16866) Signed-off-by: Staszek Pasko <staszek@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-19 10:28:34 -07:00
vie-serendipity	d9737ca1c6	[V1][Misc] stop update prefix cache stats when logs_stats is disabled (#16460) Signed-off-by: vie-serendipity <2733147505@qq.com>	2025-04-19 02:25:19 -07:00
Yihua Cheng	3408e47159	[P/D][V1] KV Connector API V1 (#15960) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-04-17 13:22:40 -07:00
Nicolò Lucchesi	eb5819b2d9	[V1][TPU] Enable Top K (#15489) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Hyesoo Yang <hyeygit@gmail.com> Co-authored-by: Hyesoo Yang <hyeygit@gmail.com>	2025-04-17 18:18:11 +00:00
Nicolò Lucchesi	5989f4684d	[TPU][V1] Fix padding recompilation when `max-num-batched-tokens` is not even (#16726) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-17 18:09:57 +00:00
Robert Shaw	2b05b8ce69	[V1][Frontend] Improve Shutdown And Logs (#11737) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:48:34 -07:00
Staszek Paśko	3092375e27	[V1][Performance] Implement custom serialization for MultiModalKwargs [Rebased] (#16432) Signed-off-by: Staszek Pasko <staszek@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:28:32 -07:00
Shanshan Shen	976711d9db	[V1][Structured Output] Move xgrammar related utils to `backend_xgrammar.py` (#16578) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-16 17:01:36 +08:00
Nicolò Lucchesi	b3f2fddd17	[TPU][V1] Fix exponential padding when `max-num-batched-tokens` is not a power of 2 (#16596) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-14 17:01:05 +00:00
Lily Liu	f49e5aff11	[V1][Spec Decode] KV cache slots for eagle heads (#16370) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-12 19:42:51 -07:00
leon-seidel	e92d7085bf	[Feature][V1] Add xgrammar to support minLength, maxLength with test (#16516) Signed-off-by: Leon Seidel <leon.seidel@fau.de>	2025-04-11 23:22:07 -07:00
Nick Hill	41cc883c29	[BugFix] Handle non-contiguous tensors properly when serializing (#16492) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-11 17:54:06 -07:00
Michael Goin	aa3b3d76e0	Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 08:09:52 +00:00
Nicolò Lucchesi	3cc9af88ff	[TPU][V1] Disable per-request seed/Generator (#16172) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-10 17:05:44 -04:00
Nick Hill	dd143ef541	[V1] Zero-copy tensor/ndarray serialization/transmission (#13790) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-10 19:23:14 +00:00
Lily Liu	e8224f3dca	[V1][Spec Decode] Eagle Model loading (#16035) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-10 11:21:48 -07:00
Chengji Yao	a454748544	[TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (#16275) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-09 18:51:51 -06:00
Chengji Yao	b1eb4ca152	[TPU] Update PyTorch/XLA (#16288) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-09 14:46:32 +08:00
rongfu.leng	4716377fbc	[Feature] Estimate max-model-len use available KV cache memory (#16168) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-08 19:12:51 -07:00
Michael Goin	8e5314a468	[V1] Add `disable_chunked_mm_input` arg to disable partial mm input prefill (#15837) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-07 23:24:07 -07:00
Roger Wang	f2ebb6f541	[V1] Scatter and gather placeholders in the model runner (#16076) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>	2025-04-08 10:43:41 +08:00
Nick Hill	7f6d47c1a2	[V1][BugFix] Exit properly if engine core fails during startup (#16137) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-07 15:30:15 -07:00
Cyrus Leung	66d433b94f	[V1] Revert the default `max_num_seqs` to V0 values for most hardware (#16158) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 13:54:36 -04:00
Roger Wang	af51d80fa1	Revert "[V1] Scatter and gather placeholders in the model runner" (#16075)	2025-04-04 14:50:57 -07:00
Cyrus Leung	f5722a5052	[V1] Scatter and gather placeholders in the model runner (#15712) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-04-04 21:26:44 +00:00
Mark McLoughlin	a35a8a8392	[V1][Spec Decode] Avoid logging useless nan metrics (#16023) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-04-04 08:52:41 -07:00
iefgnoix	b6be6f8d1e	[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-04-03 14:23:28 -07:00
Hyesoo Yang	1b84eff03a	[V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#15736) Signed-off-by: Hyesoo Yang <hyeygit@gmail.com> Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>	2025-04-02 17:18:08 -07:00
Russell Bryant	14e53ed11f	[V1] Fix json_object support with xgrammar (#15488) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-02 02:00:08 -07:00
Mark McLoughlin	a79cc68b3a	[V1][Metrics] Initial speculative decoding metrics (#15151) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-04-01 10:45:04 -07:00
Roger Wang	7e3f7a4ee7	[CI] Disable flaky structure decoding test temporarily. (#15892) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-04-01 17:42:34 +00:00
Varun Sundar Rabindranath	79455cf421	[Misc] Enable V1 LoRA by default (#15320) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-04-01 16:53:56 +08:00
Chen Zhang	3a5f0afcd2	[V1] Implement sliding window attention in kv_cache_manager (#14097) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-01 00:33:17 -07:00
Mark McLoughlin	f98a4920f9	[V1][Core] Remove unused speculative config from scheduler (#15818) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-31 19:15:21 +00:00
Alexander Matveev	9a2160fa55	[V1] TPU CI - Add basic perf regression test (#15414) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-31 13:25:20 -04:00
shangmingc	239b7befdd	[V1][Spec Decode] Remove deprecated spec decode config params (#15466) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-03-31 09:19:35 -07:00
Julien Denize	6909a76201	[Bugfix] Fix Mistral guided generation using xgrammar (#15704) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-03-29 20:20:19 -07:00
Chauncey	045533716b	[CI] xgrammar structured output supports Enum. (#15757) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-03-29 20:20:02 -07:00
Russell Bryant	7a7992085b	[CI] Speed up V1 structured output tests (#15718) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-28 21:10:45 -07:00
Alexander Matveev	c3f687ac22	[V1] TPU - Fix the chunked prompt bug (#15713) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-28 20:19:04 +00:00
Russell Bryant	7329ff5468	[V1] Support disable_any_whitespace for guidance backend (#15584) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-28 23:46:45 +08:00

468 Commits