biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Hiroaki Sugiyama	8958217ad5	[Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 (#15211 ) Signed-off-by: h-sugi <h.sugi@ieee.org> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-27 22:29:29 +08:00
Chengji Yao	619d3de8bd	[TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SEQS (#15583 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-03-26 22:46:26 -07:00
Cody Yu	54aa619459	[V1] Refactor num_computed_tokens logic (#15307 ) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-27 04:54:36 +00:00
Chengji Yao	e74ff409e0	[TPU] support disabling xla compilation cache (#15567 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-03-27 00:09:28 +00:00
Alexander Matveev	b2e85e26f4	[V1] TPU - Revert to exponential padding by default (#15565 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-26 21:35:05 +00:00
Chenyaaang	ac3cd6e83c	[core] add bucket padding to tpu_model_runner (#14995 ) Signed-off-by: Chenyaaang <llccyy1212@gmail.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-25 17:27:22 -04:00
Nicolò Lucchesi	a0dd7dcd49	[TPU][V1] Fix Sampler recompilation (#15309 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-25 16:43:54 -04:00
Siyuan Liu	4157f563b4	[Hardware][TPU][Bugfix] Fix v1 mp profiler (#15409 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-03-25 01:43:00 -07:00
Woosuk Kwon	25f560a62c	[V1][Spec Decode] Update target_logits in place for rejection sampling (#15427 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-24 21:04:41 -07:00
Nick Hill	3aee6573dc	[V1] Aggregate chunked prompt logprobs in model runner (#14875 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-24 12:27:57 -04:00
Woosuk Kwon	b9bd76ca14	[V1][Spec Decode] Respect prompt_lookup_max (#15348 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-23 10:41:44 -07:00
shangmingc	50c9636d87	[V1][Usage] Refactor speculative decoding configuration and tests (#14434 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-03-22 19:28:10 -10:00
Chen Zhang	93a00d7dde	[v1] Refactor KVCacheConfig (#14079 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-03-21 04:56:27 -07:00
Siyuan Liu	b15fd2be2a	[Hardware][TPU] Add check for no additional graph compilation during runtime (#14710 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-03-21 03:05:28 +00:00
Woosuk Kwon	0c6f5023c3	[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface (#15250 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-20 17:50:43 -07:00
Woosuk Kwon	2b22290ce0	[V1] Add flag to disable cascade attention (#15243 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-20 15:24:16 -07:00
Mickaël Seznec	a597a57595	[Attention] Flash Attention 3 - fp8 (#14570 ) Signed-off-by: Mickael Seznec <mickael@mistral.ai>	2025-03-20 01:14:20 -04:00
Nicolò Lucchesi	d8c6d7d6b5	[V1][TPU] Support V1 Sampler for ragged attention (#14227 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-19 21:00:39 -07:00
Woosuk Kwon	99abb8b650	[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels (#14930 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-18 14:31:54 -07:00
Nicolò Lucchesi	af35d3a3cc	[TPU][V1][Bugfix] Fix chunked prefill with padding (#15037 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-18 07:34:45 -07:00
Varun Sundar Rabindranath	400d483e87	[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-18 09:47:53 +00:00
iefgnoix	b4ad56c1bd	[V1][TPU] Apply the ragged paged attention kernel fix and remove the padding. (#14846 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-03-17 01:48:28 -07:00
Cyrus Leung	b539222d4e	[V1] Remove input cache client (#14864 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-03-16 23:42:06 -07:00
Lily Liu	8d6cf89526	[V1] [Spec Decode] Support random sampling for spec decode (#13933 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-16 22:00:20 -07:00
Woosuk Kwon	faa0275730	[V1] Optimize the overhead of rewinding (#14905 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-16 20:19:30 -07:00
Woosuk Kwon	31060b2757	[V1][BugFix] Detect interleaved sliding window attention (#14896 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-16 14:53:53 -07:00
Nick Hill	fc1f67715d	[BugFix][V1] Fix overhead related to bad_words sampling when not in use (#14894 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-16 14:53:34 -07:00
Roger Wang	ad19c8a003	[V1] Move OOM check into sampler run (#14728 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-03-13 20:40:23 -07:00
Benjamin Chislett	5c538c37b2	[V1][Bugfix][Spec Decode] Fix incorrect outputs in V1 speculative decoding due to batch indexing (#14645 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-03-11 22:12:41 -07:00
Cody Yu	b706d898af	[Bugfix][V1][PP] Only warmup sampler at last PP rank (#14643 ) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-03-11 23:40:07 +00:00
iefgnoix	863d315c86	[V1][TPU] Pad the block_table.shape[1] so the ragged paged attention can handle correctly (#14597 )	2025-03-11 19:12:26 -04:00
Roger Wang	1fc973c0b5	[V1][Core] Fix memory issue with logits & sampling (#14508 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Varun Sundar Rabindranath <3337719+varun-sundar-rabindranath@users.noreply.github.com>	2025-03-11 04:03:41 +00:00
Varun Sundar Rabindranath	5ff0d32580	[V1] LoRA - Add triton kernels for V1 (#13096 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-10 17:27:53 -04:00
Chengji Yao	212007b168	[Hardware][TPU] Fix the recompiling issue in logits processor after warmup (#14510 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-03-09 05:44:39 -04:00
iefgnoix	10f7552789	[V1][TPU] Remove unnecessary padding for running on TPU. (#14467 )	2025-03-08 21:56:04 -05:00
Robert Shaw	5f0b53c6ea	Revert "[V1][Core] Fix memory issue with logits & sampling" (#14504 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-03-08 17:43:37 -08:00
22quinn	eb8b5eb183	[V1] Support bad_words in sampler (#13376 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-08 14:50:26 -08:00
Roger Wang	8d5aa466fb	[V1][Core] Fix memory issue with logits & sampling (#13776 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-08 06:11:04 -08:00
Alexander Matveev	cb8bdfade2	[V1] TPU - Add tensor parallel support via Ray (#13618 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-08 08:19:38 -05:00
Tyler Michael Smith	333681408f	[Bugfix][V1] Handle MLA in kv_cache_interface (#14462 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-07 22:18:25 -08:00
yarongmu-google	66e16a038e	[Bugfix] Fix torch_xla which can't handle None seed introduced in #14274 (#14459 ) Signed-off-by: Yarong Mu <ymu@google.com>	2025-03-07 23:17:04 +00:00
Nick Hill	8ed5421aaa	[V1] Eagerly remove finished requests from the batch (#14388 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-07 10:56:00 -08:00
Aaron Pham	80e9afb5bc	[V1][Core] Support for Structured Outputs (#12388 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-07 07:19:11 -08:00
Chengji Yao	0578e5a462	[Hardware][TPU]Enable ragged paged attention kernel and resolve recompilation issue (#14310 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-03-06 23:31:05 +00:00
Lucas Wilkinson	f6bb18fd9a	[BugFix] MLA + V1, illegal memory access and accuracy issues (#14253 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-05 17:10:13 -08:00
Nick Hill	ac60dc7fe1	[V1][BugFix] Fix for mixed top_k batch (#14301 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Ye Cao <caoye.cao@alibaba-inc.com>	2025-03-05 20:43:04 +00:00
Nick Hill	a32c8669ca	[V1][Minor] Remove obsolete FIXME comment (#14304 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-05 11:59:23 -08:00
Robert Shaw	257e200a25	[V1][Frontend] Add Testing For V1 Runtime Parameters (#14159 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-03-05 14:18:55 +00:00
Lu Fang	8d6cd32b7b	[Bugfix][V1] Fix allowed_token_ids for v1 Sampler (#14169 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-03-05 08:49:44 +00:00
Tyler Michael Smith	72c62eae5f	[V1] EP/TP MoE + DP Attention (#13931 )	2025-03-04 21:27:26 -08:00

165 Commits