Michael Goin
|
17b4d85f63
|
[CI][TPU] Skip structured outputs+spec decode tests on TPU (#17510)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-30 20:36:20 -07:00 |
|
Siyuan Liu
|
dbc18e7816
|
[CI][TPU] Skip Multimodal test (#17488)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-04-30 19:51:39 -07:00 |
|
Chen Zhang
|
81ecf425f0
|
[v1][Spec Decode] Make sliding window compatible with eagle prefix caching (#17398)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-04-30 18:25:53 +00:00 |
|
Russell Bryant
|
947f2f5375
|
[V1] Allow turning off pickle fallback in vllm.v1.serial_utils (#17427)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-04-30 16:10:54 +00:00 |
|
Alec
|
0be6d05b5e
|
[V1][Metrics] add support for kv event publishing (#16750)
Signed-off-by: alec-flowers <aflowers@nvidia.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
|
2025-04-30 07:44:45 -07:00 |
|
Marko Rosenmueller
|
77073c77bc
|
[Core] Prevent side-channel attacks via cache salting (#17045)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
|
2025-04-30 20:27:21 +08:00 |
|
Nicolò Lucchesi
|
a7d5b016bd
|
[TPU][V1][CI] Update regression test baseline for v6 CI (#17064)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-30 04:03:22 -07:00 |
|
Benjamin Chislett
|
34120f5acd
|
[V1][Feature] Enable Speculative Decoding with Structured Outputs (#14702)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-04-30 00:02:10 +00:00 |
|
Harry Mellor
|
7489ec0bab
|
Remove Bamba 9B from CI (#17407)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-29 21:10:31 +00:00 |
|
Harry Mellor
|
a6977dbd15
|
Simplify (and fix) passing of guided decoding backend options (#17008)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-29 19:02:23 +00:00 |
|
ponix-j
|
bdb2cddafc
|
[Misc] Use a platform independent interface to obtain the device attributes (#17100)
|
2025-04-29 06:59:13 +00:00 |
|
Michał Moskal
|
86d9fc29cb
|
implement Structural Tag with Guidance backend (#17333)
Signed-off-by: Michal Moskal <michal@moskal.me>
|
2025-04-29 02:21:32 +00:00 |
|
Lily Liu
|
20e489eaa1
|
[V1][Spec Decode] Make eagle compatible with prefix caching. (#17137)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
|
2025-04-27 09:29:43 -07:00 |
|
Ning Xie
|
fd11a325b8
|
[MISC] rename interval to max_recent_requests (#14285)
|
2025-04-26 16:59:18 +00:00 |
|
Russell Bryant
|
f8acd01ff7
|
[V1] Add structural_tag support using xgrammar (#17085)
|
2025-04-26 14:06:37 +00:00 |
|
Nick Hill
|
df6f3ce883
|
[Core] Remove prompt string from engine core data structures (#17214)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-25 23:41:05 -07:00 |
|
Woosuk Kwon
|
513f074766
|
[CI/test] Fix Eagle Correctness Test (#17209)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-25 23:40:36 -07:00 |
|
Nick Hill
|
b07bf83c7d
|
[BugFix] Avoid race conditions in zero-copy tensor transmission (#17203)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-26 06:00:07 +00:00 |
|
Zijing Liu
|
53e8cf53a4
|
[V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661)
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-25 22:05:40 -07:00 |
|
Benjamin Chislett
|
a0e619e62a
|
[V1][Spec Decode] EAGLE-3 Support (#16937)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: Bryan Lu <yuzhelu@amazon.com>
|
2025-04-25 15:43:07 -07:00 |
|
Sangyeon Cho
|
6aae216b4e
|
[Bugfix] remove fallback in guided_json (int range, patterns) (#16725)
Signed-off-by: csy1204 <josang1204@gmail.com>
Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>
|
2025-04-25 06:54:43 +00:00 |
|
Mark McLoughlin
|
340d7b1b21
|
[V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-04-24 08:57:40 -07:00 |
|
Rui Qiao
|
c0dfd97519
|
[V1][PP] Optimization: continue scheduling prefill chunks (#17080)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-04-24 05:27:08 -07:00 |
|
Harry Mellor
|
0a05ed57e6
|
Simplify TokenizerGroup (#16790)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 04:43:56 -07:00 |
|
Michael Goin
|
14288d1332
|
Disable enforce_eager for V1 TPU sampler and structured output tests (#17016)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-24 02:50:09 -07:00 |
|
Travis Johnson
|
3cde34a4a4
|
[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2025-04-23 18:34:41 +00:00 |
|
Nick Hill
|
1e013fa388
|
[V1][DP] More robust DP/EP dummy request coordination (#16277)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-22 19:12:15 -07:00 |
|
Chenyaaang
|
83d933718c
|
[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-04-22 18:05:23 -06:00 |
|
Woosuk Kwon
|
c4ab9f3e71
|
[V1] Remove pre-allocation for KV cache (#16941)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-22 00:52:18 -07:00 |
|
Chauncey
|
acba33a0f1
|
[Bugfix] Fix the issue where llm.generate cannot be called repeatedly after setting GuidedDecodingParams (#16767)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-22 06:02:20 +00:00 |
|
Jeffrey Li
|
0e4254492f
|
[Bugfix]: fix issue with n>1 sampling on v1 requests overriding each other (#16863)
Signed-off-by: Jeffrey Li <jeffrey.dot.li@gmail.com>
|
2025-04-22 11:40:19 +08:00 |
|
Nicolò Lucchesi
|
fa3bba2a53
|
[TPU][V1] Enable Top-P (#16843)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-04-22 00:46:07 +00:00 |
|
Michael Goin
|
986537f1c3
|
[V1] V1 FlashInfer Attention (#16684)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Aurick Qiao <qiao@aurick.net>
|
2025-04-22 00:38:41 +00:00 |
|
Nicolò Lucchesi
|
210207525e
|
[TPU][V1] Capture multimodal encoder during model compilation (#15051)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Siyuan Liu <lsiyuan@google.com>
|
2025-04-21 18:36:59 -06:00 |
|
Chengji Yao
|
471fe65630
|
[TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-04-21 15:43:13 -06:00 |
|
Woosuk Kwon
|
3a0fba5cf4
|
[V1][Spec Decode] Handle draft tokens beyond max_model_len (#16087)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-21 12:38:50 -07:00 |
|
qizixi
|
bb3605db85
|
[Bugfix] Fix v1/spec_decode/test_ngram.py (#16895)
Signed-off-by: qizixi <qizixi@meta.com>
|
2025-04-20 20:54:29 -07:00 |
|
Staszek Paśko
|
87aaadef73
|
Serialize tensors using int8 views (#16866)
Signed-off-by: Staszek Pasko <staszek@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-19 10:28:34 -07:00 |
|
vie-serendipity
|
d9737ca1c6
|
[V1][Misc] stop update prefix cache stats when logs_stats is disabled (#16460)
Signed-off-by: vie-serendipity <2733147505@qq.com>
|
2025-04-19 02:25:19 -07:00 |
|
Yihua Cheng
|
3408e47159
|
[P/D][V1] KV Connector API V1 (#15960)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
|
2025-04-17 13:22:40 -07:00 |
|
Nicolò Lucchesi
|
eb5819b2d9
|
[V1][TPU] Enable Top K (#15489)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: Hyesoo Yang <hyeygit@gmail.com>
|
2025-04-17 18:18:11 +00:00 |
|
Nicolò Lucchesi
|
5989f4684d
|
[TPU][V1] Fix padding recompilation when max-num-batched-tokens is not even (#16726)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-17 18:09:57 +00:00 |
|
Robert Shaw
|
2b05b8ce69
|
[V1][Frontend] Improve Shutdown And Logs (#11737)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-16 19:48:34 -07:00 |
|
Staszek Paśko
|
3092375e27
|
[V1][Performance] Implement custom serialization for MultiModalKwargs [Rebased] (#16432)
Signed-off-by: Staszek Pasko <staszek@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-16 19:28:32 -07:00 |
|
Shanshan Shen
|
976711d9db
|
[V1][Structured Output] Move xgrammar related utils to backend_xgrammar.py (#16578)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-04-16 17:01:36 +08:00 |
|
Nicolò Lucchesi
|
b3f2fddd17
|
[TPU][V1] Fix exponential padding when max-num-batched-tokens is not a power of 2 (#16596)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-14 17:01:05 +00:00 |
|
Lily Liu
|
f49e5aff11
|
[V1][Spec Decode] KV cache slots for eagle heads (#16370)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
|
2025-04-12 19:42:51 -07:00 |
|
leon-seidel
|
e92d7085bf
|
[Feature][V1] Add xgrammar to support minLength, maxLength with test (#16516)
Signed-off-by: Leon Seidel <leon.seidel@fau.de>
|
2025-04-11 23:22:07 -07:00 |
|
Nick Hill
|
41cc883c29
|
[BugFix] Handle non-contiguous tensors properly when serializing (#16492)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-11 17:54:06 -07:00 |
|
Michael Goin
|
aa3b3d76e0
|
Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-11 08:09:52 +00:00 |
|