Nick Hill
|
4e57c6587f
|
[Core] Support logprobs with spec decode + async scheduling (#29223)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-25 12:55:24 -08:00 |
|
Jialin Ouyang
|
30b9c67743
|
Revert "[Redo] #26368 (#28771)" (#29121)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-20 21:27:45 -08:00 |
|
Cyrus Leung
|
98b4d389ed
|
[Redo] #26368 (#28771)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-14 22:47:41 -08:00 |
|
Nick Hill
|
ac86bff8cb
|
Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773)
|
2025-11-14 20:24:00 -08:00 |
|
Jialin Ouyang
|
186352b270
|
[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-14 16:04:04 -08:00 |
|
Giancarlo Delfin
|
6644796bf4
|
[V1][spec decode] return logprobs for spec decoding (#26060)
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-10-22 22:59:59 -07:00 |
|
Pradyun92
|
acedc74b1a
|
[V1][Spec Decode] Fix greedy temperature detection after sampler refactor (#27077)
Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com>
|
2025-10-17 13:27:47 -07:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
Nick Hill
|
ddcbc2f334
|
[Misc] Misc code simplifications (#26450)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-09 02:10:06 -07:00 |
|
Sergei Skvortsov
|
6ebaf43ee4
|
[V1] Logit processors for rejection sampler (#19482)
Signed-off-by: southfreebird <yvorott@gmail.com>
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com>
Signed-off-by: Sergei Skvortsov <yvorott@gmail.com>
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-10-07 13:02:49 -07:00 |
|
Harry Mellor
|
b893d661b1
|
Fix per file ruff ignores related to simplification (#26259)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 20:31:53 +00:00 |
|
Harry Mellor
|
d6953beb91
|
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 07:06:22 -07:00 |
|
Corey Lowman
|
0879736aab
|
[Perf] Remove hardcoded num_warps=1 (#26183)
Signed-off-by: Corey Lowman <clowman1993@gmail.com>
|
2025-10-03 20:38:50 +00:00 |
|
Ekagra Ranjan
|
e71b8e210d
|
[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-09-25 15:22:03 -07:00 |
|
Wenlong Wang
|
032d661d27
|
[Docs] Fix warnings in mkdocs build (continued) (#25042)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-09-20 11:45:18 +00:00 |
|
Jingkai He
|
57d4ede520
|
[bugfix] [spec-decoding] fix data race in sample_recovered_tokens_kernel (vLLM v1) (#23829)
Signed-off-by: He-Jingkai <he-jingkai@outlook.com>
|
2025-08-28 19:05:20 +00:00 |
|
Woosuk Kwon
|
a3432f18fd
|
[BugFix][Spec Decode] Use float64 for uniform_probs (#23803)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-28 12:26:45 +00:00 |
|
Hyogeun Oh (오효근)
|
4e4d017b6f
|
[Docs] Fix warnings in mkdocs build (continued) (#23743)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com>
|
2025-08-27 17:17:29 +00:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
Mengqing Cao
|
f9bc5a0693
|
[Bugfix] Fix triton import with local TritonPlaceholder (#17446)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-05-06 17:53:09 +08:00 |
|
Harry Mellor
|
d6484ef3c3
|
Add full API docs and improve the UX of navigating them (#17485)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-03 19:42:43 -07:00 |
|
Woosuk Kwon
|
41fb013d29
|
[V1][Spec Decode] Always use argmax for sampling draft tokens (#16899)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-23 14:57:43 -07:00 |
|
Woosuk Kwon
|
2bc4be4e32
|
[V1][Minor] Simplify rejection sampler's parse_output (#15741)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-29 09:25:17 -07:00 |
|
Cody Yu
|
54aa619459
|
[V1] Refactor num_computed_tokens logic (#15307)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-27 04:54:36 +00:00 |
|
Woosuk Kwon
|
25f560a62c
|
[V1][Spec Decode] Update target_logits in place for rejection sampling (#15427)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-24 21:04:41 -07:00 |
|
Woosuk Kwon
|
911c8eb000
|
[Minor][Spec Decode] Remove compiled_softmax (#15416)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-24 19:09:04 -07:00 |
|
Woosuk Kwon
|
ebcebeeb6b
|
[V1][Spec Decode] Enable spec decode for top-p & top-k sampling (#15063)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-24 17:16:46 -07:00 |
|
Woosuk Kwon
|
99abb8b650
|
[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels (#14930)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-18 14:31:54 -07:00 |
|
Lily Liu
|
8d6cf89526
|
[V1] [Spec Decode] Support random sampling for spec decode (#13933)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-16 22:00:20 -07:00 |
|
Woosuk Kwon
|
32ef4983cd
|
[V1] Temporarily disable FlashInfer Rejection Sampler (#14788)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-13 20:40:35 -07:00 |
|
Harry Mellor
|
cf069aa8aa
|
Update deprecated Python 3.8 typing (#13971)
|
2025-03-02 17:34:51 -08:00 |
|
Lily Liu
|
5629f26df7
|
[V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729)
|
2025-02-25 18:14:48 -08:00 |
|
Nick Hill
|
30172b4947
|
[V1] Optimize handling of sampling metadata and req_ids list (#13244)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-02-18 12:15:33 -08:00 |
|
Woosuk Kwon
|
69e1d23e1e
|
[V1][BugFix] Clean up rejection sampler & Fix warning msg (#13362)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-16 12:25:29 -08:00 |
|
Lily Liu
|
80f63a3966
|
[V1][Spec Decode] Ngram Spec Decode (#12193)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
|
2025-02-15 18:05:11 -08:00 |
|