Chen Zhang | 994fc655b7 | [V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager (#12003) | 2025-01-15 07:55:30 +00:00 |
Roger Wang | 91b361ae89 | [V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (#11685); Signed-off-by: Roger Wang <ywang@roblox.com>; Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>; Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-01-06 19:58:16 +00:00 |
Chen Zhang | 8c3230d8c1 | [V1] Simplify vision block hash for prefix caching by removing offset from hash (#11646) | 2024-12-31 08:56:01 +00:00 |
sakunkun | 2c5718809b | [Bugfix] Move the _touch(computed_blocks) call in the allocate_slots method to after the check for allocating new blocks. (#11565) | 2024-12-31 06:29:04 +00:00 |
Cody Yu | bf8717ebae | [V1] Prefix caching for vision language models (#11187); Signed-off-by: Cody Yu <hao.yu.cody@gmail.com> | 2024-12-17 16:37:59 -08:00 |
Cody Yu | 78ed8f57d8 | [Misc][V1] Fix type in v1 prefix caching (#11151) | 2024-12-13 00:57:40 +00:00 |
Woosuk Kwon | a79b122400 | [V1] Do not allocate beyond the max_model_len (#10730); Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2024-11-28 00:13:15 -08:00 |
Ricky Xu | 97814fbf0f | [v1] Refactor KVCacheManager for more hash input than token ids (#10507); Signed-off-by: rickyx <rickyx@anyscale.com>; Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>; Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> | 2024-11-22 23:27:25 +00:00 |
Cyrus Leung | 0b8bb86bf1 | [1/N] Initial prototype for multi-modal processor (#10044); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-11-13 12:39:03 +00:00 |
Cody Yu | 201fc07730 | [V1] Prefix caching (take 2) (#9972); Signed-off-by: Cody Yu <hao.yu.cody@gmail.com> | 2024-11-07 17:34:44 -08:00 |