Cyrus Leung
|
766bc8162c
|
[Core] Store only the keys for multi-modal data in P0 (#22198)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-07 01:45:04 -07:00 |
|
lkchen
|
f4f4e7ef27
|
[V0 deprecation][P/D] Deprecate v0 KVConnectorBase code (1/2) (#21785)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-08-04 19:11:33 -07:00 |
|
Woosuk Kwon
|
7175817637
|
Revert "[Bugfix] V1 Fix the cursor leakage issue during request scheduling." (#22223)
|
2025-08-04 18:37:06 -07:00 |
|
PiteXChen
|
2dffac464c
|
[Bugfix] V1 Fix the cursor leakage issue during request scheduling. (#21173)
Signed-off-by: CLFutureX <775523362@qq.com>
|
2025-08-04 18:34:10 -07:00 |
|
David Ben-David
|
aefeea0fde
|
[V1] [P/D] Refactor KV Connector Path (#21980)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
|
2025-08-03 04:03:40 -07:00 |
|
Zebing Lin
|
e0f63e4a35
|
[Core] Avoid repeated len(block_token_ids) check in hash_request_tokens (#21781)
Signed-off-by: linzebing <linzebing1995@gmail.com>
|
2025-08-01 00:23:29 -07:00 |
|
Ruixiang Tan
|
8f4a1c9a04
|
[Misc] Improve code readability of KVCacheManager (#21673)
Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>
Signed-off-by: Ruixiang Tan <819464715@qq.com>
Signed-off-by: GitHub <noreply@github.com>
|
2025-07-30 07:20:43 -07:00 |
|
MingzhenHan
|
b7b23da4d2
|
[Bugfix] Fix comment typo of get_num_common_prefix_blocks() (#21827)
Signed-off-by: MingzhenHan <hanmingzhen2002@outlook.com>
|
2025-07-29 20:35:33 -07:00 |
|
Chen Zhang
|
755fa8b657
|
[KVCache] Make KVCacheSpec hashable (#21791)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-07-29 19:58:29 +08:00 |
|
Kuntai Du
|
b18b417fbf
|
Revert "[V1] Exception Handling when Loading KV Cache from Remote Store" (#21778)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
|
2025-07-28 20:15:18 +00:00 |
|
Adeline
|
15a72ac478
|
[V1] Exception Handling when Loading KV Cache from Remote Store (#21534)
Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
Co-authored-by: liuyumoye <adeline_ly2023@outlook.com>
|
2025-07-27 20:34:17 -07:00 |
|
Zhou Fang
|
fc5f756db4
|
[v1][Core] Clean up usages of SpecializedManager (#21407)
Signed-off-by: Zhou Fang <fang.github@gmail.com>
|
2025-07-24 00:40:11 -07:00 |
|
Raushan Turganbay
|
f38ee34a0a
|
[feat] Enable mm caching for transformers backend (#21358)
Signed-off-by: raushan <raushan@huggingface.co>
|
2025-07-22 08:18:46 -07:00 |
|
Jialin Ouyang
|
ed25054577
|
[Core] Introduce popleft_n and append_n in FreeKVCacheBlockQueue to further optimize block_pool (#21222)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-07-22 06:17:47 -07:00 |
|
Simon Mo
|
32142b3c62
|
[Bugfix] Fix eviction cached blocked logic (#21357)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-07-22 01:18:40 -07:00 |
|
Jialin Ouyang
|
af376ca19d
|
[Core] Minimize number of dict lookup in _maybe_evict_cached_block (#21281)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-07-21 22:37:34 -07:00 |
|
Lucia Fang
|
9a9fda1423
|
[Core] Support Local Chunked Attention for Hybrid KV Cache (#19351)
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lu Fang <fanglu@meta.com>
|
2025-07-18 20:48:38 -07:00 |
|
JialinOuyang-Meta
|
0f199f197b
|
[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (#21005)
Signed-off-by: Jialin Ouyang <jialino@meta.com>
|
2025-07-18 12:34:40 -07:00 |
|
Lucas Wilkinson
|
89cab4d01f
|
[Attention] Make local attention backend agnostic (#21093)
|
2025-07-18 00:10:42 -04:00 |
|
Christian Pinto
|
4ffd963fa0
|
[v1][core] Support for attention free models (#20811)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
|
2025-07-15 14:20:01 +00:00 |
|
Woosuk Kwon
|
d4d309409f
|
Implement Async Scheduling (#19970)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-14 23:01:46 -07:00 |
|
Maroon Ayoub
|
66f6fbd393
|
[Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) (#20511)
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
|
2025-07-14 02:45:31 +00:00 |
|
nopperl
|
4bbfc36b16
|
[V1] Hybrid allocator without prefix caching (#20661)
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
|
2025-07-13 16:55:14 +00:00 |
|
Woosuk Kwon
|
f45a332886
|
[Sched] Enhance the logic to remove stopped requests from queues (#20739)
|
2025-07-12 15:33:13 -07:00 |
|
Woosuk Kwon
|
7c12a765aa
|
[Misc] Simplify the prefix caching logic on draft tokens (#20701)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-09 14:48:35 -07:00 |
|
Woosuk Kwon
|
31c5d0a1b7
|
[Optimize] Don't send token ids when kv connector is not used (#20586)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-07 19:04:54 -07:00 |
|
Peter Pan
|
edd270bc78
|
[Bugfix] Prevent IndexError for cached requests when pipeline parallelism is disabled (#20486)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2025-07-07 09:41:15 -07:00 |
|
Thomas Parnell
|
2f35a022e6
|
Enable V1 for Hybrid SSM/Attention Models (#20016)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-07-04 17:46:53 +00:00 |
|
Jee Jee Li
|
1caca5a589
|
[Misc] Add SPDX-FileCopyrightText (#20428)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-04 07:40:42 +00:00 |
|
Woosuk Kwon
|
7f280d69c9
|
[Optimization] Cache sampled token ids in model runner (#20291)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-01 11:01:31 -07:00 |
|
Woosuk Kwon
|
0e96cc9b7e
|
[Misc] Minor refactoring for scheduler (#20299)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-01 07:55:32 -07:00 |
|
Woosuk Kwon
|
2863befce3
|
[Optimization] Use Shared CachedRequestData Instance Across All Requests (#20232)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-30 09:07:50 -07:00 |
|
amit
|
4a0f7888a3
|
[Core] feat: Implement Priority Scheduling in V1 Engine (#19057)
Signed-off-by: amit <amit.man@gmail.com>
Co-authored-by: Roger Wang <Rogerw0108@gmail.com>
|
2025-06-22 20:18:08 -07:00 |
|
Vlad Tiberiu Mihailescu
|
2e3e3c86dc
|
Export NaNs in logits to scheduler_stats if output is corrupted (#18777)
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
|
2025-06-20 22:47:16 +08:00 |
|
Maximilien de Bayser
|
799397ee4f
|
Support embedding models in V1 (#16188)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-18 21:36:33 -07:00 |
|
Chen Zhang
|
a89209b78d
|
[v1] Support mamba2 (#19327)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-18 20:34:15 +00:00 |
|
Russell Bryant
|
5f52a84685
|
[V1] Add API docs for EncoderCacheManager (#19294)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-18 13:37:01 +08:00 |
|
Saheli Bhattacharjee
|
d1e34cc9ac
|
[V1][Metrics] Deprecate metrics with gpu_ prefix for non GPU specific metrics. (#18354)
Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
|
2025-06-14 11:07:36 +08:00 |
|
Nick Hill
|
7e8d97dd3f
|
[BugFix] Honor enable_caching in connector-delayed kvcache load case (#19435)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-13 09:46:32 +00:00 |
|
jmswen
|
c9280e6346
|
[Bugfix] Respect num-gpu-blocks-override in v1 (#19503)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-12 11:00:23 +00:00 |
|
Nick Hill
|
646d62f636
|
[Core] Use tuple for kv cache group block ids (#19175)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-10 07:01:17 +02:00 |
|
Nicolò Lucchesi
|
b6a3a9f76d
|
[Core] Fix abrupt request abort (#18485)
Signed-off-by: nicklucche <nlucches@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-06 16:27:59 -07:00 |
|
Nick Hill
|
aad30bd306
|
[BugFix] Fix MultiConnector test after HMA changes (#19291)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-06 20:16:24 +00:00 |
|
Nick Hill
|
65c69444b1
|
[Docs] Improve V1 KVConnector interface documentation (#19172)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-06 16:22:45 +08:00 |
|
Jinghui Zhang
|
90b78ec5f9
|
[v1][P/D] Fix a edge case in kv cache schedule (#19182)
Co-authored-by: jinghui <jinghui@fb.com>
|
2025-06-05 23:32:55 -07:00 |
|
Chen Zhang
|
f8a1a2d108
|
[v1] Hybrid Memory Allocator (#17996)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-05 20:47:09 -07:00 |
|
Robert Shaw
|
c56ed8bb0e
|
[Bugfix][Nixl] Fix full prefix cache hit bug (#18632)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-05 02:07:32 +00:00 |
|
Yan Ru Pei
|
b712be98c7
|
feat: add data parallel rank to KVEventBatch (#18925)
|
2025-06-03 17:14:20 -07:00 |
|
Chen Zhang
|
a8da78eac9
|
[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-04 00:14:06 +00:00 |
|
Chen Zhang
|
b5fd9506c1
|
[Bugfix] get_num_blocks_to_allocate with null_block (#19031)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 15:30:55 -07:00 |
|