Ricky Xu
|
132a132100
|
[v1][stats][1/n] Add RequestStatsUpdate and RequestStats types (#10907)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2025-01-21 11:51:13 -08:00 |
|
Chen Zhang
|
69d765f5a5
|
[V1] Move more control of kv cache initialization from model_executor to EngineCore (#11960)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-01-17 07:39:35 +00:00 |
|
Chen Zhang
|
994fc655b7
|
[V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager (#12003)
|
2025-01-15 07:55:30 +00:00 |
|
Robert Shaw
|
619ae268c3
|
[V1] [2/n] Logging and Metrics - OutputProcessor Abstraction (#11973)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2025-01-13 04:54:10 +00:00 |
|
Robert Shaw
|
9597a095f2
|
[V1][Core][1/n] Logging and Metrics (#11962)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2025-01-12 21:02:02 +00:00 |
|
Chen Zhang
|
cf5f000d21
|
[torch.compile] Hide KV cache behind torch.compile boundary (#11677)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-01-10 13:14:42 +08:00 |
|
Roger Wang
|
91b361ae89
|
[V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (#11685)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-06 19:58:16 +00:00 |
|
Rui Qiao
|
022c5c6944
|
[V1] Refactor get_executor_cls (#11754)
|
2025-01-06 07:59:16 +00:00 |
|
Jee Jee Li
|
47831430cc
|
[Bugfix][V1] Fix test_kv_cache_utils.py (#11738)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-01-04 16:07:59 +00:00 |
|
xcnick
|
d91457d529
|
[V1] Add kv cache utils tests. (#11513)
Signed-off-by: xcnick <xcnick0412@gmail.com>
|
2025-01-04 14:49:46 +08:00 |
|
Robert Shaw
|
80c751e7f6
|
[V1] Simplify Shutdown (#11659)
|
2025-01-03 17:25:38 +00:00 |
|
Woosuk Kwon
|
73001445fb
|
[V1] Implement Cascade Attention (#11635)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-01-01 21:56:46 +09:00 |
|
Chen Zhang
|
8c3230d8c1
|
[V1] Simpify vision block hash for prefix caching by removing offset from hash (#11646)
|
2024-12-31 08:56:01 +00:00 |
|
sakunkun
|
2c5718809b
|
[Bugfix] Move the _touch(computed_blocks) call in the allocate_slots method to after the check for allocating new blocks. (#11565)
|
2024-12-31 06:29:04 +00:00 |
|
Robert Shaw
|
4fb8e329fd
|
[V1] [5/N] API Server: unify Detokenizer and EngineCore input (#11545)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2024-12-28 20:51:57 +00:00 |
|
Robert Shaw
|
df04dffade
|
[V1] [4/N] API Server: ZMQ/MP Utilities (#11541)
|
2024-12-28 01:45:08 +00:00 |
|
Woosuk Kwon
|
371d04d39b
|
[V1] Use FlashInfer Sampling Kernel for Top-P & Top-K Sampling (#11394)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-27 09:32:38 +09:00 |
|
sroy745
|
dcb1a944d4
|
[V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler (#10681)
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-26 19:02:58 +09:00 |
|
Cody Yu
|
bf8717ebae
|
[V1] Prefix caching for vision language models (#11187)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-12-17 16:37:59 -08:00 |
|
Cody Yu
|
78ed8f57d8
|
[Misc][V1] Fix type in v1 prefix caching (#11151)
|
2024-12-13 00:57:40 +00:00 |
|
Alexander Matveev
|
4e11683368
|
[V1] VLM preprocessor hashing (#11020)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-12 00:55:30 +00:00 |
|
Alexander Matveev
|
3bc94cab69
|
[V1] VLM - Run the mm_mapper preprocessor in the frontend process (#10640)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-03 10:33:10 +00:00 |
|
Woosuk Kwon
|
a79b122400
|
[V1] Do not allocate beyond the max_model_len (#10730)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-28 00:13:15 -08:00 |
|
Ricky Xu
|
d9b4b3f069
|
[Bug][CLI] Allow users to disable prefix caching explicitly (#10724)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2024-11-27 23:59:28 -08:00 |
|
Ricky Xu
|
519e8e4182
|
[v1] EngineArgs for better config handling for v1 (#10382)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2024-11-25 21:09:43 -08:00 |
|
youkaichao
|
25d806e953
|
[misc] add torch.compile compatibility check (#10618)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-24 23:40:08 -08:00 |
|
Ricky Xu
|
97814fbf0f
|
[v1] Refactor KVCacheManager for more hash input than token ids (#10507)
Signed-off-by: rickyx <rickyx@anyscale.com>
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-11-22 23:27:25 +00:00 |
|
Cyrus Leung
|
0b8bb86bf1
|
[1/N] Initial prototype for multi-modal processor (#10044)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-13 12:39:03 +00:00 |
|
Woosuk Kwon
|
112fa0bbe5
|
[V1] Fix CI tests on V1 engine (#10272)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-12 16:17:20 -08:00 |
|
Robert Shaw
|
6ace6fba2c
|
[V1] AsyncLLM Implementation (#9826)
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-11-11 23:05:38 +00:00 |
|
Cody Yu
|
201fc07730
|
[V1] Prefix caching (take 2) (#9972)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-11-07 17:34:44 -08:00 |
|