Cody Yu | 973617ae02 | [Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840) | 2024-05-16 00:53:51 -07:00
    Co-authored-by: Cade Daniel <edacih@gmail.com>
    Co-authored-by: Cade Daniel <cade@anyscale.com>
Cody Yu | bc8ad68455 | [Misc][Refactor] Introduce ExecuteModelData (#4540) | 2024-05-03 17:47:07 -07:00
Nick Hill | efffb63f58 | [Core] Move function tracing setup to util function (#4352) | 2024-04-25 16:45:12 -07:00
DefTruth | d87f39e9a9 | [Bugfix] Add init_cached_hf_modules to RayWorkerWrapper (#4286) | 2024-04-23 09:28:35 -07:00
Nick Hill | 8f2ea22bde | [Core] Some simplification of WorkerWrapper changes (#4183) | 2024-04-23 07:49:08 +00:00
youkaichao | 8a7a3e4436 | [Core] add an option to log every function call to for debugging hang/crash in distributed inference (#4079) | 2024-04-18 16:15:12 -07:00
    Co-authored-by: Simon Mo <simon.mo@hey.com>
SangBin Cho | 533d2a1f39 | [Typing] Mypy typing part 2 (#4043) | 2024-04-17 17:28:43 -07:00
    Co-authored-by: SangBin Cho <sangcho@sangcho-LT93GQWG9C.local>
youkaichao | 8438e0569e | [Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024) | 2024-04-17 08:34:33 +00:00
    [Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication (#4024)
Cade Daniel | e95cd87959 | [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894) | 2024-04-16 13:09:21 -07:00
Dylan Hawk | 5c2e66e487 | [Bugfix] More type hint fixes for py 3.8 (#4039) | 2024-04-12 21:07:04 -07:00
Cade Daniel | e7c7067b45 | [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837) | 2024-04-09 11:44:15 -07:00