Robert Shaw
|
6ace6fba2c
|
[V1] AsyncLLM Implementation (#9826)
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-11-11 23:05:38 +00:00 |
|
Joe Runde
|
d58268c56a
|
[V1] Make v1 more testable (#9888)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-11-06 11:57:35 -08:00 |
|
Gene Der Su
|
7a83b1aec0
|
[BugFix] Lazy import ray (#10021)
|
2024-11-05 10:04:10 +00:00 |
|
Robert Shaw
|
04cef2c6ab
|
[Bugfix] Fix MQLLMEngine hanging (#9973)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2024-11-04 16:01:43 -05:00 |
|
Gene Der Su
|
27cd36e6e2
|
[Bugfix] PicklingError on RayTaskError (#9934)
Signed-off-by: Gene Su <e870252314@gmail.com>
|
2024-11-01 22:08:23 +00:00 |
|
youkaichao
|
18bd7587b7
|
[1/N] pass the complete config from engine to executor (#9933)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-01 13:51:57 -07:00 |
|
Joe Runde
|
3b3f1e7436
|
[Bugfix][core] replace heartbeat with pid check (#9818)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-10-30 09:34:07 -07:00 |
|
Woosuk Kwon
|
6c5af09b39
|
[V1] Implement vLLM V1 [1/N] (#9289)
|
2024-10-22 01:24:07 -07:00 |
|
Russell Bryant
|
776dbd74f1
|
[CI/Build] mypy: Resolve some errors from checking vllm/engine (#9267)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-10-16 22:55:59 +00:00 |
|
Sebastian Schoennenbeck
|
df3dcdf49d
|
[Bugfix] Fix priority in multiprocessing engine (#9277)
|
2024-10-11 15:35:35 +00:00 |
|
Cyrus Leung
|
e808156f30
|
[Misc] Collect model support info in a single process per model (#9233)
|
2024-10-11 11:08:11 +00:00 |
|
Cyrus Leung
|
3b00b9c26c
|
[Core] rename PromptInputs and inputs (#8876)
|
2024-09-26 20:35:15 -07:00 |
|
Simon Mo
|
4f1ba0844b
|
Revert "rename PromptInputs and inputs with backward compatibility (#8760)" (#8810)
|
2024-09-25 10:36:26 -07:00 |
|
科英
|
64840dfae4
|
[Frontend] MQLLMEngine supports profiling. (#8761)
|
2024-09-25 09:37:41 -07:00 |
|
Cyrus Leung
|
28e1299e60
|
rename PromptInputs and inputs with backward compatibility (#8760)
|
2024-09-25 09:36:47 -07:00 |
|
Joe Runde
|
6e0c9d6bd0
|
[Bugfix] Use heartbeats instead of health checks (#8583)
|
2024-09-24 20:37:38 -07:00 |
|
Simon Mo
|
3185fb0cca
|
Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" (#8750)
|
2024-09-24 05:45:20 +00:00 |
|
Alexander Matveev
|
1a2aef3e59
|
Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (#8335)
|
2024-09-23 15:38:04 -07:00 |
|
Cyrus Leung
|
0057894ef7
|
[Core] Rename PromptInputs and inputs (#8673)
|
2024-09-20 19:00:54 -07:00 |
|
Nick Hill
|
76515f303b
|
[Frontend] Use MQLLMEngine for embeddings models too (#8584)
|
2024-09-19 12:51:06 -04:00 |
|
Alexander Matveev
|
7c7714d856
|
[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-09-18 13:56:58 +00:00 |
|