575 Commits

Author SHA1 Message Date
Woosuk Kwon
51d3cb951d Remove max_num_seqs in latency benchmark script (#1855) 2023-11-30 00:00:32 -08:00
Woosuk Kwon
e74b1736a1 Add profile option to latency benchmark script (#1839) 2023-11-29 23:42:52 -08:00
Yanming W
e0c6f556e8 [Build] Avoid building too many extensions (#1624) 2023-11-23 16:31:19 -08:00
Simon Mo
5ffc0d13a2 Migrate linter from pylint to ruff (#1665) 2023-11-20 11:58:01 -08:00
Zhuofan
dcc543a298 [Minor] Fix comment (#1704) 2023-11-17 09:42:49 -08:00
Woosuk Kwon
660a7fcfa4 Add DeepSpeed MII backend to benchmark script (#1649) 2023-11-14 12:35:30 -08:00
chooper1
1f24755bf8 Support SqueezeLLM (#1326) 2023-10-21 23:14:59 -07:00
Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Woosuk Kwon
928de46888 Implement PagedAttention V2 (#1348) 2023-10-16 00:59:57 -07:00
Antoni Baum
acbed3ef40 Use monotonic time where appropriate (#1249) 2023-10-02 19:22:05 -07:00
kg6-sleipnir
b5a10eb0ef Added dtype arg to benchmarks (#1228) 2023-09-30 21:04:03 -07:00
Woosuk Kwon
e3e79e9e8a Implement AWQ quantization support for LLaMA (#1032) 2023-09-16 00:03:37 -07:00
Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>
Ricardo Lu
8c4b2592fb fix: enable trust-remote-code in api server & benchmark. (#509) 2023-07-19 17:06:15 -07:00
WRH
cf21a9bd5c support trust_remote_code in benchmark (#518) 2023-07-19 17:02:40 -07:00
Woosuk Kwon
4338cc4750 [Tokenizer] Add an option to specify tokenizer (#284) 2023-06-28 09:46:58 -07:00
Zhuohan Li
43710e8d09 [Fix] Fix default port number in benchmark scripts (#265) 2023-06-26 13:15:35 -07:00
Zhuohan Li
0370afa2e5 Remove benchmark_async_llm_server.py (#155) 2023-06-19 11:12:37 +08:00
Woosuk Kwon
3f92038b99 Add comments on swap space (#154) 2023-06-18 11:39:35 -07:00
Woosuk Kwon
0b98ba15c7 Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
Zhuohan Li
e5464ee484 Rename servers to engines (#152) 2023-06-17 17:25:21 +08:00
Woosuk Kwon
bab8f3dd0d [Minor] Fix benchmark_throughput.py (#151) 2023-06-16 21:00:52 -07:00
Zhuohan Li
eedb46bf03 Rename servers and change port numbers to reduce confusion (#149) 2023-06-17 00:13:02 +08:00
Woosuk Kwon
311490a720 Add script for benchmarking serving throughput (#145) 2023-06-14 19:55:38 -07:00
Zhuohan Li
1a956e136b Fix various issues of async servers (#135) 2023-06-05 23:44:50 +08:00
Woosuk Kwon
8274ca23ac Add docstrings for LLM (#137) 2023-06-04 12:52:41 -07:00
Woosuk Kwon
211318d44a Add throughput benchmarking script (#133) 2023-05-28 03:20:05 -07:00