vllm/benchmarks/benchmark_latency.py at dd793d1de59b5efad25f4794b68cb935824c7a11

biondizzle/vllm

Woo-Yeon Lee 2ce5d6688b [Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414)

2024-06-25 09:56:06 +00:00

12 KiB
