Users seem to get confused about vLLM's performance when it runs with very few CPU cores available. We were missing a clear overview of vLLM's process architecture, so I added one, along with some diagrams, to arch_overview.md, and included a section on CPU resource recommendations in optimization.md.

Signed-off-by: mgoin <mgoin64@gmail.com>