[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538)

Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: ProExpertProg <luka.govedic@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-03-06 23:55:06 +00:00
parent c188749bcd
commit ce8546a12b
5 changed files with 430 additions and 41 deletions
--- a/docs/configuration/optimization.md
+++ b/docs/configuration/optimization.md
@@ -5,6 +5,17 @@ This guide covers optimization strategies and performance tuning for vLLM V1.
 !!! tip
    Running out of memory? Consult [this guide](./conserving_memory.md) on how to conserve memory.

+## Optimization Levels
+
+vLLM provides four optimization levels (`-O0`, `-O1`, `-O2`, `-O3`) that let users trade startup time for runtime performance:
+
+- `-O0`: No optimizations. Fastest startup time, but lowest performance.
+- `-O1`: Fast optimization. Simple compilation, fast fusions, and PIECEWISE cudagraphs.
+- `-O2`: Default optimization. Additional compilation ranges, additional fusions, and FULL_AND_PIECEWISE cudagraphs.
+- `-O3`: Aggressive optimization. Currently equivalent to `-O2`, but may add time-consuming or experimental optimizations in the future.
+
+For more information, see the [optimization level documentation](../design/optimization_levels.md).
+
 ## Preemption

 Due to the autoregressive nature of transformer architecture, there are times when KV cache space is insufficient to handle all batched requests.