[torch.compile] Document the workaround to standalone_compile failing (#33571)
Signed-off-by: Richard Zou <zou3519@gmail.com>
This commit is contained in:
@@ -282,6 +282,15 @@ If vLLM's compile cache is wrong, this usually means that a factor is missing.
Please see [this example](https://github.com/vllm-project/vllm/blob/18b39828d90413d05d770dfd2e2f48304f4ca0eb/vllm/config/model.py#L310)
of how vLLM computes part of the cache key.
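
As a minimal sketch of the idea (not vLLM's actual implementation; the function name and factor values here are hypothetical), a cache key can be built by hashing together every "factor" that can affect compilation output, so that a missing factor lets two different configurations collide on the same cache entry:

```python
import hashlib

def compute_cache_key(factors: list[str]) -> str:
    # Hash every factor (config values, library versions, etc.) into one key.
    # If a relevant factor is omitted from this list, two configurations that
    # should compile differently will map to the same cache entry.
    hasher = hashlib.sha256()
    for factor in sorted(factors):
        hasher.update(factor.encode())
    return hasher.hexdigest()
```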
vLLM's compilation cache requires that the code being compiled be serializable.
If it is not, the cache will error out on save. Usually the fix is one of the following:
- rewrite the non-serializable pieces (this may be hard, because it is currently
  difficult to tell what is serializable and what isn't)
- file a bug report
- ignore the error by setting `VLLM_DISABLE_COMPILE_CACHE=1` (note that this will
make warm server starts a lot slower).
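
For the last option, one way to apply the environment variable from Python is to set it before vLLM is imported, so the flag is seen at startup (a minimal sketch; how and where you launch the server is up to you):

```python
import os

# Workaround from above: skip the compile cache entirely rather than erroring
# on save. Note that warm server starts will be noticeably slower.
os.environ["VLLM_DISABLE_COMPILE_CACHE"] = "1"
```

Setting the variable in the shell that launches the server works just as well.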
## Debugging CUDAGraphs
CUDAGraphs is a feature that allows one to: