[Doc] Update more docs with respect to V1 (#29188)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
# Reproducibility
vLLM does not guarantee the reproducibility of the results by default, for the sake of performance. To achieve
reproducible results:

- In offline mode, you can either set `VLLM_ENABLE_V1_MULTIPROCESSING=0`, which makes scheduling deterministic
  (see the sketch after this list), or enable [batch invariance](../features/batch_invariance.md) to make the
  outputs insensitive to scheduling.
- In online mode, you can only enable [batch invariance](../features/batch_invariance.md).
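
A minimal sketch of the first option, assuming a single-node offline run (the model name below is just a placeholder):

```python
import os

# Disable engine multiprocessing so that scheduling is deterministic.
# This must be set before the LLM engine is constructed.
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(temperature=0.8, top_p=0.95, seed=42)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```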
Example: [examples/offline_inference/reproducibility.py](../../examples/offline_inference/reproducibility.py)

!!! warning

    Applying the above settings [changes the random state in user code](#locality-of-random-state).

!!! note

    Even with the above settings, vLLM only provides reproducibility
    when it runs on the same hardware and the same vLLM version.

## Setting the global seed

The `seed` parameter in vLLM is used to control the random states for various random number generators.

If a specific seed value is provided, the random states for `random`, `np.random`, and `torch.manual_seed` will be set accordingly.

However, in some cases, setting the seed will also [change the random state in user code](#locality-of-random-state).
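
For example, the seed can be set when constructing the engine (a sketch; the model name is a placeholder):

```python
from vllm import LLM, SamplingParams

# An explicit seed sets the random states for `random`, `np.random`,
# and `torch.manual_seed` in the workers.
llm = LLM(model="facebook/opt-125m", seed=42)  # placeholder model

# Individual requests can also be seeded.
params = SamplingParams(temperature=1.0, seed=42)
print(llm.generate(["Hello, my name is"], params)[0].outputs[0].text)
```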
### Default Behavior

In V1, the `seed` parameter defaults to `0`, which sets the random state for each worker, so the results will remain consistent across vLLM runs even if `temperature > 0`.
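
Because the default seed is fixed, the sampling random state is the same on every run. If you instead want runs to differ, one option is to override the default with a varying seed (a sketch; using the current time is just one possible choice):

```python
import time

from vllm import LLM

# Override the default seed (0) with a value that changes per run,
# e.g. the current time, so that samples differ across runs.
llm = LLM(model="facebook/opt-125m", seed=int(time.time()))  # placeholder model
```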

!!! note

    It is impossible to un-specify a seed for V1 because different workers need to sample the same outputs
    for workflows such as speculative decoding.

    For more information, see: <https://github.com/vllm-project/vllm/pull/17929>

### Locality of random state

The random state in user code (i.e. the code that constructs the [LLM][vllm.LLM] class) is updated by vLLM
only if the workers are run in the same process as user code, i.e. when `VLLM_ENABLE_V1_MULTIPROCESSING=0`.

By default, `VLLM_ENABLE_V1_MULTIPROCESSING=1`, so you can use vLLM without having to worry about
accidentally making subsequent operations that rely on random state deterministic.
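
A sketch of the effect when multiprocessing is disabled (the model name is a placeholder):

```python
import os
import random

# Run the workers in the same process as this script.
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"

from vllm import LLM

random.seed(123)
llm = LLM(model="facebook/opt-125m")  # placeholder model

# The global random state was overwritten during engine construction,
# so this is NOT the first number of the seed-123 stream.
print(random.random())

# If your code depends on its own random stream, re-seed it afterwards.
random.seed(123)
```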