[Model] Interface to enable batch-level DP support (#23733)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Cyrus Leung
2025-08-27 21:41:22 +08:00
committed by GitHub
parent 16dc4052b0
commit fe8d7b6f03
8 changed files with 38 additions and 4 deletions

@@ -168,8 +168,11 @@ llm = LLM(
 Batch-level DP is not to be confused with API request-level DP
 (which is instead controlled by `data_parallel_size`).
-The availability of batch-level DP is based on model implementation.
-Currently, the following models support `mm_encoder_tp_mode="data"`:
+Batch-level DP needs to be implemented on a per-model basis,
+and enabled by setting `supports_encoder_tp_data = True` in the model class.
+Regardless, you need to set `mm_encoder_tp_mode="data"` in engine arguments to use this feature.
+Known supported models:
 - Llama4 (<gh-pr:18368>)
 - MiniCPM-V-4 (<gh-pr:23327>)
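
The opt-in described in this hunk has two parts: the model class declares `supports_encoder_tp_data = True`, and the user requests `mm_encoder_tp_mode="data"`. The sketch below illustrates how those two conditions might combine. It is not vLLM's code: the class names and `resolve_encoder_tp_mode` are hypothetical, and both the `"weights"` mode name and the fallback to it for unsupported models are assumptions not stated in this diff.

```python
class DummyVisionModel:
    """Hypothetical model class that opts in to batch-level DP."""

    # Flag name taken from the docs change above.
    supports_encoder_tp_data = True


class DummyLegacyModel:
    """Hypothetical model class that does not declare the flag."""


def resolve_encoder_tp_mode(model_cls: type, requested: str) -> str:
    """Return the mode that would take effect for a model class.

    Assumption: a model without the flag falls back to "weights"
    instead of raising an error.
    """
    supported = getattr(model_cls, "supports_encoder_tp_data", False)
    if requested == "data" and not supported:
        return "weights"
    return requested


if __name__ == "__main__":
    print(resolve_encoder_tp_mode(DummyVisionModel, "data"))
    print(resolve_encoder_tp_mode(DummyLegacyModel, "data"))
```

The flag on the model class alone does not switch the feature on: per the text above, `mm_encoder_tp_mode="data"` still has to be passed in the engine arguments.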