[BugFix] skip language model in Encoder (#30242)

Signed-off-by: dengyunyang <584797741@qq.com>
dengyunyang
2025-12-22 21:25:59 +08:00
committed by GitHub
parent 2cf91c2ea4
commit 8f8f469b1b
8 changed files with 116 additions and 3 deletions

@@ -38,6 +38,8 @@ Encoder engines should be launched with the following flags:
- `--max-num-batched-tokens=<large value>` **(default: 2048)** This flag controls the token scheduling budget per decoding step and is irrelevant to encoder-only instances. **Set it to a very high value (effectively unlimited) to bypass scheduler limitations.** The actual token budget is managed by the encoder cache manager.
- `--convert "mm_encoder_only"` **(Optional)** - The language model is skipped during initialization to reduce device memory usage. **Models using this option must implement the `get_language_model_spec` interface.**
## Local media inputs
To support local image inputs (from your `MEDIA_PATH` directory), add the following flag to the encoder instance: