[BugFix] skip language model in Encoder (#30242)

Signed-off-by: dengyunyang <584797741@qq.com>
2025-12-22 21:25:59 +08:00
parent 2cf91c2ea4
commit 8f8f469b1b
8 changed files with 116 additions and 3 deletions
--- a/examples/online_serving/disaggregated_encoder/README.md
+++ b/examples/online_serving/disaggregated_encoder/README.md
@@ -38,6 +38,8 @@ Encoder engines should be launched with the following flags:

 - `--max-num-batched-tokens=<large value>` **(default: 2048)** – This flag controls the token scheduling budget per decoding step and is irrelevant to encoder-only instances. **Set it to a very high value (effectively unlimited) to bypass scheduler limitations.** The actual token budget is managed by the encoder cache manager.

+- `--convert "mm_encoder_only"` **(Optional)** – Skips the language model during initialization to reduce device memory usage. **Models using this option must implement the `get_language_model_spec` interface.**
+
 ## Local media inputs

 To support local image inputs (from your ```MEDIA_PATH``` directory), add the following flag to the encoder instance: