[UX] Add --moe-backend arg for explicit kernel selection (#33807)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
@@ -8,5 +8,4 @@ server_args: >-
   --tensor-parallel-size 2
   --enable-expert-parallel
   --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":1}'
-env:
-  VLLM_USE_FLASHINFER_MOE_FP4: "1"
+  --moe-backend=flashinfer_trtllm
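The hunk above moves kernel selection out of the environment and into `server_args`. As a minimal sketch of what that means for tooling that reads this config, the snippet below tokenizes the post-change `server_args` string and pulls out the `--moe-backend` value. The helper name `extract_moe_backend` is hypothetical and is not part of the project; only the flag names and values are taken from the diff.

```python
import argparse
import shlex
from typing import Optional

# server_args as they read after this change (values copied from the diff).
SERVER_ARGS = """
--tensor-parallel-size 2
--enable-expert-parallel
--speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":1}'
--moe-backend=flashinfer_trtllm
"""


def extract_moe_backend(server_args: str) -> Optional[str]:
    """Hypothetical helper: return the --moe-backend value, or None if absent."""
    # Only --moe-backend is declared; every other flag is left in the
    # "unknown" list by parse_known_args and ignored here.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--moe-backend", default=None)
    known, _unknown = parser.parse_known_args(shlex.split(server_args))
    return known.moe_backend


if __name__ == "__main__":
    print(extract_moe_backend(SERVER_ARGS))
```

Because the backend is now an ordinary command-line argument, it can be read from the same string as the other server options, with no separate `env:` block to inspect.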