From 38caf7fa1a2f57a4a791ffa4ba0f9f55d409e0d0 Mon Sep 17 00:00:00 2001
From: Finbarr Timbers
Date: Mon, 1 Dec 2025 12:15:19 -0700
Subject: [PATCH] Update FAQ on interleaving sliding windows support (#29796)

Signed-off-by: Finbarr Timbers
---
 docs/contributing/model/basic.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/docs/contributing/model/basic.md b/docs/contributing/model/basic.md
index a68d1f016..d37501b86 100644
--- a/docs/contributing/model/basic.md
+++ b/docs/contributing/model/basic.md
@@ -113,8 +113,6 @@ See [this page](registration.md) for instructions on how to register your new mo
 
 ### How to support models with interleaving sliding windows?
 
-For models with interleaving sliding windows (e.g. `google/gemma-2-2b-it` and `mistralai/Ministral-8B-Instruct-2410`), the scheduler will treat the model as a full-attention model, i.e., kv-cache of all tokens will not be dropped. This is to make sure prefix caching works with these models. Sliding window only appears as a parameter to the attention kernel computation.
-
 To support a model with interleaving sliding windows, we need to take care of the following details:
 
 - Make sure the model's `config.json` contains `layer_types`.