Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635)

Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>
Author: Lumosis
Date: 2026-01-08 01:00:07 -08:00
Committed by: GitHub
Parent: eac3b96ec0
Commit: b634e619bb
6 changed files with 75 additions and 20 deletions


@@ -1573,7 +1573,13 @@ def create_scheduler_with_priority(
         kv_cache_tensors=[],
         kv_cache_groups=[
             KVCacheGroupSpec(
-                ["layer"], FullAttentionSpec(block_size, 1, 1, torch.float32, False)
+                ["layer"],
+                FullAttentionSpec(
+                    block_size=block_size,
+                    num_kv_heads=1,
+                    head_size=1,
+                    dtype=torch.float32,
+                ),
             )
         ],
     )
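
For context on the API this diff exercises, below is a minimal sketch of what a spec with a decoupled page_size_bytes could look like. The field names (block_size, num_kv_heads, head_size, dtype) are taken from the keyword arguments in the diff; the get_dtype_size helper, the property-based design, and the assumption that the dropped fifth positional argument (False) now simply defaults away are all illustrative guesses, not vLLM's actual implementation.

from dataclasses import dataclass

import torch


def get_dtype_size(dtype: torch.dtype) -> int:
    # Hypothetical helper: size in bytes of one element of `dtype`.
    return torch.tensor([], dtype=dtype).element_size()


@dataclass
class FullAttentionSpec:
    # Field names mirror the keyword arguments used in the diff above.
    block_size: int
    num_kv_heads: int
    head_size: int
    dtype: torch.dtype

    @property
    def page_size_bytes(self) -> int:
        # K and V planes: 2 * block_size * num_kv_heads * head_size elements.
        # Keeping the formula behind an overridable property is one way to
        # "decouple" it: a TPU/RPA-specific subclass could substitute a
        # padded layout without touching any caller.
        return (
            2
            * self.block_size
            * self.num_kv_heads
            * self.head_size
            * get_dtype_size(self.dtype)
        )


spec = FullAttentionSpec(
    block_size=16, num_kv_heads=1, head_size=1, dtype=torch.float32
)
assert spec.page_size_bytes == 2 * 16 * 1 * 1 * 4  # 128 bytes

Under this reading, the TPU/RPA compatibility in the commit title presumably comes from letting a backend-specific spec override page_size_bytes rather than having callers hard-code the formula.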