[Kernel] Raise verbose error and consolidate num_heads/num_kv_heads divisibility check (#19339)

Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
22quinn
2025-06-14 22:43:48 -07:00
committed by GitHub
parent ee1531bc38
commit 0b73736a0d
17 changed files with 24 additions and 19 deletions


@@ -114,7 +114,6 @@ class TritonAttentionImpl(AttentionImpl):
         self.use_irope = use_irope
-        assert self.num_heads % self.num_kv_heads == 0
         self.num_queries_per_kv = self.num_heads // self.num_kv_heads
         support_head_sizes = TritonAttentionBackend.get_supported_head_sizes()
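
For context, the following is a minimal sketch of what a consolidated, verbose divisibility check could look like. The helper name `check_head_divisibility` and the error wording are assumptions made for illustration; they are not taken from this commit, which may place and phrase the check differently.

```python
def check_head_divisibility(num_heads: int, num_kv_heads: int) -> int:
    """Return the number of query heads per KV head.

    Hypothetical helper (not part of the commit): it replaces a bare
    `assert num_heads % num_kv_heads == 0` with an error that reports
    both offending values.
    """
    if num_kv_heads <= 0:
        raise ValueError(
            f"num_kv_heads must be positive, got {num_kv_heads}")
    if num_heads % num_kv_heads != 0:
        raise ValueError(
            f"num_heads ({num_heads}) is not divisible by "
            f"num_kv_heads ({num_kv_heads})")
    return num_heads // num_kv_heads


# Example: 32 query heads sharing 8 KV heads gives 4 queries per KV head.
num_queries_per_kv = check_head_divisibility(32, 8)
```

Unlike a bare `assert`, an explicit `raise` still runs when Python is started with `-O`, and the message names the values that failed, which is likely the kind of verbosity the commit title refers to.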