[Spec Decode] Integrate Suffix Decoding from Arctic Inference (#25784)

Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
Aurick Qiao
2025-11-03 09:23:31 -08:00
committed by GitHub
parent 4bc400f47e
commit 2c19d96777
8 changed files with 304 additions and 11 deletions

@@ -48,6 +48,7 @@ buildkite-test-collector==0.1.9
 genai_perf==0.0.8
 tritonclient==2.51.0
+arctic-inference == 0.1.0 # Required for suffix decoding test
 numba == 0.61.2 # Required for N-gram speculative decoding
 numpy
 runai-model-streamer[s3,gcs]==0.15.0