[Feature] [Spec decode]: Enable MLPSpeculator/Medusa and prompt_logprobs with ChunkedPrefill (#10132)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: wallashss <wallashss@ibm.com>
Co-authored-by: wallashss <wallashss@ibm.com>
@@ -60,6 +60,7 @@ def test_scorer(model_name: str, batch_size: int, max_propose_len: int,
    num_gpu_blocks = 2048 // block_size
    scorer_worker = create_worker(Worker, model_name, block_size,
                                  num_gpu_blocks, seed)
    scorer_worker.model_runner.disable_logprobs = True  # accessed by mqa_scorer
    scorer_worker.model_runner.model.sampler.include_gpu_probs_tensor = True
    scorer_worker.model_runner.model.sampler.\
        should_modify_greedy_probs_inplace = True