| Author | Commit | Message | Date |
|---|---|---|---|
| Chendi.Xue | 0a71900bc9 | Remove hard-dependencies of Speculative decode to CUDA workers (#10587); Signed-off-by: Chendi Xue <chendi.xue@intel.com> | 2024-11-26 17:57:11 -08:00 |
| Lily Liu | c6bd70d772 | [SpecDec][Misc] Cleanup, remove bonus token logic. (#8701) | 2024-09-22 12:34:14 -07:00 |
| Lily Liu | e6a26ed037 | [SpecDecode][Kernel] Flashinfer Rejection Sampling (#7244) | 2024-09-01 21:23:29 -07:00 |
| Lily Liu | 5c60c8c423 | [SpecDecode] [Minor] Fix spec decode sampler tests (#7183) | 2024-08-06 10:40:32 -07:00 |
| Nick Hill | 5cf9254a9c | [BugFix] Fix use of per-request seed with pipeline parallel (#6698) | 2024-07-30 10:40:08 -07:00 |
| Thomas Parnell | d4201e06d5 | [Bugfix] Make spec. decode respect per-request seed. (#6034); Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>; Co-authored-by: Nick Hill <nickhill@us.ibm.com> | 2024-07-18 19:22:08 -07:00 |
| sroy745 | 80ca1e6a3a | [Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker (#5348) | 2024-07-01 00:33:05 -07:00 |
| sroy745 | fa9e385229 | [Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier (#5131) | 2024-06-17 21:29:09 -05:00 |