Thomas Parnell | c38eba3046 | [Bugfix] MLPSpeculator: Use ParallelLMHead in tie_weights=False case. (#6303); Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> | 2024-07-10 09:04:07 -04:00
Qubitium-ModelCloud | ee93f4f92a | [CORE] Quantized lm-head Framework (#4442); Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>; Co-authored-by: ZX <zx@lbx.dev> | 2024-07-02 22:25:17 +00:00
Thomas Parnell | 54600709b6 | [Model] Changes to MLPSpeculator to support tie_weights and input_scale (#5965); Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>; Co-authored-by: Joshua Rosenkranz <jmrosenk@us.ibm.com> | 2024-07-01 16:40:02 -07:00
Nick Hill | 691e29ecf3 | [BugFix] Fix MLPSpeculator handling of num_speculative_tokens (#5876) | 2024-06-27 10:59:33 -07:00
Joshua Rosenkranz | b12518d3cf | [Model] MLPSpeculator speculative decoding support (#4947); Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>; Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>; Co-authored-by: Nick Hill <nickhill@us.ibm.com>; Co-authored-by: Davis Wertheimer <Davis.Wertheimer@ibm.com> | 2024-06-20 20:23:12 -04:00