| Michael Goin | af59df0a10 | Remove faulty Meta-Llama-3-8B-Instruct-FP8.yaml lm-eval test (#7961) | 2024-08-28 19:19:17 -04:00 |
| Michael Goin | 44f26a9466 | [Model] Align nemotron config with final HF state and fix lm-eval-small (#7611) | 2024-08-16 15:56:34 -07:00 |
| HandH1998 | 6512937de1 | Support W4A8 quantization for vllm (#5218) | 2024-07-31 07:55:21 -06:00 |
| Michael Goin | 07278c37dd | [Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611) | 2024-07-26 14:33:42 -04:00 |
| Robert Shaw | 889da130e7 | [ Misc ] fp8-marlin channelwise via compressed-tensors (#6524) Co-authored-by: mgoin <michael@neuralmagic.com> | 2024-07-25 09:46:04 -07:00 |
| Robert Shaw | 4cc24f01b1 | [ Kernel ] Enable Dynamic Per Token fp8 (#6547) | 2024-07-19 23:08:15 +00:00 |
| Robert Shaw | dbe5588554 | [ Misc ] non-uniform quantization via compressed-tensors for Llama (#6515) | 2024-07-18 22:39:18 -04:00 |
| Robert Shaw | aea19f0989 | [ Misc ] Support Models With Bias in compressed-tensors integration (#6356) | 2024-07-12 11:11:29 -04:00 |
| Robert Shaw | abfe705a02 | [ Misc ] Support Fp8 via llm-compressor (#6110) Co-authored-by: Robert Shaw <rshaw@neuralmagic> | 2024-07-07 20:42:11 +00:00 |
| Robert Shaw | 75aa1442db | [ CI/Build ] LM Eval Harness Based CI Testing (#5838) Co-authored-by: Robert Shaw <rshaw@neuralmagic> | 2024-06-29 13:04:30 -04:00 |