Commit Graph

9 Commits

Author SHA1 Message Date
Lucas Wilkinson
e312e52b44 [Kernel] Add Exllama as a backend for compressed-tensors (#9395) 2024-10-17 09:48:26 -04:00
Lucas Wilkinson
86e9c8df29 [Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-09-23 13:46:26 -04:00
Dipika Sikka
6cd5e5b07e [Misc] Fused MoE Marlin support for GPTQ (#8217) 2024-09-09 23:02:52 -04:00
Lucas Wilkinson
5288c06aa0 [Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174) 2024-08-20 07:09:33 -06:00
Lucas Wilkinson
a8d604ca2a [Misc] Disambiguate quantized types via a new ScalarType (#6396) 2024-08-02 13:51:58 -07:00
HandH1998
6512937de1 Support W4A8 quantization for vllm (#5218) 2024-07-31 07:55:21 -06:00
Michael Goin
0eb0757bef [Misc] Add ignored layers for fp8 quantization (#6657) 2024-07-23 14:04:04 -04:00
Alexander Matveev
396d92d5e0 [Kernel][Core] Add AWQ support to the Marlin kernel (#6612) 2024-07-21 19:41:42 -04:00
alexm-nm
5c342570d7 Add marlin unit tests and marlin benchmark script (#4815) 2024-05-16 09:36:49 -04:00