biondizzle/vllm: vllm/model_executor/layers/quantization/utils
At commit 6512937de1d7b4738938e0bb3004be86b6883729
Latest commit: "Support W4A8 quantization for vllm" (#5218) by HandH1998, 2024-07-31 07:55:21 -06:00
File                       Last commit                                                        Date
__init__.py                Add marlin unit tests and marlin benchmark script (#4815)         2024-05-16 09:36:49 -04:00
marlin_utils_fp8.py        [ Misc ] fp8-marlin channelwise via compressed-tensors (#6524)    2024-07-25 09:46:04 -07:00
marlin_utils_test_24.py    [ Misc ] Refactor Marlin Python Utilities (#6082)                 2024-07-11 15:40:11 +00:00
marlin_utils_test_qqq.py   Support W4A8 quantization for vllm (#5218)                        2024-07-31 07:55:21 -06:00
marlin_utils_test.py       [Kernel][Core] Add AWQ support to the Marlin kernel (#6612)       2024-07-21 19:41:42 -04:00
marlin_utils.py            [Kernel] Increase precision of GPTQ/AWQ Marlin kernel (#6795)     2024-07-27 17:52:33 -04:00
quant_utils.py             Support W4A8 quantization for vllm (#5218)                        2024-07-31 07:55:21 -06:00
w8a8_utils.py              [Kernel] Remove scaled_fp8_quant kernel padding footgun (#6842)   2024-07-30 16:37:01 -04:00
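
Three of the commits above come from the W4A8 change (#5218), which also touches marlin_utils_test_qqq.py, suggesting the QQQ scheme: 4-bit weights with 8-bit activations. As a rough orientation to what these utilities deal in, here is a minimal NumPy sketch of the two quantization steps the name W4A8 implies, symmetric per-group int4 weight quantization and symmetric per-tensor int8 activation quantization. This is an illustrative sketch only, not the code in quant_utils.py or the Marlin kernels.

```python
import numpy as np

def quantize_weights_int4(w: np.ndarray, group_size: int = 128):
    """Symmetric per-group int4 weight quantization (illustrative sketch).

    w: (in_features, out_features) fp32 weights.
    Returns integer codes in [-8, 7] and per-(group, column) fp32 scales.
    """
    in_features, out_features = w.shape
    assert in_features % group_size == 0, "in_features must divide into groups"
    # Each group of `group_size` input rows shares one scale per output column.
    grouped = w.reshape(-1, group_size, out_features)
    scales = np.abs(grouped).max(axis=1, keepdims=True) / 7.0  # int4 max is 7
    scales = np.maximum(scales, 1e-8)  # avoid divide-by-zero on all-zero groups
    q = np.clip(np.round(grouped / scales), -8, 7).astype(np.int8)
    return q.reshape(in_features, out_features), scales.squeeze(axis=1)

def quantize_activations_int8(x: np.ndarray):
    """Symmetric per-tensor int8 activation quantization (illustrative sketch)."""
    scale = max(np.abs(x).max() / 127.0, 1e-8)
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

# Quick round-trip check: dequantized weights should approximate the originals.
w = np.random.randn(256, 64).astype(np.float32)
q, s = quantize_weights_int4(w, group_size=128)
w_hat = (q.reshape(-1, 128, 64).astype(np.float32) * s[:, None, :]).reshape(w.shape)
print("max abs error:", np.abs(w - w_hat).max())
```

In the real kernels the int4 codes are packed several values per 32-bit word and the matmul runs on the packed data; the sketch keeps everything unpacked for clarity.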