vllm/csrc/torch_bindings.cpp at 4f4d427ac2cee0f8ff7f79103001f6617fa8989c

Tyler Michael Smith eb5741ad42 [Kernel][Quantization] Integrate block-quantized CUTLASS kernels for DeepSeekV3 (#12587)

Integrates the block-quantized kernels introduced in
https://github.com/vllm-project/vllm/pull/11868 for use in linear
layers.

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

2025-01-31 15:29:11 -08:00

21 KiB
