deepseek-v4-quant

biondizzle/deepseek-v4-quant

Commit Graph

Author: biondizzle
SHA1: b5d14aa8b8
Date: 2026-05-07 15:45:46 +00:00
Message:

Add proper FP8→BF16 dequantization script

Unlike the naive upcast, this properly dequantizes FP8 block-wise weights:
bf16 = fp8_weight * scale_expanded (128x128 blocks).

Also removes the now-unnecessary scale tensors and updates config.
FP8Linear.forward() sees element_size() > 1 and falls back to F.linear().
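
The block-wise rule in the commit message (bf16 = fp8_weight * scale_expanded) can be illustrated with a minimal sketch. This is not the script from the commit. It assumes one scale per tile, a scale grid that rounds up at ragged edges, and it uses NumPy float32 arrays as stand-ins because NumPy has no FP8 or BF16 dtype, so the final cast to BF16 is left out. The name dequantize_blockwise and its parameters are hypothetical.

```python
import numpy as np

BLOCK = 128  # block edge length, per the commit message (128x128 blocks)


def dequantize_blockwise(weight_q, scale, block=BLOCK):
    """Multiply each block x block tile of weight_q by its per-tile scale.

    Hypothetical helper: weight_q stands in for the FP8 weight and scale
    for the per-block scale tensor. Both are assumed to be 2-D arrays.
    """
    rows, cols = weight_q.shape
    # Assumed layout: one scale per tile, rounding up for ragged edges.
    expected = (-(-rows // block), -(-cols // block))
    if scale.shape != expected:
        raise ValueError(f"scale shape {scale.shape} != expected {expected}")
    # Expand each scale so it covers its whole tile, then crop to the
    # weight's shape in case the last tiles are only partially filled.
    scale_expanded = np.repeat(np.repeat(scale, block, axis=0), block, axis=1)
    scale_expanded = scale_expanded[:rows, :cols]
    # Upcast first, then scale; a real script would cast the result to BF16.
    return weight_q.astype(np.float32) * scale_expanded


# Tiny demonstration with 2x2 blocks instead of 128x128.
w = np.ones((3, 4), dtype=np.float32)
s = np.array([[2.0, 3.0], [4.0, 5.0]], dtype=np.float32)
out = dequantize_blockwise(w, s, block=2)
print(out)
```

In the demonstration every weight is 1, so the output shows the expanded scale grid directly: the top-left 2x2 tile is scaled by 2, the top-right by 3, and the cropped bottom row by 4 and 5. A naive upcast would skip the multiplication and leave the weights at the wrong magnitude.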


1 Commit