Replace the fragile inline Dockerfile patching with a copy of a modified utils.py (which adds the _post_quant_fix call) into the image, following the same pattern as the deepseek_v4.py and deepseek_v4_attention.py patches.