CRITICAL FIX: FP4 LUT was up to 4x too large!

E2M1 magnitudes are [0, 0.5, 1, 1.5, 2, 3, 4, 6], NOT [0, 2, 3, 4, 6, 8, 12, 24]. Every nonzero entry in the old LUT was roughly 2.7x to 4x its correct value (exactly 4x only for the smallest and largest magnitudes, 0.5 and 6), so every nonzero NVFP4 dequantized weight came out too large. The error compounded across 61 layers: the residual stream exploded and the model produced gibberish. This LUT error is the root cause of the residual growth and incoherent generation.
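
The corrected table can be cross-checked by decoding the E2M1 bit layout directly. The snippet below is a minimal sketch, assuming the usual E2M1 convention of 2 exponent bits, 1 mantissa bit, an exponent bias of 1, and a subnormal encoding when the exponent field is 0. The names `e2m1_magnitude`, `E2M1_MAGNITUDES`, `OLD_LUT`, and `ratios` are illustrative and do not appear in single_shot_inference.py.

```python
def e2m1_magnitude(code: int) -> float:
    """Decode the low 3 bits of an FP4 E2M1 code (sign bit ignored)."""
    exponent = (code >> 1) & 0b11  # 2 exponent bits
    mantissa = code & 0b1          # 1 mantissa bit
    if exponent == 0:
        # Subnormal: no implicit leading 1, so the only values are 0 and 0.5.
        return mantissa * 0.5
    # Normal: implicit leading 1, exponent bias assumed to be 1.
    return (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)


# Magnitudes for codes 0..7, in LUT index order.
E2M1_MAGNITUDES = [e2m1_magnitude(code) for code in range(8)]

# The table this commit removes, kept here only for comparison.
OLD_LUT = [0.0, 2.0, 3.0, 4.0, 6.0, 8.0, 12.0, 24.0]

# Per-entry overestimate of the old table (index 0 skipped to avoid 0/0).
ratios = [old / new for old, new in zip(OLD_LUT[1:], E2M1_MAGNITUDES[1:])]

if __name__ == "__main__":
    print(E2M1_MAGNITUDES)
    print([round(r, 3) for r in ratios])
```

Comparing the two tables entry by entry shows that the old values were not a uniform multiple of the correct ones, so rescaling the dequantized output by a constant would not have compensated for the error.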
2026-05-31 04:16:13 +00:00
parent b8c8da91fe
commit aafe2eee12
1 changed file with 1 addition and 1 deletion
--- a/single_shot_inference.py
+++ b/single_shot_inference.py
@@ -53,7 +53,7 @@ NUM_GPUS = 8
 # NVFP4 dequantization — matches checkpoint format exactly
 # =====================================================================

-FP4_LUT = torch.tensor([0., 2., 3., 4., 6., 8., 12., 24.])
+FP4_LUT = torch.tensor([0., 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

 def dequant_nvfp4_weight(weight, weight_scale, weight_scale_2):
    """Dequantize NVFP4 weight to BF16.