Commit Graph

11 Commits

Author SHA1 Message Date
b2d0417a46 NVFP4-1.1: Mark fp4_quant.py as toolchain-blocked, clean up test files
CuTeDSL MLIR pipeline cannot lower any float→int op. All approaches fail:
arith.fptosi, llvm.inline_asm, nvvm.inline_ptx, llvm.bitcast.

Production path: dsv4/kernels/cuda/quantize_nvfp4.cu (raw CUDA, works).
For NVFP4-1.1 fusion, use post-epilogue CUDA kernel approach.

Removed dead test files (test_ptx_*, test_fp4_isolate*, test_minimal_cmp*,
test_dtype_store, test_threshold_round).
2026-05-28 04:59:01 +00:00
b3eb46d4ec NVFP4-1.1: Restore threshold RNE approach — inline PTX blocked by toolchain
CuTeDSL MLIR pipeline cannot lower any float→int conversion:
arith.fptosi, llvm.inline_asm, nvvm.inline_ptx, llvm.bitcast — all
fail with 'LLVM ERROR: unsupported operation'. The pipeline has no
path from Float32 to Int32 MLIR types.

Threshold RNE is the mathematically correct software implementation:
- Float32 comparisons select Int32 *constants* (no arith.fptosi)
- > vs >= at .5 boundaries implements round-to-nearest-even
- Equivalent to PTX cvt.rni.s32.f32 for bounded ranges
2026-05-28 04:54:27 +00:00
e33c48e44c NVFP4-1.1: Use nvvm.inline_ptx instead of llvm.inline_asm for f32→i32
llvm.inline_asm fails with 'LLVM ERROR: unsupported operation' in CuTeDSL
lowering pipeline. Switch to nvvm.inline_ptx which is native to the NVVM
dialect and lowers correctly.

- f32_to_i32_rni: cvt.rni.s32.f32 via nvvm.inline_ptx
- f32_to_i32_rz: cvt.rzi.s32.f32 via nvvm.inline_ptx
- f32_to_i32_rmi: cvt.rmi.s32.f32 via nvvm.inline_ptx
2026-05-28 04:42:33 +00:00
1cbb3cf752 NVFP4-1.1: Replace threshold rounding with inline PTX cvt.rni/rz/rmi
- Add f32_to_i32_rni (cvt.rni.s32.f32) for round-to-nearest-even
- Add f32_to_i32_rz (cvt.rzi.s32.f32) for round-toward-zero
- Add f32_to_i32_rmi (cvt.rmi.s32.f32) for round-to-minus-infinity
- Replace round_rne_u0_8 and abs_scaled_to_e2m1_idx threshold hacks
  with proper PTX hardware rounding in fp8_e4m3_from_float32
- quantize_e2m1_nibble now uses f32_to_i32_rni + LUT logic for half_step
- Add test_ptx_convert.py for inline PTX conversion verification
- This is the CORRECT approach per NVFP4-1.1_INLINE_PTX_APPROACH.md option 1
2026-05-28 04:40:17 +00:00
d2aa93aad7 NVFP4-1.1: fix Int32 clamping — use comparisons instead of fmin/fmax (float-only ops) 2026-05-28 04:30:06 +00:00
dabcc415a8 NVFP4-1.1: threshold rounding for float-to-int — avoids CuTeDSL limitation
All float-to-int conversions replaced with threshold comparisons:
- round_rne_u0_8: mantissa rounding via Float32 comparisons → Int32 constants
- abs_scaled_to_e2m1_idx: direct |scaled| → E2M1 index (no half_step needed)
- Verified 0/500 trial failures against Python reference

Key thresholds (RNE boundaries):
- 0.25, 0.75, 1.25, 1.75, 2.75, 3.75, 5.25 with > vs >= for RNE tie-breaking
- Fixed: 2.75 must use >= (not >) to match round(5.5)=6 RNE
2026-05-28 04:26:40 +00:00
e565ebce91 NVFP4-1.1: replace cute.math.fmin with cute.arch.fmin (correct API) 2026-05-28 03:55:54 +00:00
20d5ddfa3d NVFP4-1.1: fix indentation for @cute.jit decorators 2026-05-28 03:52:46 +00:00
f6f59d34cb NVFP4-1.1: add @cute.jit decorator to fp4_quant functions for CuTeDSL if-block support 2026-05-28 03:50:11 +00:00
6f94925491 NVFP4-1.1: fix cute.math.fmax -> cute.arch.fmax (correct CuTeDSL API) 2026-05-28 03:48:51 +00:00
80b6b79f9e NVFP4-1.1: FP4 quantization primitives for CuTeDSL kernels
- fp8_e4m3_from_float32: manual FP8 E4M3 cast (bias=7, exp 0-15 valid,
  NaN guard for exp=15/mant=7, mantissa overflow handling)
- fp8_e4m3_to_float32: dequantize FP8 E4M3 bit pattern back to Float32
- half_step_to_e2m1_idx: E2M1 step mapping (0-12 → 0-7)
- quantize_e2m1_nibble: per-element E2M1 quantize + sign + pack
- Verified 0/500 trial failures against Python reference
- Key fixes discovered during validation:
  1. FP8 E4M3 bias is 7, NOT 8
  2. Exponent range is 0-15 (exp=15/mant=7 is NaN; others valid)
  3. Subnormal formula: val = m * 2^(-9) = m/512 (NOT m/1024)
  4. Round-to-nearest-even (not round-half-up) for half_step and mantissa
  5. Mantissa overflow (round to 8) must increment exponent
2026-05-28 03:39:55 +00:00