nvfp4-megamoe-kernel/dsv4/kernels
biondizzle b3eb46d4ec NVFP4-1.1: Restore threshold RNE approach — inline PTX blocked by toolchain
The CuTeDSL MLIR pipeline cannot lower any float→int conversion.
arith.fptosi, llvm.inline_asm, nvvm.inline_ptx, and llvm.bitcast all
fail with 'LLVM ERROR: unsupported operation', so the pipeline has no
path from Float32 to Int32 MLIR types.

Threshold RNE is the mathematically correct software implementation:
- Float32 comparisons select Int32 *constants* (no arith.fptosi)
- Choosing > or >= at each .5 boundary implements round-to-nearest-even:
  a tie moves up only when the integer above the boundary is even
- Equivalent to PTX cvt.rni.s32.f32 for bounded ranges
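
The threshold idea can be sketched in plain Python. This is only an illustration of the technique under stated assumptions, not the kernel code: the name `threshold_rne` and the bound `max_abs` are hypothetical, and the loop stands in for what would be an unrolled chain of compare-and-select operations. The result is only ever assigned from integer constants, so no float→int conversion is needed.

```python
def threshold_rne(x: float, max_abs: int = 6) -> int:
    """Round x to the nearest integer, ties to even, using comparisons only.

    Hypothetical sketch: values with |x| beyond max_abs saturate at max_abs.
    """
    mag = abs(x)
    result = 0  # integer constant selected by the comparisons below
    for k in range(max_abs):
        boundary = k + 0.5
        if k % 2 == 0:
            # Lower neighbour k is even: a tie must stay at k, so use strict >.
            crossed = mag > boundary
        else:
            # Upper neighbour k + 1 is even: a tie must move up, so use >=.
            crossed = mag >= boundary
        if crossed:
            result = k + 1
    # Ties-to-even is symmetric about zero, so the sign is applied afterwards.
    return result if x >= 0 else -result
```

On quarter-step inputs inside the bounded range this agrees with Python's built-in `round()`, which also rounds ties to even; for example, 0.5 and 2.5 round down to 0 and 2, while 1.5 and 3.5 round up to 2 and 4.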
2026-05-28 04:54:27 +00:00