Files
nvfp4-megamoe-kernel/src
biondizzle ef9cd023a9 fix: unpack_ue4m3_u32 — uint32 lacks CUDA bitwise ops, use int32
PyTorch doesn't implement bitwise_and/shift for UInt32 on CUDA.
Cast to int32 first, then extract bytes, then uint8 → view float8.
2026-05-14 13:44:42 +00:00
..