Verified that our NVFP4 packing convention (odd<<4|even, round-half-to-even)
matches the DeepSeek-V4 checkpoint exactly: 100% byte-identical round-trip
across all tested experts. The dequantize->requantize path is lossless in
practice but wasteful. Marked both prepare_weights_from_dequantized and
prepare_weights_direct as deprecated in favor of prepare_weights_from_stacked
which loads checkpoint FP4 bytes directly via .view().
Also added test_fp4_roundtrip.py for future reference.