diff --git a/README_NVFP4.md b/README_NVFP4.md
index 962b204..e93fdf9 100644
--- a/README_NVFP4.md
+++ b/README_NVFP4.md
@@ -94,9 +94,29 @@ The `weight_scale_2` must be multiplied into the block scales **before** packing
 - [ ] Integration with vLLM DeepseekV4MegaMoEExperts — wired, debugging
 - [ ] End-to-end quality test
 
+### SM100 (B200) Hardware Constraint
+
+**CRITICAL**: B200 (SM100) does NOT support `kind::mxf4nvf4` with either `scale_vec::2X` or `scale_vec::4X`; NVCC rejects both for `sm_100f` (Builds 17 and 18 below). This instruction requires SM103 (B300/GB300) or SM120. On SM100, the only FP4 block-scaled MMA is `kind::mxf8f6f4.block_scale`, which takes UE8M0 scales with one scale per 32 elements (block32).
+
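+**Strategy**: Keep the NVFP4 E2M1 weights unchanged (MXFP4 uses the same element format) and adapt only the scales for hardware compatibility. Fold the global scale into the UE4M3 block scales before converting them to UE8M0, and merge each pair of adjacent block16 scales into a single block32 scale by taking the maximum of the pair. This is a scale format adaptation, not a weight format conversion.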
+
+| Parameter | NVFP4 Checkpoint | Kernel (SM100 Adapted) |
+|-----------|-----------------|----------------------|
+| Weight format | E2M1 uint8 | E2M1 uint8 (unchanged) |
+| Block scale format | UE4M3 (float8_e4m3fn) | UE8M0 (uint8) |
+| Block size | 16 | 32 (merged) |
+| Global scale | float32 | Folded in before UE4M3→UE8M0 |
+| PTX instruction | N/A (requires SM103+) | `kind::mxf8f6f4.block_scale` |
+
 ### Debugging Log
 - Build 7: kPackedFP4 mismatch → uint8→int8 view
 - Build 9: SF stride assertion → need MN-major layout + TMA alignment
 - Build 10: transform_sf_into_required_layout doesn't support gran_k=16 → C++ fix
 - Build 11: SF dtype mismatch (float8_e4m3fn → must pack to int32 first)
 - Build 12-14: SF stride layout — transpose to MN-major before transform
+- Build 15: SymmBuffer too small (NVFP4 has 2x SF) → use NVFP4 SymmBuffer
+- Build 16: ImportError (deep_gemm.mega.nvfp4) → Python wrapper
+- Build 17: NVCC error: scale_vec::4X not supported on sm_100f
+- Build 18: NVCC error: scale_vec::2X ALSO not supported on sm_100f
+- Build 19: kGranK still 16 in C++ binding
+- Build 20: Use mxf8f6f4 (same as MXFP4) with UE4M3→UE8M0 scale conversion