1. DeepseekV4MLAAttention.__init__ hard-asserted that the attention
backend is FlashMLA. FlashMLA does not work on Blackwell, where
attention is routed through _attention_impl_blackwell() instead.
Added an _is_blackwell flag that skips the FlashMLA-specific init
(fp8_ds_mla cache format conversion) on that path.
2. Added VLLM_NVFP4_GEMM_BACKEND=cutedsl env var to docker-compose.yml
to force CuTeDSL kernel selection for NVFP4 linear layers.
3. Updated register_cutedsl_kernel.py to also register CuTeDSL in
the _NVFP4_BACKEND_TO_KERNEL dict (used by the env var override path).
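
Rough sketch of how items 2 and 3 fit together, for reviewers. This is
not the actual vLLM code: the kernel classes, the "default" key, and
select_nvfp4_kernel() are hypothetical stand-ins. Only the env var name
and the dict name come from this change.

```python
import os


class DefaultNvfp4Kernel:
    """Hypothetical placeholder for whatever kernel is used without an override."""


class CuteDslNvfp4Kernel:
    """Hypothetical placeholder for the CuTeDSL NVFP4 kernel."""


# Stand-in for the real backend-name -> kernel mapping.
_NVFP4_BACKEND_TO_KERNEL = {"default": DefaultNvfp4Kernel}


def register_cutedsl_kernel(registry):
    """Add the CuTeDSL entry so the env var override can resolve it."""
    registry.setdefault("cutedsl", CuteDslNvfp4Kernel)


def select_nvfp4_kernel(registry, env=None):
    """Pick a kernel class from VLLM_NVFP4_GEMM_BACKEND (assumed lookup logic)."""
    env = os.environ if env is None else env
    name = env.get("VLLM_NVFP4_GEMM_BACKEND", "default").lower()
    if name not in registry:
        known = ", ".join(sorted(registry))
        raise ValueError(f"unknown NVFP4 backend {name!r}; known: {known}")
    return registry[name]


if __name__ == "__main__":
    register_cutedsl_kernel(_NVFP4_BACKEND_TO_KERNEL)
    env = {"VLLM_NVFP4_GEMM_BACKEND": "cutedsl"}
    print(select_nvfp4_kernel(_NVFP4_BACKEND_TO_KERNEL, env).__name__)
```

The point of the sketch: setting the env var alone (item 2) is not
enough if the name is missing from the dict, which is why item 3
registers it there as well.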