docs: update debugging log in README

2026-05-11 07:33:02 +00:00
parent 8d02eb38fa
commit acbe006498


@@ -85,12 +85,18 @@ The `weight_scale_2` must be multiplied into the block scales **before** packing
## Remaining Work
- [ ] Test compilation on B200 (SM100)
- [ ] Verify UTCCP 4X column stride (i*8) — may need adjustment based on TMEM layout diagrams
- [ ] Verify SF packing: UE4M3 bytes → int32 layout matches what the UTCCP instruction expects
- [ ] Verify the L1 epilogue UE4M3 conversion (float → e4m3 cast + sign bit clear)
- [ ] Validate scale_format_ bit value: currently set by `make_instr_desc_block_scaled<float_ue4m3_t>` which sets scale_format_=0 (E4M3)
- [ ] Verify kNumSFATmemCols and kNumSFBTmemCols calculations for 4X layout
- [ ] Integration with vLLM DeepseekV4MegaMoEExperts class
- [ ] Weight loading: map NVFP4 checkpoint params to DeepseekV4MegaMoEExperts
- [ ] End-to-end quality test: compare NVFP4 mega_moe output vs FlashInfer FP4 MoE
- [x] Test compilation on B200 (SM100): **COMPILED**
- [ ] Verify UTCCP 4X column stride (i*8)
- [ ] Verify SF packing: UE4M3 → int32 → TMA-aligned layout
- [x] Add gran_k=16 to C++ transform_sf_into_required_layout
- [ ] Fix SF layout: must be MN-major (stride(-2)=1) with TMA-aligned stride
- [ ] Verify the L1 epilogue UE4M3 conversion
- [ ] Integration with vLLM DeepseekV4MegaMoEExperts — wired, debugging
- [ ] End-to-end quality test
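
For the UE4M3 conversion item above, a small host-side reference decoder can help when comparing bytes written by the L1 epilogue against expected scale values. The sketch below is an illustration only, not the kernel's code. It assumes the scales use the finite-only E4M3 encoding (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7, as in `float8_e4m3fn`) and that "UE4M3" means the same byte with the sign bit cleared. The helper names `decode_e4m3fn` and `to_ue4m3` are hypothetical.

```python
import math


def decode_e4m3fn(byte: int) -> float:
    """Decode one E4M3 (finite-only, bias 7) byte into a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF   # 4 exponent bits
    man = byte & 0x7          # 3 mantissa bits
    if exp == 0xF and man == 0x7:
        return math.nan       # the only NaN pattern; this variant has no infinities
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 2**-6.
        return sign * (man / 8.0) * 2.0 ** -6
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def to_ue4m3(byte: int) -> int:
    """Assumed UE4M3 conversion: keep exponent and mantissa, clear the sign bit."""
    return byte & 0x7F
```

For example, `decode_e4m3fn(to_ue4m3(0xB8))` decodes the byte for -1.0 after the sign bit is cleared and returns `1.0`. Block scales are non-negative, so clearing the sign bit should only matter for a negative zero or an upstream bug.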

### Debugging Log
- Build 7: kPackedFP4 mismatch → uint8→int8 view
- Build 9: SF stride assertion → need MN-major layout + TMA alignment
- Build 10: transform_sf_into_required_layout doesn't support gran_k=16 → C++ fix
- Build 11: SF dtype mismatch (float8_e4m3fn → must pack to int32 first)
- Builds 12-14: SF stride layout — transpose to MN-major before transform
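
The SF fixes from Builds 9 to 14 (pack to int32, then lay out MN-major with a TMA-aligned stride) can be sketched in plain Python to make the target indexing explicit. This is a minimal illustration under stated assumptions, not the actual `transform_sf_into_required_layout` code: it assumes four consecutive K-direction UE4M3 bytes go into one little-endian int32, that the MN stride is 1, and that the stride between packed-K columns is MN rounded up to a 16-byte boundary. The names `align_up` and `pack_sf_mn_major` are hypothetical.

```python
import struct


def align_up(x: int, multiple: int) -> int:
    """Round x up to the next multiple of `multiple`."""
    return (x + multiple - 1) // multiple * multiple


def pack_sf_mn_major(sf_rows, tma_align_bytes=16):
    """Pack per-row UE4M3 scale bytes into int32 words in an MN-major buffer.

    sf_rows: list of `mn` bytes objects, each holding `k_sf` scale bytes
             (with gran_k=16, k_sf would be K // 16); k_sf must be a multiple of 4.
    Returns (flat, packed_k, aligned_mn), where word (m, kp) is stored at
    flat[kp * aligned_mn + m], i.e. stride 1 along MN.
    """
    mn = len(sf_rows)
    k_sf = len(sf_rows[0])
    assert k_sf % 4 == 0, "need 4 scale bytes per int32 word"
    assert all(len(row) == k_sf for row in sf_rows)

    packed_k = k_sf // 4
    # Assumed TMA rule: column stride is a multiple of 16 bytes (4 int32 words).
    aligned_mn = align_up(mn, tma_align_bytes // 4)

    flat = [0] * (packed_k * aligned_mn)  # padding rows stay zero
    for m, row in enumerate(sf_rows):
        # Assumed byte order: little-endian, first K byte in the lowest bits.
        words = struct.unpack("<%di" % packed_k, row)
        for kp, word in enumerate(words):
            flat[kp * aligned_mn + m] = word
    return flat, packed_k, aligned_mn
```

With 3 rows of 8 scale bytes each, this yields 2 packed columns with a column stride of 4 words, so the fourth slot of each column is zero padding. The byte order inside each word and the exact alignment rule should be checked against what the UTCCP instruction and the TMA descriptor expect.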