biondizzle
0cdcc4144a
refactor: add cutedsl/bridge.py, rewrite layertest to use it

bridge.py: clean API for the CuTeDSL kernel
- quantize_to_nvfp4 / quantize_weight_to_nvfp4
- assemble_scales_2d_side / assemble_scales_3d_side
- make_b_k_major (stride conversion)
- compute_expert_offsets
- run_nvfp4_grouped_gemm (full kernel launch)

layertest.py: now uses the bridge layer and tests with real
DeepSeek-V4 layer 0 weights (7168 hidden, 6144 intermediate).

The bridge code will be reused by the vLLM integration layer.
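
For readers unfamiliar with grouped GEMM, the role of a helper like compute_expert_offsets can be sketched as an exclusive prefix sum over per-expert token counts. This is an illustration only, not the code in bridge.py: the function name expert_offsets_sketch, its signature, and the list-based return type are all hypothetical, and the real helper may use tensors and a different layout.

```python
from itertools import accumulate
from typing import List, Sequence


def expert_offsets_sketch(tokens_per_expert: Sequence[int]) -> List[int]:
    """Hypothetical stand-in for a compute_expert_offsets-style helper.

    Returns len(tokens_per_expert) + 1 offsets so that expert e owns rows
    offsets[e]:offsets[e + 1] of the token-sorted activation matrix.
    """
    if any(n < 0 for n in tokens_per_expert):
        raise ValueError("token counts must be non-negative")
    # Prefix sum with a leading zero: [0, c0, c0 + c1, ...]
    return [0, *accumulate(tokens_per_expert)]


counts = [3, 0, 5, 2]          # tokens routed to each of 4 experts
offsets = expert_offsets_sketch(counts)
print(offsets)                  # [0, 3, 3, 8, 10]

# Row range for expert 2 in the sorted activations:
start, end = offsets[2], offsets[3]
print(start, end)               # 3 8
```

An expert that receives no tokens (expert 1 here) gets an empty range, which is why the offsets array carries one more entry than there are experts.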
2026-05-16 03:13:54 +00:00