Commit Graph

3 Commits

Author SHA1 Message Date
6db7fd339d FIX: (None,0,None,0) for ALL tma_partition outputs — verified shapes on B200
DIAG OUTPUT (n=256, inside @cute.kernel):
  tAgQ: (((64,128),1), Int32(?), Int32(?), Int32(?))  — 4 modes
  tBgK: (((64,128),1), Int32(?), Int32(?), Int32(?))  — 4 modes
  tVgV: (((64,128),1), 1, 1, 1)                       — 4 modes

After (None,0,None,0) → keeps modes 0 and 2 free → 2D:
  tAgQ: (((64,128),1), Int32(?))
  tBgK: (((64,128),1), Int32(?))
  tVgV: (((64,128),1), 1)

Then [None, kt] indexes the surviving mode 1 (originally mode 2 = KV tiles).
tAgQ[(None, Int32(0))] for Q (1 tile, coordinate is always 0).
Removed diag prints from test_fmha_v3.py.
2026-05-22 23:35:55 +00:00
a50cb138c8 auto: pre-test commit 2026-05-22 23:34:03 +00:00
9cbdc92744 Restructure: cutedsl/ -> dsv4/ with proper layering
- Split bridge.py -> ops/quantize.py, ops/layouts.py, ops/gemm_runner.py
- Renamed classes: CuTeDSLNvfp4Linear -> Nvfp4Linear, etc.
- Moved kernel code to dsv4/kernels/ (gemm, attention, compressor, decode, cuda)
- Moved PyTorch bridges to dsv4/ops/
- Moved nn.Module layers to dsv4layers/
- Moved reference implementations to dsv4/reference/
- Moved vendored CUTLASS code to vendored/
- Archived ~190 debug tests to tests/archive/
- Kept ~15 canonical tests in tests/unit/
- Updated all import paths
- Added stubs for future components (model/, cache/, loader/)
- Updated pyproject.toml: dsv4-inference package name
2026-05-21 17:30:44 +00:00