9d88769f5f
Wire indexer compute_index_scores_topk + fix compressor imports
- indexer/__init__.py: compute_index_scores_topk now calls
run_indexer_score_topk with proper tensor reshaping
- compressor/__init__.py: added torch import, fixed csa_compress_tail
and hca_compress_tail imports for flush.py
- Full flush pipeline now importable end-to-end
2026-05-30 21:19:06 +00:00
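Illustration for the reshape-then-top-k step named in the commit above. This is a minimal sketch, not the repository's `run_indexer_score_topk`. It assumes the scores arrive as one flat buffer holding `num_queries` rows of `num_entries` values each. The name `topk_indices` and its parameters are hypothetical, and plain Python lists stand in for tensors.

```python
import heapq


def topk_indices(flat_scores, num_queries, num_entries, k):
    """Treat a flat score buffer as [num_queries][num_entries] and
    return, per query, the indices of the k highest-scoring entries."""
    if len(flat_scores) != num_queries * num_entries:
        raise ValueError("flat_scores does not match num_queries * num_entries")
    k = min(k, num_entries)
    result = []
    for q in range(num_queries):
        # "Reshape": slice out the row that belongs to query q.
        row = flat_scores[q * num_entries:(q + 1) * num_entries]
        # Indices come back ordered by descending score.
        result.append(heapq.nlargest(k, range(num_entries), key=row.__getitem__))
    return result


scores = [0.1, 0.9, 0.4, 0.7,   # query 0
          0.8, 0.2, 0.6, 0.3]   # query 1
print(topk_indices(scores, num_queries=2, num_entries=4, k=2))
# prints [[1, 3], [0, 2]]
```

A tensor implementation would more likely reshape once and call a batched top-k; the loop here only makes the row layout explicit.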
daf84524ac
E2/E3: compressor bridge, indexer bridge, flush pipeline wiring
- compress_tail.py: PyTorch reference CSA/HCA compression
(token-level softmax over m/m' entries, paper eq. 11-12)
- compressor/__init__.py: csa_compress_and_store, hca_compress_and_store
bridges (compression deferred to flush pipeline)
- indexer/__init__.py: compute_index_scores_topk bridge (raises NotImplementedError)
- Fixed attention.py: removed extra positions arg to write_swa
2026-05-30 21:16:54 +00:00
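Illustration for the "token-level softmax over m entries" idea mentioned in the commit above. This is a rough sketch only: the cited paper equations are not reproduced here, and the code assumes one scalar score per token, which may differ from the actual CSA/HCA formulation in `compress_tail.py`. The names `softmax` and `compress_block` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def compress_block(values, scores):
    """Collapse m token vectors into one vector, weighting each token
    by the softmax of its score (assumption: one scalar score per token)."""
    if not values or len(values) != len(scores):
        raise ValueError("need one score per token vector")
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# With equal scores the weights are uniform, so the result is the mean.
block = [[1.0, 2.0], [3.0, 4.0]]
compressed = compress_block(block, scores=[0.0, 0.0])
```

With a very large score gap the output approaches the single highest-scoring token, which is the behaviour that distinguishes a softmax-weighted compressor from plain mean pooling.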
300dddedc0
E1-E4: gather kernels, handle wiring, rope, sync removal, e2e test
E1: LayerCacheHandle now exposes gather_compressed_kv,
gather_all_compressed_kv, gather_swa_kv, num_query_heads, head_dim.
Gather kernels in dsv4/kernels/cuda/gather_swa.cu + gather_kv.cu.
Python wrapper in dsv4/kernels/cache/gather.py.
E2: tests/e2e/test_one_layer.py — SWA path smoke test.
E3: Compressor/indexer __init__.py bridges (NotImplementedError stubs
for CSA/HCA compress_and_store, compute_index_scores_topk).
E4: Removed torch.cuda.synchronize() from fmha_multitile_op.py fast path.
Error checking via C API return code instead.
Also: forward_rope_partial in ops/rope.py (GPT-J interleaved, last 64 dims).
2026-05-30 21:10:26 +00:00
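Illustration for the partial RoPE mentioned in the commit above (GPT-J interleaved pairing, applied to the last 64 dims). This is a sketch under stated assumptions, not the code in `ops/rope.py`: the function name `rope_partial_interleaved`, the frequency base of 10000, and the single-vector list input are all hypothetical choices made for the example.

```python
import math


def rope_partial_interleaved(x, position, rot_dim=64, base=10000.0):
    """Rotate only the last rot_dim entries of x, pairing adjacent
    elements (2i, 2i+1) as in the GPT-J interleaved layout.
    The leading len(x) - rot_dim entries pass through unchanged."""
    if rot_dim % 2 != 0 or rot_dim > len(x):
        raise ValueError("rot_dim must be even and no larger than len(x)")
    out = list(x)
    start = len(x) - rot_dim
    for i in range(rot_dim // 2):
        # Assumed frequency schedule: base ** (-2i / rot_dim).
        angle = position * base ** (-2.0 * i / rot_dim)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[start + 2 * i], x[start + 2 * i + 1]
        out[start + 2 * i] = a * c - b * s
        out[start + 2 * i + 1] = a * s + b * c
    return out


head = [float(i % 7) for i in range(128)]   # one 128-dim head vector
rotated = rope_partial_interleaved(head, position=5)
```

The alternative "half-split" layout pairs element i with element i + rot_dim/2 instead of its neighbour; mixing the two conventions between the writer and the reader of a cache yields wrong attention scores without raising any error, which is presumably why the commit names the layout explicitly.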
3fb3c925af
Restructure: cutedsl/ -> dsv4/ with proper layering
- Split bridge.py -> ops/quantize.py, ops/layouts.py, ops/gemm_runner.py
- Renamed classes: CuTeDSLNvfp4Linear -> Nvfp4Linear, etc.
- Moved kernel code to dsv4/kernels/ (gemm, attention, compressor, decode, cuda)
- Moved PyTorch bridges to dsv4/ops/
- Moved nn.Module layers to dsv4/layers/
- Moved reference implementations to dsv4/reference/
- Moved vendored CUTLASS code to vendored/
- Archived ~190 debug tests to tests/archive/
- Kept ~15 canonical tests in tests/unit/
- Updated all import paths
- Added stubs for future components (model/, cache/, loader/)
- Updated pyproject.toml: dsv4-inference package name
2026-05-21 17:30:44 +00:00