|
|
4472928506
|
E3: model construction test
|
2026-05-30 21:22:34 +00:00 |
|
|
|
c4b40dd06c
|
E2: CSA/HCA integration test — gather + FMHA end-to-end
Tests:
- CSA: gather_compressed_kv (top-k) + gather_swa_kv + sparse FMHA
- HCA: gather_all_compressed_kv + gather_swa_kv + dense FMHA
- Verifies shapes, dtypes, and numerical sanity (no NaN/Inf)
|
2026-05-30 21:19:28 +00:00 |
|
|
|
924707a673
|
fix: add FFNType/RouterMode to LayerSpec in e2e test
|
2026-05-30 21:11:04 +00:00 |
|
|
|
e2e21c6350
|
fix: remove unused pytest import from e2e test
|
2026-05-30 21:10:43 +00:00 |
|
|
|
300dddedc0
|
E1-E4: gather kernels, handle wiring, rope, sync removal, e2e test
E1: LayerCacheHandle now exposes gather_compressed_kv,
gather_all_compressed_kv, gather_swa_kv, num_query_heads, head_dim.
Gather kernels in dsv4/kernels/cuda/gather_swa.cu + gather_kv.cu.
Python wrapper in dsv4/kernels/cache/gather.py.
E2: tests/e2e/test_one_layer.py — SWA path smoke test.
E3: Compressor/indexer __init__.py bridges (NotImplementedError stubs
for CSA/HCA compress_and_store, compute_index_scores_topk).
E4: Removed torch.cuda.synchronize() from fmha_multitile_op.py fast path.
Error checking via C API return code instead.
Also: forward_rope_partial in ops/rope.py (GPT-J interleaved, last 64 dims).
|
2026-05-30 21:10:26 +00:00 |
|
|
|
3fb3c925af
|
Restructure: cutedsl/ -> dsv4/ with proper layering
- Split bridge.py -> ops/quantize.py, ops/layouts.py, ops/gemm_runner.py
- Renamed classes: CuTeDSLNvfp4Linear -> Nvfp4Linear, etc.
- Moved kernel code to dsv4/kernels/ (gemm, attention, compressor, decode, cuda)
- Moved PyTorch bridges to dsv4/ops/
- Moved nn.Module layers to dsv4layers/
- Moved reference implementations to dsv4/reference/
- Moved vendored CUTLASS code to vendored/
- Archived ~190 debug tests to tests/archive/
- Kept ~15 canonical tests in tests/unit/
- Updated all import paths
- Added stubs for future components (model/, cache/, loader/)
- Updated pyproject.toml: dsv4-inference package name
|
2026-05-21 17:30:44 +00:00 |
|