7 Commits

Author SHA1 Message Date
f3b551956d Cleanup Step 2: Archive Lineage P code, fix broken imports
- Move dead dsv4/ modules to dsv4/_archive/ (52 files)
  - model/{dsv4,mtp,layer,layer_schedule}
  - layers/{embedding,attention,ffn,norm} (kept linear,mhc,router,moe,shared_expert,grouped_linear - live)
  - cache/*, kernels/cache/*, kernels/indexer/{csa_indexer,score_topk,compute_valid_lens}
  - kernels/router/{nvfp4_fused_router,dense_router_decode_kernel,dense_router_prefill}
  - ops/{topk,topk_select,rope,router}, loader/{hf_checkpoint,layout_convert}
  - reference/{attention,compressor,csa_attention,moe_pipeline}
  - kernels/compressor/{compress_tail,csa_hca}
- Restore dsv4/ops/{router,custom_ops}.py (needed by live layers)
- Fix dsv4/kernels/{indexer,compressor,attention}/__init__.py (removed broken imports)
- Remove preload_all() from loader.py (dead, referenced nonexistent .cu file)
- Fix loader.py docstring (fused_amax_quantize_nvfp4 → quantize_nvfp4_from_buffer)
- Move broken tests to tests/e2e_archive/
  - test_fused_router, production_values_test, e2e/{one_layer,model_construction,csa_hca}
- vLLM has 0 imports of dsv4 (Step 0 confirmed)
2026-06-02 19:27:07 +00:00
4472928506 E3: model construction test 2026-05-30 21:22:34 +00:00
c4b40dd06c E2: CSA/HCA integration test — gather + FMHA end-to-end
Tests:
- CSA: gather_compressed_kv (top-k) + gather_swa_kv + sparse FMHA
- HCA: gather_all_compressed_kv + gather_swa_kv + dense FMHA
- Verifies shapes, dtypes, and numerical sanity (no NaN/Inf)
2026-05-30 21:19:28 +00:00
924707a673 fix: add FFNType/RouterMode to LayerSpec in e2e test 2026-05-30 21:11:04 +00:00
e2e21c6350 fix: remove unused pytest import from e2e test 2026-05-30 21:10:43 +00:00
300dddedc0 E1-E4: gather kernels, handle wiring, rope, sync removal, e2e test
E1: LayerCacheHandle now exposes gather_compressed_kv,
    gather_all_compressed_kv, gather_swa_kv, num_query_heads, head_dim.
    Gather kernels in dsv4/kernels/cuda/gather_swa.cu + gather_kv.cu.
    Python wrapper in dsv4/kernels/cache/gather.py.

E2: tests/e2e/test_one_layer.py — SWA path smoke test.

E3: Compressor/indexer __init__.py bridges (NotImplementedError stubs
    for CSA/HCA compress_and_store, compute_index_scores_topk).

E4: Removed torch.cuda.synchronize() from fmha_multitile_op.py fast path.
    Error checking via C API return code instead.

Also: forward_rope_partial in ops/rope.py (GPT-J interleaved, last 64 dims).
2026-05-30 21:10:26 +00:00
3fb3c925af Restructure: cutedsl/ -> dsv4/ with proper layering
- Split bridge.py -> ops/quantize.py, ops/layouts.py, ops/gemm_runner.py
- Renamed classes: CuTeDSLNvfp4Linear -> Nvfp4Linear, etc.
- Moved kernel code to dsv4/kernels/ (gemm, attention, compressor, decode, cuda)
- Moved PyTorch bridges to dsv4/ops/
- Moved nn.Module layers to dsv4layers/
- Moved reference implementations to dsv4/reference/
- Moved vendored CUTLASS code to vendored/
- Archived ~190 debug tests to tests/archive/
- Kept ~15 canonical tests in tests/unit/
- Updated all import paths
- Added stubs for future components (model/, cache/, loader/)
- Updated pyproject.toml: dsv4-inference package name
2026-05-21 17:30:44 +00:00