biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:43:48 +00:00
5dcfb333ea Fix: move weight tensors to CUDA before dequant
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:43:03 +00:00
47c7b3c50b Fix: ensure FP4 LUT on CUDA before index op
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:42:20 +00:00
13bae9dd55 Fix single_shot: mHC replaces layernorm, no hidden-level norm in DSV4
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:41:09 +00:00
e8334fc4af Rewrite single_shot_inference.py — complete forward pass
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 22:39:03 +00:00
9b0858aa35 Add single_shot_inference.py — baseline kernel verification
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:22:36 +00:00
4472928506 E3: model construction test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:21:49 +00:00
afc07a5d1a Update STATUS.md: E5 done
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:21:04 +00:00
df6220abaf E5: Fold batch loop into native kernel grid (blockIdx.z)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:20:11 +00:00
e162a2d112 Update STATUS.md: E1-E4 done
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:19:30 +00:00
c4b40dd06c E2: CSA/HCA integration test — gather + FMHA end-to-end
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:19:06 +00:00
9d88769f5f Wire indexer compute_index_scores_topk + fix compressor imports
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:16:59 +00:00
daf84524ac E2/E3: compressor bridge, indexer bridge, flush pipeline wiring
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:15:58 +00:00
d3b772196d E3: Implement DSV4Model — full model class
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:14:17 +00:00
b0cdd5af74 fix: extern declarations for gather_swa functions in gather_kv.cu
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:13:28 +00:00
016d722abc fix: single PYBIND11_MODULE for combined gather .so
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:12:17 +00:00
8fb9d89658 fix: correct gather.py kernel_dir path
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:11:07 +00:00
924707a673 fix: add FFNType/RouterMode to LayerSpec in e2e test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:10:46 +00:00
e2e21c6350 fix: remove unused pytest import from e2e test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:10:27 +00:00
300dddedc0 E1-E4: gather kernels, handle wiring, rope, sync removal, e2e test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-30 21:09:24 +00:00
faf92b30ad E1: Wire LayerCacheHandle gather methods + CUDA gather kernels