biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 07:57:43 +00:00
d8e17d70c1 P0+P1+P2: Enable fused SwiGLU (MoE+SE), fix SE _run_l1_fused, remove per-call gsa fill_
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 07:32:14 +00:00
61d5e7ba53 revert: P2 gsa fill elimination — revert to proven path for e2e stability
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 07:16:20 +00:00
790f8c350a perf: P2 landed (gsa fill elimination). P0/P1 fused SwiGLU disabled — CuTeDSL kernel arg-binding bug.
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 06:59:28 +00:00
040b2eb6e7 perf: P0/P1/P2 — fused SwiGLU for MoE+SE, eliminate per-call gsa fill
biondizzle pushed tag v-c1-c2-c3-20260602 to biondizzle/nvfp4-megamoe-kernel 2026-06-02 06:33:27 +00:00
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 06:18:09 +00:00
e9506e0c20 perf: C1/C2/C3 — per-layer max_comp, pre-allocated gather_buf, SWA views
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 06:14:26 +00:00
617da29a5b fix: assert topk_idx is not None in CSA layers — no silent fallback to SWA-only
biondizzle pushed tag v-indexer-fix-20260602 to biondizzle/nvfp4-megamoe-kernel 2026-06-02 06:09:36 +00:00
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 05:53:23 +00:00
5b4c496512 fix: three indexer bugs — weight path, comp_idx_buf width, scoring einsum
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 05:51:26 +00:00
0fbf28dd54 doc: INDEXER_PROBE_RESULTS_20260602 — compressed key width is ihd=128, not n_ih*ihd=8192
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 05:38:46 +00:00
8162c586c3 probe: fix comp_idx_buf width to ihd=128 so indexer probe can complete
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 05:25:46 +00:00
5be31d8582 fix: indexer compressor weight path — weights are at *.indexer.kv_proj not *.indexer.compressor.kv_proj
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 05:17:01 +00:00
fdfcca918c probe: verify indexer compressor load state
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 05:06:23 +00:00
fb0ed87626 probe: add indexer compressor early-return and buffering diagnostics
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 04:44:51 +00:00
06c92f208f INDEXER PROBE: instrumentation prints for compressed key width investigation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 04:38:25 +00:00
510eaf4a26 probe: HF indexer architecture from B200
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 04:36:38 +00:00
938e9079ce probe: indexer and compressor weight shapes from checkpoint
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 04:31:19 +00:00
9254cb0b0d test: NVFP4 runtime gsa accuracy vs PyTorch reference