From 0fbf28dd541b3df9270907740c577ec8ecd50311 Mon Sep 17 00:00:00 2001
From: biondizzle
Date: Tue, 2 Jun 2026 05:51:24 +0000
Subject: [PATCH] =?UTF-8?q?doc:=20INDEXER=5FPROBE=5FRESULTS=5F20260602=20?=
 =?UTF-8?q?=E2=80=94=20compressed=20key=20width=20is=20ihd=3D128,=20not=20?=
 =?UTF-8?q?n=5Fih*ihd=3D8192?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../INDEXER_PROBE_RESULTS_20260602.md | 126 ++++++++++++++++++
 1 file changed, 126 insertions(+)
 create mode 100644 archived_plans/INDEXER_PROBE_RESULTS_20260602.md

diff --git a/archived_plans/INDEXER_PROBE_RESULTS_20260602.md b/archived_plans/INDEXER_PROBE_RESULTS_20260602.md
new file mode 100644
index 00000000..d47c79d6
--- /dev/null
+++ b/archived_plans/INDEXER_PROBE_RESULTS_20260602.md
@@ -0,0 +1,126 @@
+# Indexer probe results — 2026-06-02
+
+## Raw output
+
+### Indexer load state (after fix for weight path bug)
+
+```
+Indexer L2: q_b_lin=True wp_lin=True compressor=True
+Indexer L4: q_b_lin=True wp_lin=True compressor=True
+Indexer L6: q_b_lin=True wp_lin=True compressor=True
+```
+
+Note: `compressor=False` before the weight path fix. The original code looked for
+`*.indexer.compressor.kv_proj.weight` but the checkpoint keys are `*.indexer.kv_proj.weight`
+(no extra `.compressor` nesting). Fix: changed `Indexer.load` to look for
+`f"{pfx}.kv_proj.weight"` instead of `f"{pfx}.compressor.kv_proj.weight"`.
+
+### Compressor output shapes (at first block boundary, token 3 of prefill)
+
+```
+COMPRESSOR OUT [hd=512 kv_dim=1024 ratio=4 is_csa=True]: compressed.shape=(1, 512) dtype=torch.bfloat16 stride=(512, 1) contig=True
+COMPRESSOR OUT [hd=128 kv_dim=256 ratio=4 is_csa=True]: compressed.shape=(1, 128) dtype=torch.bfloat16 stride=(128, 1) contig=True
+```
+
+The first line is the **main CSA compressor** (compresses KV for attention).
+The second line is the **indexer's internal compressor** (compresses hidden states for indexer scoring).
+
+### Reshape failure (at Indexer.forward, L2, token 3)
+
+```
+!!! RESHAPE FAILURE L2 !!!
+comp_indexer_kv.shape = (1, 128)
+tried to reshape to (1, 64, 128)
+total elements: have 128, need 8192
+    k_idx = comp_indexer_kv.reshape(n_comp, self.n_ih, self.ihd)
+RuntimeError: shape '[1, 64, 128]' is invalid for input of size 128
+```
+
+### Checkpoint weight shapes (from safetensors scan of L2 indexer)
+
+```
+model.layers.2.self_attn.compressor.indexer.q_b_proj.weight: shape=(8192, 768) dtype=uint8
+model.layers.2.self_attn.compressor.indexer.weights_proj.weight: shape=(64, 3584) dtype=uint8
+model.layers.2.self_attn.compressor.indexer.kv_proj.weight: shape=(256, 3584) dtype=uint8
+model.layers.2.self_attn.compressor.indexer.gate_proj.weight: shape=(256, 3584) dtype=uint8
+model.layers.2.self_attn.compressor.indexer.position_bias: shape=(4, 256) dtype=bfloat16
+model.layers.2.self_attn.compressor.indexer.kv_norm.weight: shape=(128,) dtype=bfloat16
+```
+
+### KVCache comp_idx_buf crash (before width fix)
+
+```
+RuntimeError: The expanded size of the tensor (512) must match the existing size (128) at non-singleton dimension 1. Target sizes: [1, 512]. Tensor sizes: [128]
+  at: self.comp_idx_buf[self.n_comp:end] = idx_kv
+```
+
+Original `comp_idx_buf` was `(max_comp, head_dim=512)` but indexer compressed keys are width 128.
+
+---
+
+## Answers
+
+### Q1: shape of indexer.compressor.forward(...)[0]
+
+Observed: `(1, 128)` — width **W = 128 = ihd** (the indexer head dim)
+Hypothesis matched: **A** (paper-aligned: `c_I = 128`)
+
+The indexer compressor outputs one compressed block of width `ihd=128` per `m=4` tokens.
+This is NOT `n_ih × ihd = 8192` (hypothesis B) and NOT `512` (hypothesis C / current buffer width).
+
+### Q2: indexer.compressor.kv_dim
+
+Observed: **256** (= `2 × ihd = 2 × 128`)
+Expected per hypothesis A: 256 ✓
+
+This is the internal projection width *before* the softmax/reduce. The compressor's
+two GEMMs (`kv_proj` and `gate_proj`) each produce `(T, 256)`, then the CUDA reduce
+kernel collapses every `m=4` tokens into one `(1, 128)` output.
+
+### Q3: q_b_lin and wp_lin shapes
+
+From checkpoint (NVFP4 packed: weight shape = (N_packed, K_packed)):
+- **q_b_lin**: in_features = 768×2 = 1536 (q_a lora dim), out_features = 8192 (= n_ih × ihd = 64 × 128) ✓
+- **wp_lin**: in_features = 3584×2 = 7168 (hidden size), out_features = 64 (= n_ih) ✓
+
+### Q4: Runtime k_idx shape and reshape validity
+
+- `comp_indexer_kv.shape` before reshape: **(1, 128)**
+- Reshape target `(n_comp, 64, 128)`: **FAILED**
+- Total elements: **have=128, need=8192** — off by **64×** (exactly `n_ih=64`)
+
+The current `Indexer.forward` tries `comp_indexer_kv.reshape(n_comp, self.n_ih, self.ihd)`,
+which assumes the stored indexer keys have `n_ih × ihd = 8192` elements per block.
+But the actual stored width is `ihd = 128` (one vector per compressed block, NOT
+per-indexer-head). The 64× gap is exactly `n_ih = 64`.
+
+This means the scoring einsum `torch.einsum('tnd,cnd->tnc', q_idx, k_idx)` cannot
+work as written. The indexer query `q_idx` is `(T, 64, 128)` (per-indexer-head),
+but the stored key is `(n_comp, 128)` (a single vector). The correct scoring
+formula must be different from what the current code assumes.
+
+---
+
+## Conclusion
+
+The implementation stores indexer compressed keys at width **`ihd = 128`** (one
+vector per compressed block, matching the paper's `c_I`). The current code incorrectly
+assumes the stored keys have width `n_ih × ihd = 8192` (per-indexer-head multi-head
+keys), causing a 64× reshape failure at the scoring step. The `comp_idx_buf` in `KVCache`
+is also 4× too wide (512 vs 128). The indexer's scoring einsum and key storage both
+need rearchitecting to match the paper's single-vector-per-block compressed key format.
+
+---
+
+## Additional findings (not in original scope)
+
+1. **Weight path bug**: `Indexer.load` looked for `*.indexer.compressor.kv_proj.weight`
+   but the checkpoint has `*.indexer.kv_proj.weight` (no `.compressor` nesting).
+   Fixed in commit 5be31d8.
+
+2. **comp_idx_buf width**: was `head_dim=512`, should be `ihd=128`. Temporarily fixed
+   for probe in commit 8162c58. Proper fix depends on audit rewrite.
+
+3. **Indexer compressor never loaded before**: the weight path bug meant `indexer.compressor`
+   was always `None`, so the indexer was always skipped (`comp_idx_kv=None` on every
+   CSA layer). This means the indexer has NEVER been exercised in production runs.
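
Reviewer note (not part of the patch): the shape arithmetic reported in Q3 and Q4 can be re-derived in a few lines of plain Python. This is a minimal sketch, not code from the repository. `can_reshape` and `unpacked_linear_dims` are hypothetical helper names, and the pack factor of 2 along the input dimension is an assumption read off the `768×2` and `3584×2` arithmetic in Q3.

```python
from math import prod

# Values taken from the probe output in the report.
N_IH = 64        # number of indexer heads (n_ih)
IHD = 128        # indexer head dim (ihd)
HEAD_DIM = 512   # original comp_idx_buf width (head_dim)


def can_reshape(src_shape, dst_shape):
    """Return (ok, have, need); a reshape is valid only if element counts match."""
    have, need = prod(src_shape), prod(dst_shape)
    return have == need, have, need


def unpacked_linear_dims(packed_shape, pack_factor=2):
    """Map a packed (N, K_packed) weight shape to (in_features, out_features).

    Assumption: pack_factor values are stored per uint8 byte along K,
    which is what the Q3 arithmetic (768 x 2, 3584 x 2) implies.
    """
    n, k_packed = packed_shape
    return k_packed * pack_factor, n


# Q4: the failing reshape of comp_indexer_kv from (1, 128) to (1, 64, 128).
ok, have, need = can_reshape((1, IHD), (1, N_IH, IHD))
print(ok, have, need, need // have)        # False 128 8192 64

# Q3: logical Linear dimensions recovered from the packed checkpoint shapes.
print(unpacked_linear_dims((8192, 768)))   # (1536, 8192) for q_b_lin
print(unpacked_linear_dims((64, 3584)))    # (7168, 64) for wp_lin

# Conclusion: how much wider the original comp_idx_buf was than the keys.
print(HEAD_DIM // IHD)                     # 4
```

The `need // have` ratio of 64 equals `n_ih`, which is the gap the report attributes to the code assuming per-indexer-head keys.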