biondizzle
  • Joined on 2025-12-10
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 16:31:56 +00:00
b8e2cf61ad Add debug logging to Blackwell attention path
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 16:19:29 +00:00
d7f686bcfc Fix wrapper attribute access: kv_cache, attn_sink, max_model_len via mla_attn
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 16:06:26 +00:00
114da83090 Add CSA/HCA decode + prefill attention to Blackwell path
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 16:04:54 +00:00
2cc1910c45 Fix N for C128A (need 128 tokens)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 16:04:39 +00:00
cea453cbab Fix compressor key name
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 16:04:23 +00:00
04f2b2d8d4 Add CSA sparse attention test (compressed KV gather + SWA merge)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 16:01:13 +00:00
4c6464e7e0 Update CURRENT_BUG: KV cache pipeline verified, all tests passing
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 16:00:36 +00:00
be8566a443 Add decode vs prefill consistency test
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 15:55:42 +00:00
2ddd3d0702 Test with all 61 layers (shared experts only)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 15:54:43 +00:00
842e6e1381 Fix view→reshape for non-contiguous tensor
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 15:53:32 +00:00
f0f8d8211b Add e2e decode test (3 layers: C128A, C4A, SWA)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 15:48:17 +00:00
255913fba4 Vectorize paged KV cache read/write, kill container
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 15:34:11 +00:00
8b2cb41160 Fix KV cache: write to paged cache, handle uint8→fp8 conversion, fix RoPE bug
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 15:30:32 +00:00
6ceb05327f Add blackwell_attention module and comprehensive test
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 15:28:55 +00:00
85c74e5932 Fix attention for decode (1 query vs N cached KVs)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 15:28:29 +00:00
85099c7e75 Fix fp8 amax in decode test
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 15:27:58 +00:00
c66b0b88c0 Add decode attention pipeline test — reproduces KV cache bug
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 15:19:58 +00:00
836fa75b93 Update README and CURRENT_BUG: BUILD YOUR OWN KERNELS. Stop patching vLLM.
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 10:36:24 +00:00
dca8bfc3a8 Fix _apply_rope_kv: use inline RoPE instead of 3D apply_gptj_rope
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 10:31:08 +00:00
8e6721917e Fix syntax in RoPE KV test