biondizzle
  • Joined on 2025-12-10
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 10:28:17 +00:00
cbf440f75a Add RoPE KV test
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 10:27:16 +00:00
a5fabbdf66 Apply RoPE to KV in Blackwell attention path - fix NaN output
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 10:04:46 +00:00
7e97551fd3 Fix: use self.scale instead of self.softmax_scale in Blackwell attention path
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 09:52:25 +00:00
39310c357d Patch compressor cache for Blackwell (no FlashMLA alignment) - fixes 91 missing layers
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 09:45:12 +00:00
d9cd8fa165 Add debug patch to print layer name mismatch
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 09:37:40 +00:00
9a0b015aac Reduce max_model_len to 256
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 09:30:03 +00:00
de1fb839f0 Patch SWA and Indexer cache specs for Blackwell (no FlashMLA alignment)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 09:23:19 +00:00
ea771ff70b Reduce max_model_len to 512 for initial container test
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 09:13:34 +00:00
bcfbd1e25b Reduce max_model_len to 32768 (max_model_len of 876544 requires 204 GiB of KV cache)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 09:05:22 +00:00
e91421f06e Fix KV cache page size patch: separate groups for large SWA pages
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 09:04:20 +00:00
dd7f2627e8 Add full model forward test (WIP); sparse attention test passes
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 09:02:13 +00:00
9781953509 Add CSA/HCA sparse attention kernel test
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:58:48 +00:00
d60673864a Fix kv_ref transpose in KV cache test
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:57:32 +00:00
c1099d76d2 Add KV cache kernel test - fp8 quantize/dequant, paged cache, CSA/HCA compression
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:55:33 +00:00
c54ddbdae1 Fix NVFP4 attention: slice output to actual N after 128-padding
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:55:02 +00:00
42285b6c24 Add CuTeDSL NVFP4 attention kernel test - Q×K^T GEMM
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:51:17 +00:00
9465929e6e Add DeepSeek-V4 CSA/HCA attention pipeline test (not MLA)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:45:45 +00:00
fa71fbe909 Patch KV cache utils: handle DeepseekV4 SWA page sizes > MLA page sizes
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:38:56 +00:00
d08a457829 Fix cos_sin cache shape in NVFP4 attention test
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:38:30 +00:00
7dd8871e84 Add NVFP4 attention test - quantize Q and K for Q×K^T GEMM