biondizzle

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 10:28:17 +00:00

cbf440f75a Add RoPE KV test

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 10:27:16 +00:00

a5fabbdf66 Apply RoPE to KV in Blackwell attention path - fix NaN output

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 10:04:46 +00:00

7e97551fd3 Fix: use self.scale instead of self.softmax_scale in Blackwell attention path

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 09:52:25 +00:00

39310c357d Patch compressor cache for Blackwell (no FlashMLA alignment) - fixes 91 missing layers

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 09:45:12 +00:00

d9cd8fa165 Add debug patch to print layer name mismatch

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 09:37:40 +00:00

9a0b015aac Reduce max_model_len to 256

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 09:30:03 +00:00

de1fb839f0 Patch SWA and Indexer cache specs for Blackwell (no FlashMLA alignment)

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 09:23:19 +00:00

ea771ff70b Reduce max_model_len to 512 for initial container test

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 09:13:34 +00:00

bcfbd1e25b Reduce max_model_len to 32768 (876544 requires 204 GiB KV cache)

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 09:05:22 +00:00

e91421f06e Fix KV cache page size patch: separate groups for large SWA pages

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 09:04:20 +00:00

dd7f2627e8 Add full model forward test (WIP), sparse attention test passes

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 09:02:13 +00:00

9781953509 Add CSA/HCA sparse attention kernel test

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:58:48 +00:00

d60673864a Fix kv_ref transpose in KV cache test

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:57:32 +00:00

c1099d76d2 Add KV cache kernel test - fp8 quantize/dequant, paged cache, CSA/HCA compression

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:55:33 +00:00

c54ddbdae1 Fix NVFP4 attention: slice output to actual N after 128-padding

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:55:02 +00:00

42285b6c24 Add CuTeDSL NVFP4 attention kernel test - Q×K^T GEMM

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:51:17 +00:00

9465929e6e Add DeepSeek-V4 CSA/HCA attention pipeline test (not MLA)

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:45:45 +00:00

fa71fbe909 Patch KV cache utils: handle DeepseekV4 SWA page sizes > MLA page sizes

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:38:56 +00:00

d08a457829 Fix cos_sin cache shape in NVFP4 attention test

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:38:30 +00:00

7dd8871e84 Add NVFP4 attention test - quantize Q and K for Q×K^T GEMM