biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:45:23 +00:00
d36dbba01c Fix B2 indexer: increase TMEM_COLS to 512 for full 128-row MMA output
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:44:11 +00:00
797345dfe9 Add B2 score debug test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:39:53 +00:00
afb82b9c89 Fix B2 indexer: replace broken 16x256b TMEM read with proven 32x32b.x8
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:36:12 +00:00
99e50fcb58 Add B2 minimal debug test to find hang point
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:25:56 +00:00
e21bd14408 Fix B1 test LSE reference shape handling
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:24:21 +00:00
4fe7f9dc37 Fix B1 FMHA: swap V matrix canonical layout args (dd, kk) not (kk, dd)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:23:47 +00:00
29a95a3db6 Add B1 QK vs PV isolation test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:22:03 +00:00
c322e3f301 Add B1 FMHA debug test for cosine failure investigation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:21:31 +00:00
5447d1d1dc Add comprehensive B2 FP8 indexer unit test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:20:12 +00:00
38eecb28d8 Add comprehensive B1 mixed FP8 FMHA unit test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:09:39 +00:00
f2063c0588 B1: minimal debug test for mixed FP8 FMHA (1 head, N=128)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:07:39 +00:00
0cea0b33ff B1 test: fix BF16 reference to use PyTorch SDPA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-03 00:06:28 +00:00
a51d19a7fc B1: add mixed FP8 FMHA cosine verification test (HD=512, N=128-2048)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 23:19:10 +00:00
b9243fe40a B2: FP8 tensor-core indexer scoring + weighted ReLU + top-k
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 22:53:18 +00:00
a9d5e09f4c B1: mixed FP8/BF16 decode FMHA integration
biondizzle pushed tag pre-b1 to biondizzle/nvfp4-megamoe-kernel 2026-06-02 22:48:59 +00:00
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 22:31:17 +00:00
2eb4f0886e things
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 21:52:45 +00:00
9d4a014fad Fix NameError: dequantize_nvfp4 not in scope in forward_attention
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 21:39:04 +00:00
9ba6476d3f auto: pre-test commit
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 21:35:03 +00:00
845227c06c Fix stale lock file in CUDA loader — prevents infinite spin on crash recovery