biondizzle

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 00:45:23 +00:00

d36dbba01c Fix B2 indexer: increase TMEM_COLS to 512 for full 128-row MMA output

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 00:44:11 +00:00

797345dfe9 Add B2 score debug test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 00:39:53 +00:00

afb82b9c89 Fix B2 indexer: replace broken 16x256b TMEM read with proven 32x32b.x8

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 00:36:12 +00:00

99e50fcb58 Add B2 minimal debug test to find hang point

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 00:25:56 +00:00

e21bd14408 Fix B1 test LSE reference shape handling

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 00:24:21 +00:00

4fe7f9dc37 Fix B1 FMHA: swap V matrix canonical layout args (dd, kk) not (kk, dd)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 00:23:47 +00:00

29a95a3db6 Add B1 QK vs PV isolation test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 00:22:03 +00:00

c322e3f301 Add B1 FMHA debug test for cosine failure investigation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 00:21:31 +00:00

5447d1d1dc Add comprehensive B2 FP8 indexer unit test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 00:20:12 +00:00

38eecb28d8 Add comprehensive B1 mixed FP8 FMHA unit test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 00:09:39 +00:00

f2063c0588 B1: minimal debug test for mixed FP8 FMHA (1 head, N=128)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 00:07:39 +00:00

0cea0b33ff B1 test: fix BF16 reference to use PyTorch SDPA

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-03 00:06:28 +00:00

a51d19a7fc B1: add mixed FP8 FMHA cosine verification test (HD=512, N=128-2048)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-02 23:19:10 +00:00

b9243fe40a B2: FP8 tensor-core indexer scoring + weighted ReLU + top-k

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-02 22:53:18 +00:00

a9d5e09f4c B1: mixed FP8/BF16 decode FMHA integration

biondizzle pushed tag pre-b1 to biondizzle/nvfp4-megamoe-kernel

2026-06-02 22:48:59 +00:00

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-02 22:31:17 +00:00

2eb4f0886e things

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-02 21:52:45 +00:00

9d4a014fad Fix NameError: dequantize_nvfp4 not in scope in forward_attention

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-02 21:39:04 +00:00

9ba6476d3f auto: pre-test commit

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-02 21:35:03 +00:00

845227c06c Fix stale lock file in CUDA loader — prevents infinite spin on crash recovery