biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:45:24 +00:00
9264023e3b Update STAGE_D.md with D5b results: merge cos 0.961, LSE err=0.0
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:43:05 +00:00
2ced9d0da7 D5b: Fix reference computation - use logsumexp for stable LSE, fix o_unnorm definition
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:36:28 +00:00
2883e042ca D5b MILESTONE: SWA+sink merge works! cos 0.969
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:35:41 +00:00
70763030c0 D5b: Use normalized O + LSE for merge (correct formula), always output LSE
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:33:46 +00:00
28949da6e4 D5b: Clean up merge test - stable formula for both ref and kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:31:54 +00:00
9e1859827f D5b: Use reference per-row LSE for proper O normalization
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:31:06 +00:00
48d37d652e D5b: Fix kernel_obj reference
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:30:02 +00:00
caf89c65bf D5b: Fix syntax error
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:19:27 +00:00
3dd9cd6a94 D5b: Debug reference formula mismatch, add numerically stable merge
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:18:07 +00:00
98390df27e D5b: Python SWA+sink merge test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:16:54 +00:00
60e03fe84a Update STAGE_D.md: D5a done, CG-2/CG-3 status updated, tOrP0 offset rule added
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:15:15 +00:00
edc283e6c1 D5a: Fix LSE formula - lse = ln(row_sum) + row_max * ln(2)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:14:00 +00:00
6ca294ed6d D5a: Use tensor indexing for LSE write
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:13:09 +00:00
7e91d76669 D5a: Use cute.store for LSE write
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:12:32 +00:00
751abd9b18 D5a: Fix LSE - compute row_max_safe from final row_max, remove mLSE None check
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:11:41 +00:00
d6ea7f3ebd D5a: Fix - add normalize param to __init__
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:10:41 +00:00
c80f223d08 D5a: Add normalize flag + LSE output
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:08:00 +00:00
542bc7b1b0 D1.3: Use const_expr if for tOrP0 compile-time selection
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:07:11 +00:00
37edd783ce D1.3: Pre-compute tOrP0_offset in _setup, use const_expr for compile-time selection
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:06:18 +00:00
972fbd48b9 D1.3: Use const_expr for tOrP0 offset (compile-time conditional)
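The D5a and D5b commits above mention an LSE formula (`lse = ln(row_sum) + row_max * ln(2)`) and a merge that uses normalized O plus LSE. A minimal pure-Python sketch of that idea follows. It is an illustration under stated assumptions, not the repository's kernel code. It assumes that `row_max` is kept in the base-2 domain (scores scaled by log2(e)) and that the two partial results cover disjoint key sets. The names `attention_partial` and `merge_partials` are hypothetical.

```python
import math

def attention_partial(q, keys, values):
    """Softmax attention of one query over one key set.

    Returns (normalized output row, natural-log LSE of the raw scores).
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    # Assumption: work in the base-2 domain, as exp2-based kernels often do.
    s2 = [s * math.log2(math.e) for s in scores]
    row_max = max(s2)
    p = [2.0 ** (s - row_max) for s in s2]
    row_sum = sum(p)
    dim = len(values[0])
    out = [sum(pj * v[d] for pj, v in zip(p, values)) / row_sum
           for d in range(dim)]
    # LSE back in the natural-log domain: ln(row_sum) + row_max * ln(2).
    lse = math.log(row_sum) + row_max * math.log(2.0)
    return out, lse

def merge_partials(o_a, lse_a, o_b, lse_b):
    """Merge two normalized partial outputs using their LSE values."""
    m = max(lse_a, lse_b)
    # Stable log(exp(lse_a) + exp(lse_b)).
    lse = m + math.log(math.exp(lse_a - m) + math.exp(lse_b - m))
    w_a = math.exp(lse_a - lse)
    w_b = math.exp(lse_b - lse)
    merged = [w_a * a + w_b * b for a, b in zip(o_a, o_b)]
    return merged, lse
```

Merging the two partials should reproduce attention over the concatenated key set, which is a convenient reference check for a test like the one these commits describe.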