biondizzle
Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 03:54:31 +00:00
a53936a17c diag: print l1_out shape warning in shared expert
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 03:50:56 +00:00
db30c4acd6 auto: pre-test push for test_se_gpu.py
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 03:43:11 +00:00
3dd95ce77b fix: set activation global scales AFTER _ensure_stacked/_ensure_initialized (which override them)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 03:31:37 +00:00
27c63b01d6 diag: remove broken SE reference comparison, add gsa/gsb print
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 03:25:31 +00:00
9a27ed21fd diag: compare shared expert output with PyTorch reference
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 03:16:27 +00:00
ee8318ad58 diag: handle NaN in shared expert output print
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 03:09:14 +00:00
7000762309 diag: fix SE weight attribute name
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 03:03:07 +00:00
fba1c06cad diag: check SE weight integrity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 02:56:31 +00:00
22d7cc9b7a diag: cuda sync check after shared expert for first 3 layers
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 02:49:58 +00:00
b85fcf4d6f diag: print SE global scales for first 3 layers
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 02:41:14 +00:00
48d93a6d2e diag: MoE input/output diagnostics for first 3 layers
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 02:34:29 +00:00
856a459a98 fix: init l1_gsa_list and l2_gsa_list
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 02:31:14 +00:00
66b98e5794 fix: MoE and shared expert global scale — gsb=ws2, gsa=input_scale (same bug as Nvfp4Linear)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 02:19:36 +00:00
f4b444b456 fix: NVFP4 global scale bug — gsb=weight_scale_2 (not input_scale*ws2), gsa=input_scale
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 02:12:41 +00:00
1eed28dd09 diag: compare production FMHA and NVFP4 linear output with PyTorch reference
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 02:02:16 +00:00
df394f8b40 fix: missing closing quote on string literal
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 01:58:46 +00:00
cfd2468c61 fix: decode loop also needs int32 token_ids for hash router
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 01:49:41 +00:00
905623793b fix: move token_ids to same GPU as router (was cuda:0 but router on cuda:N)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 01:41:05 +00:00
7804b779ce diag: print wo_a g_flat magnitude to find where zeros come from
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 01:34:04 +00:00
efe63caea9 diag: print FMHA output magnitude for first 3 layers