biondizzle

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 03:54:31 +00:00

a53936a17c diag: print l1_out shape warning in shared expert

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 03:50:56 +00:00

db30c4acd6 auto: pre-test push for test_se_gpu.py

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 03:43:11 +00:00

3dd95ce77b fix: set activation global scales AFTER _ensure_stacked/_ensure_initialized (which override them)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 03:31:37 +00:00

27c63b01d6 diag: remove broken SE reference comparison, add gsa/gsb print

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 03:25:31 +00:00

9a27ed21fd diag: compare shared expert output with PyTorch reference

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 03:16:27 +00:00

ee8318ad58 diag: handle NaN in shared expert output print

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 03:09:14 +00:00

7000762309 diag: fix SE weight attribute name

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 03:03:07 +00:00

fba1c06cad diag: check SE weight integrity

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 02:56:31 +00:00

22d7cc9b7a diag: cuda sync check after shared expert for first 3 layers

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 02:49:58 +00:00

b85fcf4d6f diag: print SE global scales for first 3 layers

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 02:41:14 +00:00

48d93a6d2e diag: MoE input/output diagnostics for first 3 layers

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 02:34:29 +00:00

856a459a98 fix: init l1_gsa_list and l2_gsa_list

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 02:31:14 +00:00

66b98e5794 fix: MoE and shared expert global scale — gsb=ws2, gsa=input_scale (same bug as Nvfp4Linear)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 02:19:36 +00:00

f4b444b456 fix: NVFP4 global scale bug — gsb=weight_scale_2 (not input_scale*ws2), gsa=input_scale

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 02:12:41 +00:00

1eed28dd09 diag: compare production FMHA and NVFP4 linear output with PyTorch reference

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 02:02:16 +00:00

df394f8b40 fix: missing closing quote on string literal

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 01:58:46 +00:00

cfd2468c61 fix: decode loop also needs int32 token_ids for hash router

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 01:49:41 +00:00

905623793b fix: move token_ids to same GPU as router (was cuda:0 but router on cuda:N)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 01:41:05 +00:00

7804b779ce diag: print wo_a g_flat magnitude to find where zeros come from

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 01:34:04 +00:00

efe63caea9 diag: print FMHA output magnitude for first 3 layers