biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 09:02:54 +00:00
851ec9b4d5 P3 WIP: fused RMSNorm + quantize kernel skeleton (not yet integrated)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:43:42 +00:00
b13c1057f5 test: verify GEMM shape with production weight format
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:41:23 +00:00
40fb49d670 test: verify GEMM output shape
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:41:08 +00:00
f01d3f3eac wip: SE fused SwiGLU deinterleave fix
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:29:06 +00:00
1726cb64a9 fix: interleave_l1_weights granularity_bf16 (not granularity) in SE
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:25:55 +00:00
553275d810 feat: P1 — add eager warmup_fused_swiglu_compilation for SharedExpert (1-group)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:24:38 +00:00
5ed4c86137 fix: expert_offsets for 4-expert fused SwiGLU test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:23:31 +00:00
53362d2579 test: isolate fused SwiGLU — test no-clamp first
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:22:31 +00:00
ae4506d722 fix: w_gs is scalar not iterable
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:21:34 +00:00
b0c71b947e test: fused SwiGLU — smoke test + correctness comparison with graceful degradation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:20:29 +00:00
2cfca36095 fix: compute correct gs from data in fused SwiGLU test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:19:41 +00:00
4a05a40cf0 fix: fused SwiGLU test — proper weight quant + 128-token alignment
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:18:28 +00:00
fa769b6214 fix: pad activation as uint8 view for float4 dtype
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:17:37 +00:00
024be1a60b fix: test weight quantization dtype for fused SwiGLU test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:16:42 +00:00
19afa52e80 fix: use cute.where() directly for clamp in fused SwiGLU
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:16:04 +00:00
5c746bbdf2 fix: TensorSSA-compatible clamp in fused SwiGLU kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:12:56 +00:00
3a30f35c68 fix: cute.math.fmin/fmax → cute.arch.fmin/fmax in fused SwiGLU kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:11:23 +00:00
fca72427ea fix: add fp4_out/sf_out/l2_global_scale params to fused_swiglu kernel() signature
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:10:01 +00:00
55ea109cca test: fused SwiGLU kernel compilation + correctness (P0/P1 gate)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-02 08:00:07 +00:00
7904cf05c4 Add set_fused_swiglu() method to Nvfp4MoE