biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:19:23 +00:00
647c03b2ee fix: make_b_k_major must preserve shape — use double-permute trick
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:18:20 +00:00
ed4f501bba fix: make_b_k_major stride check — K-major means stride[1]==1, not stride[2]==1
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:16:31 +00:00
2162cee4ad fix: restore proper quantize_weight_to_nvfp4 — K is the packed dim, not N
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:09:03 +00:00
10f1dca982 fix: import ceil_div from correct module
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:08:24 +00:00
81632e2f21 fix: correct cutlass_torch import (cutlass.torch, not top-level)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:06:39 +00:00
16c4fad025 fix: remove cutlass.cute.backend import
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:06:00 +00:00
44b40d41fe fix: compile CuTeDSL kernel with real tensors, not dummy shapes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:00:37 +00:00
79281b6fda fix: compute K_packed/N_packed before passing to _get_compiled_kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 19:59:45 +00:00
caf93d6c45 fix: pass K_packed/N_packed to _get_compiled_kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 19:58:14 +00:00
ecc7b83334 fix: compile CuTeDSL kernel with actual tensor shapes, not dummy 256x256
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 19:55:21 +00:00
cc75a55bd9 restore: new bridge/moe_pipeline/layertest
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 19:54:06 +00:00
0c878b3a9e temp: restore old layertest+bridge for cosine comparison
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 19:38:34 +00:00
0069769d12 debug: print global scales
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 19:31:56 +00:00
84589fe984 debug: more prints
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 19:29:43 +00:00
fa2d5708c5 debug: add L1 GEMM and SiLU output debug prints
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 19:28:24 +00:00
4c06c51ec3 fix: moe_pipeline.py gate/up split — L1 output is 2*intermediate, not intermediate
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 19:23:45 +00:00
da31ce7e1a allow for cuda graphs again
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 19:07:37 +00:00
d15c43294b fix: test L2 weight N dim should be hidden_size, not hidden_size//2
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 19:07:10 +00:00
28788c6f55 fix: L1 weight N dimension is 2*intermediate (gate+up), not intermediate
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 18:55:49 +00:00
f7e29fdf1e docs: update README with cudagraph compatibility work and decisions
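
Several of the fixes above (28788c6f55, 4c06c51ec3) turn on one shape fact: the L1 GEMM output is 2*intermediate wide because it carries the gate and up projections together, and it has to be split before the SiLU step. The following is a minimal pure-Python sketch of that split, not the repository's code. The function names, the gate-first ordering of the two halves, and the SiLU(gate) * up combination are assumptions made for illustration; the commit messages only state the 2*intermediate width and mention a SiLU output.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def gated_activation(l1_out: list[float], intermediate: int) -> list[float]:
    """Split one row of L1 output into gate/up halves and combine them.

    Hypothetical helper: the name and the gate-first layout are assumptions.
    """
    # The row must hold both projections, so its width is 2 * intermediate.
    if len(l1_out) != 2 * intermediate:
        raise ValueError(
            f"expected width {2 * intermediate}, got {len(l1_out)}"
        )
    gate = l1_out[:intermediate]  # ASSUMPTION: gate half comes first
    up = l1_out[intermediate:]    # ASSUMPTION: up half comes second
    # Output is back to `intermediate` wide, which is what L2 consumes.
    return [silu(g) * u for g, u in zip(gate, up)]


row = [0.0, 1.0, 2.0, 3.0]  # intermediate = 2, so the row is 4 wide
out = gated_activation(row, intermediate=2)
print(len(out), out)
```

Treating the row as only `intermediate` wide fails the width check here; in a real pipeline without such a check, the same mistake would silently drop or misread half of the projection, which is the kind of error the two commits describe.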