biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 05:40:54 +00:00
311fae490f tune: reduce verbose diagnostics, print every decode step
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 05:24:20 +00:00
df8acae66b fix: rewrite compressor_reduce.cu — no extern shared mem, proper bounds checks
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 05:20:47 +00:00
62041b78bf fix: import torch.utils.cpp_extension explicitly in production_compress
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 05:19:36 +00:00
b380028c49 feat: production compressor/indexer — NVFP4 GEMM + CUDA softmax/reduce kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 05:19:36 +00:00
2155fd6c90 test: production compressor kernel unit test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 04:59:09 +00:00
6e53e3007c fix: clamp block_amax to E4M3 max (448) in quantize_activation_nvfp4 — prevents NaN from overflow
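The fix in 6e53e3007c rests on one number: the largest finite FP8 E4M3 value is 448 (exponent bits 1111, mantissa 110, so 2^(15-7) × 1.75). E4M3 has no infinity encoding, so in a non-saturating conversion an out-of-range magnitude becomes NaN instead of infinity. Below is a minimal Python sketch of that failure mode and the clamp. It assumes the block amax (or a scale derived from it) is later cast to E4M3 without saturation. `to_e4m3_nosat`, `block_amax` and `clamped_block_amax` are hypothetical names, not functions from the repository.

```python
import math

# Largest finite FP8 E4M3 value: exponent 1111, mantissa 110 -> 2**(15 - 7) * 1.75
E4M3_MAX = 1.75 * 2**8  # 448.0


def to_e4m3_nosat(x: float) -> float:
    """Hypothetical model of a non-saturating FP8 E4M3 cast.

    E4M3 has no infinity, so a magnitude above E4M3_MAX becomes NaN.
    Simplification: rounding to the 3-bit mantissa is not modeled,
    only the range check.
    """
    if math.isnan(x) or abs(x) > E4M3_MAX:
        return math.nan
    return x


def block_amax(block: list[float]) -> float:
    """Absolute maximum of one quantization block."""
    return max(abs(v) for v in block)


def clamped_block_amax(block: list[float]) -> float:
    """Hypothetical helper: clamp the block amax into the finite E4M3 range."""
    return min(block_amax(block), E4M3_MAX)


if __name__ == "__main__":
    block = [0.5, -3.0, 1200.0, 7.25]  # one outlier larger than 448
    print(to_e4m3_nosat(block_amax(block)))          # nan
    print(to_e4m3_nosat(clamped_block_amax(block)))  # 448.0
```

Clamping trades a bounded error on the outlier block for a finite value, so one bad block can no longer poison downstream math with NaN.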
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 04:48:33 +00:00
eb9c46f8cb test: quantize on different GPUs
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 04:43:50 +00:00
9ce7304783 test: direct SE L1 test on different GPUs
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 04:40:09 +00:00
ce608d0e50 test: fix gemm 1-group test params
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 04:35:56 +00:00
c652177970 test: fix gemm 1-group test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 04:32:30 +00:00
793f062bbc auto: pre-test push for test_gemm_1group.py
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 04:30:43 +00:00
86cb0e64a6 auto: pre-test push for test_se_dequant.py
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 04:26:06 +00:00
9ba051cf49 test: fix gsa in SE multi-GPU test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 04:22:41 +00:00
419112dd3e auto: pre-test push for test_se_multi_gpu.py
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 04:14:49 +00:00
2cbc7459b0 diag: fix SE scale print (cast to float first)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 04:08:24 +00:00
bcd7a0cf0d diag: check SE weight and scale integrity for first 3 layers
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 04:01:47 +00:00
8ad617e2ff diag: NaN detection in shared expert gate/up split