biondizzle
Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:45:24 +00:00
eebf33b97d test: clean minimal nvvm.inline_ptx test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:44:37 +00:00
882d48588b test: debug nvvm.inline_ptx with CUTLASS_LOG_LEVEL=DEBUG
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:43:27 +00:00
3ffb3b807a test: minimal nvvm.inline_ptx isolation test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:42:34 +00:00
e33c48e44c NVFP4-1.1: Use nvvm.inline_ptx instead of llvm.inline_asm for f32→i32
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:40:21 +00:00
74dba6ab9d auto: pre-test commit
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:40:19 +00:00
1cbb3cf752 NVFP4-1.1: Replace threshold rounding with inline PTX cvt.rni/rz/rmi
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:35:08 +00:00
2777ebfe8e NVFP4-1.1: ultra-minimal test — Float32 comparison + Int32 select
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:33:41 +00:00
2087eaef49 NVFP4-1.1: minimal threshold rounding test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:32:10 +00:00
1828a71cde NVFP4-1.1: test kernel uses Float32 input (avoids BF16 scalar load issue)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:30:09 +00:00
d2aa93aad7 NVFP4-1.1: fix Int32 clamping — use comparisons instead of fmin/fmax (float-only ops)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:27:31 +00:00
accc66741d NVFP4-1.1: update test kernel with threshold rounding API
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:26:42 +00:00
dabcc415a8 NVFP4-1.1: threshold rounding for float-to-int — avoids CuTeDSL limitation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:09:59 +00:00
acf46c494c NVFP4-1.1: update approach doc and fp4_quant with CuTeDSL API fixes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:06:30 +00:00
f3a2b37d70 NVFP4-1.1: document CuTeDSL float-to-int limitation, revise approach to compact SwiGLU output
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:02:46 +00:00
c3d5a7b82f NVFP4-1.1: try .to(Int32) for float-to-int conversion
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 04:01:39 +00:00
dc35d29811 NVFP4-1.1: fix cute.arch.store signature - store(ptr, val) not store(ptr, val, dtype)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 03:59:07 +00:00
a05a76bb6b NVFP4-1.1: add Int32 cast diagnostic test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 03:55:55 +00:00
e565ebce91 NVFP4-1.1: replace cute.math.fmin with cute.arch.fmin (correct API)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 03:52:47 +00:00
20d5ddfa3d NVFP4-1.1: fix indentation for @cute.jit decorators
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 03:50:12 +00:00
f6f59d34cb NVFP4-1.1: add @cute.jit decorator to fp4_quant functions for CuTeDSL if-block support