biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:37:01 +00:00
e0aa7ccd19 auto: pre-test commit
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:37:01 +00:00
4f8559ae2e SMEM-P: implement full 128-value write in softmax loop using coordinate mapping
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:33:37 +00:00
63f68eda52 SMEM-P: fix BF16 value creation (use constant)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:32:14 +00:00
aa82a0faf5 SMEM-P: implement CUTLASS LLM coordinate mapping pattern (minimal test)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:30:10 +00:00
c9b44e6bf9 SMEM-P: fix thread_idx tuple access
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:29:31 +00:00
97e97b63ea auto: pre-test commit
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:29:28 +00:00
dee046287e SMEM-P: add debug to understand thread partitioning
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:22:31 +00:00
5b6a4fbef9 Update STAGE_D.md: manual SMEM addressing blocked on layout mapping
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:21:31 +00:00
060cea5d0f SMEM-P: implement simple test pattern instead of coord lookup
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:20:44 +00:00
56bed1066d auto: pre-test commit
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:20:41 +00:00
6c08a95620 Start implementing manual SMEM-P addressing (helpers are a trap)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:14:46 +00:00
7bf69a0265 Implement manual SMEM-P copy instead of cute.copy (helpers are a trap)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:14:08 +00:00
944fa9b155 auto: pre-test commit
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:14:08 +00:00
e765685951 Try flattening sP and rP_bf16_qk with group_modes to fix rank mismatch
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:13:09 +00:00
5ee0c20736 Add debug prints for SMEM-P partition layouts
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:12:14 +00:00
55dcee2d29 Fix SMEM-P: use BF16 copy atom and BF16 source with QK C-fragment layout
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:11:10 +00:00
77e01acd13 Fix SMEM-P copy: use tcgen05.copy.St32x32bOp with Float32 and copy from rP_words (Float32) not rP_bf16
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:09:57 +00:00
01fd6d03db Update STAGE_D.md with current action plan - starting NVFP4-0 verification and D1.3 validation on B200
biondizzle pushed tag d1.3-pre-sm100-helpers to biondizzle/nvfp4-megamoe-kernel 2026-05-23 19:00:18 +00:00
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 18:37:54 +00:00
5756b6e4ec 📋 Update STAGE_D.md: D1.3 SOLVED, D1.4 IMPLEMENTED, D1.5 🟡 complex refactor, checklist updated