biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 19:03:58 +00:00
3c1a76bdcc Fix Dockerfile: use external patch script instead of inline Python
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 18:35:36 +00:00
75844a8361 Post-quant fix via Dockerfile patch to process_weights_after_loading
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 18:15:38 +00:00
a4ad5898c1 Fix post-quant hook: register on inner model, fix module refs
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 17:56:21 +00:00
a51edd238e Add post-quant-init forward hook to fix attention NVFP4
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 16:43:46 +00:00
2835cb040b Fix input_scale BEFORE process_weights_after_loading runs
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 16:23:42 +00:00
2fc81ccac4 Revert to BF16 dequant for attention NVFP4 (input_scale fix was too early)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 16:00:01 +00:00
4a57399592 Add debug prints for input_global_scale_inv check
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 15:43:48 +00:00
f86892e26b Replace BF16 dequant with input_scale warmup fix for attention NVFP4
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 15:22:54 +00:00
301015b037 Remove all inline diagnostics — incompatible with torch.compile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 15:05:54 +00:00
a83d364d45 Switch to cudagraph_mode=NONE (not enforce-eager) for real inference testing
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 14:45:44 +00:00
2a2a42c6d6 Add attention-internal diagnostics: MLA output, FP8 quant output
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 14:24:15 +00:00
5c1dda10f6 Add granular attention diagnostics: pre/post attn, embed, dequant stats
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 14:04:13 +00:00
e0e0528778 Add debug logging for BF16 dequant to find missing attrs
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 13:47:09 +00:00
2e8c3c961f Fix: dequantize fused_wqa_wkv instead of separate wq_a/wkv
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 13:22:18 +00:00
a7216b27df Fix: keep wo_a as FP8 (fp8_einsum path), dequant others to BF16
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 13:09:38 +00:00
334e95047e Fix: dequantize ALL attention NVFP4 projections to BF16
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 12:54:17 +00:00
a83c332059 Fix docker-compose: remove orphaned compilation-config arg, enforce-eager mode
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 12:51:51 +00:00
9e7639fba4 Add layer-by-layer diagnostic prints (CLAWMINE_DEBUG=1, enforce-eager)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 12:17:28 +00:00
2d1e9f42b1 Remove NaN check — incompatible with Dynamo fullgraph compilation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 11:33:31 +00:00
65763a200c Fix NaN check: wrap in @torch.compiler.disable to prevent Dynamo graph break