biondizzle

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 19:03:58 +00:00

3c1a76bdcc Fix Dockerfile: use external patch script instead of inline Python

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 18:35:36 +00:00

75844a8361 Post-quant fix via Dockerfile patch to process_weights_after_loading

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 18:15:38 +00:00

a4ad5898c1 Fix post-quant hook: register on inner model, fix module refs

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 17:56:21 +00:00

a51edd238e Add post-quant-init forward hook to fix attention NVFP4

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 16:43:46 +00:00

2835cb040b Fix input_scale BEFORE process_weights_after_loading runs

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 16:23:42 +00:00

2fc81ccac4 Revert to BF16 dequant for attention NVFP4 (input_scale fix was too early)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 16:00:01 +00:00

4a57399592 Add debug prints for input_global_scale_inv check

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 15:43:48 +00:00

f86892e26b Replace BF16 dequant with input_scale warmup fix for attention NVFP4

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 15:22:54 +00:00

301015b037 Remove all inline diagnostics — incompatible with torch.compile

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 15:05:54 +00:00

a83d364d45 Switch to cudagraph_mode=NONE (not enforce-eager) for real inference testing

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 14:45:44 +00:00

2a2a42c6d6 Add attention-internal diagnostics: MLA output, FP8 quant output

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 14:24:15 +00:00

5c1dda10f6 Add granular attention diagnostics: pre/post attn, embed, dequant stats

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 14:04:13 +00:00

e0e0528778 Add debug logging for BF16 dequant to find missing attrs

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 13:47:09 +00:00

2e8c3c961f Fix: dequantize fused_wqa_wkv instead of separate wq_a/wkv

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 13:22:18 +00:00

a7216b27df Fix: keep wo_a as FP8 (fp8_einsum path), dequant others to BF16

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 13:09:38 +00:00

334e95047e Fix: dequantize ALL attention NVFP4 projections to BF16

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 12:54:17 +00:00

a83c332059 Fix docker-compose: remove orphaned compilation-config arg, enforce-eager mode

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 12:51:51 +00:00

9e7639fba4 Add layer-by-layer diagnostic prints (CLAWMINE_DEBUG=1, enforce-eager)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 12:17:28 +00:00

2d1e9f42b1 Remove NaN check — incompatible with Dynamo fullgraph compilation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 11:33:31 +00:00

65763a200c Fix NaN check: wrap in @torch.compiler.disable to prevent Dynamo graph break