biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 07:10:20 +00:00
34c43958d0 vllm tweaks
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 06:24:17 +00:00
48e4cb625d fix: default activation global_scale so runner works without finalize_weights
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 06:22:28 +00:00
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 06:06:15 +00:00
b497b35a10 fix: dynamic activation quantization (quantize_to_nvfp4) + per-expert scale assembly
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 05:55:47 +00:00
78bebff736 test: standalone CuTeDSL GEMM diagnostic
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 03:35:22 +00:00
d2965b432d fix: set _l1_activation_global_scale (with underscore) — attribute name mismatch
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 22:49:32 +00:00
b382a7a528 fix: handle input_scale as 1D or 2D (EP splits change the shape)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 22:23:33 +00:00
139c9c37cd fix: read input_scale from nn.Parameter before it's freed
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 21:46:01 +00:00
152648789d fix: use checkpoint input_scale for activation global scale (not hardcoded 1/2688)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 21:41:02 +00:00
af087e655e docs: update README — vLLM cudagraph inference running, output quality in progress
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:45:47 +00:00
0a5cfe0433 add kernel compile caching — compile once, invoke on subsequent calls
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:42:58 +00:00
3465b9d471 remove torch.cuda.synchronize() from run_nvfp4_grouped_gemm (cudagraph-safe)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:40:19 +00:00
5e245bc0c6 fix: missing newline
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:39:40 +00:00
288e179f88 add quantize_activation_nvfp4 (cudagraph-safe, fixed global scale)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:37:44 +00:00
521e11e468 test: old bridge + LUT quantization only (step 1 of cudagraph migration)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:34:47 +00:00
f51be76e8f temp: restore EXACT old bridge.py from b685112
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:28:18 +00:00
58dc36e21c fix: compile fresh each call — cached compile produces wrong TMA descriptors
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:25:19 +00:00
98cc6ac1f3 fix: invert cache check logic (compile when NOT in cache)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:24:05 +00:00
e337ec86a3 debug: test with cache enabled
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 20:22:57 +00:00
bc56452be8 debug: disable kernel cache to test fresh compilation