biondizzle

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 07:10:20 +00:00

34c43958d0 vllm tweaks

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 06:24:17 +00:00

48e4cb625d fix: default activation global_scale so runner works without finalize_weights

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 06:22:28 +00:00

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 06:06:15 +00:00

b497b35a10 fix: dynamic activation quantization (quantize_to_nvfp4) + per-expert scale assembly

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 05:55:47 +00:00

78bebff736 test: standalone CuTeDSL GEMM diagnostic

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 03:35:22 +00:00

d2965b432d fix: set _l1_activation_global_scale (with underscore) — attribute name mismatch

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 22:49:32 +00:00

b382a7a528 fix: handle input_scale as 1D or 2D (EP splits change the shape)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 22:23:33 +00:00

139c9c37cd fix: read input_scale from nn.Parameter before it's freed

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 21:46:01 +00:00

152648789d fix: use checkpoint input_scale for activation global scale (not hardcoded 1/2688)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 21:41:02 +00:00

af087e655e docs: update README — vLLM cudagraph inference running, output quality in progress

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 20:45:47 +00:00

0a5cfe0433 add kernel compile caching — compile once, invoke on subsequent calls

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 20:42:58 +00:00

3465b9d471 remove torch.cuda.synchronize() from run_nvfp4_grouped_gemm (cudagraph-safe)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 20:40:19 +00:00

5e245bc0c6 fix: missing newline

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 20:39:40 +00:00

288e179f88 add quantize_activation_nvfp4 (cudagraph-safe, fixed global scale)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 20:37:44 +00:00

521e11e468 test: old bridge + LUT quantization only (step 1 of cudagraph migration)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 20:34:47 +00:00

f51be76e8f temp: restore EXACT old bridge.py from b685112

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 20:28:18 +00:00

58dc36e21c fix: compile fresh each call — cached compile produces wrong TMA descriptors

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 20:25:19 +00:00

98cc6ac1f3 fix: invert cache check logic (compile when NOT in cache)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 20:24:05 +00:00

e337ec86a3 debug: test with cache enabled

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 20:22:57 +00:00

bc56452be8 debug: disable kernel cache to test fresh compilation