nvfp4-megamoe-kernel/vllm at 835e1a0590c5852f26ef85ff5cccfcebf50e487d

biondizzle/nvfp4-megamoe-kernel

biondizzle 8f80991fdf CRITICAL FIX: Properly dequantize fp8 KV in decode using per-token inv_scale

2026-05-19 17:08:58 +00:00

kernels/linear/nvfp4

Fix OOM: add --max-model-len=876544 + revert CPU dummy weight

2026-05-19 07:35:43 +00:00

CRITICAL FIX: Properly dequantize fp8 KV in decode using per-token inv_scale

2026-05-19 17:08:58 +00:00

cutedsl_quant_method.py

Fix OOM: add --max-model-len=876544 + revert CPU dummy weight

2026-05-19 07:35:43 +00:00

nvfp4_cutedsl.py

Replace autograd.Function with torch.library.custom_op for Dynamo compat

2026-05-19 01:54:48 +00:00