nvfp4-megamoe-kernel

Files

biondizzle f5ce728ef2 Fix OOM: add --max-model-len=876544 + revert CPU dummy weight

The CPU dummy weight broke torch.mm(compressor.weight.T), which expects
GPU tensors. Instead, reduce max_model_len so the KV cache fits within
available memory (876544 instead of 1048576).
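
As a rough sanity check on the numbers in this commit message, the sketch below compares the two context lengths. It assumes the KV cache needed for one full-length sequence grows linearly with max_model_len, which is a simplification and not something the commit states. The constant and function names are illustrative only.

```python
# Context lengths taken from the commit message.
OLD_MAX_MODEL_LEN = 1_048_576  # previous value (1024 * 1024)
NEW_MAX_MODEL_LEN = 876_544    # value set by --max-model-len in this commit


def kv_cache_fraction(new_len: int, old_len: int) -> float:
    """Fraction of the old per-sequence KV cache still required.

    Assumption: KV cache size is proportional to the number of tokens,
    so the ratio of lengths equals the ratio of cache sizes.
    """
    return new_len / old_len


fraction = kv_cache_fraction(NEW_MAX_MODEL_LEN, OLD_MAX_MODEL_LEN)
tokens_dropped = OLD_MAX_MODEL_LEN - NEW_MAX_MODEL_LEN

print(f"new / old context length: {fraction:.1%}")
print(f"tokens removed from the maximum context: {tokens_dropped}")
```

Under that linear assumption, the new limit needs 107/128 of the KV cache that a 1048576-token sequence would need. The actual savings depend on the model's attention layout and on how the serving engine allocates cache blocks.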

2026-05-19 07:35:43 +00:00

kernels/linear/nvfp4

Fix OOM: add --max-model-len=876544 + revert CPU dummy weight

2026-05-19 07:35:43 +00:00

patches

Revert deepseek_v4_attention.py to ffc2264 — don't nuke existing patches

2026-05-19 06:52:40 +00:00

cutedsl_quant_method.py

Fix OOM: add --max-model-len=876544 + revert CPU dummy weight

2026-05-19 07:35:43 +00:00

nvfp4_cutedsl.py

Replace autograd.Function with torch.library.custom_op for Dynamo compat

2026-05-19 01:54:48 +00:00