nvfp4-megamoe-kernel/vllm
biondizzle f5ce728ef2 Fix OOM: add --max-model-len=876544 + revert CPU dummy weight
The CPU dummy weight broke torch.mm(compressor.weight.T), which expects
GPU tensors. Instead, reduce max_model_len so the KV cache fits within
available memory (876544 instead of 1048576).
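
For a rough sense of how much the new limit saves, the two values from the commit message can be compared directly. This is a minimal sketch, not the project's own sizing logic: it assumes the per-sequence KV cache grows linearly with max_model_len, and the helper name kv_cache_fraction is hypothetical.

```python
# Token limits taken from the commit message.
OLD_MAX_MODEL_LEN = 1_048_576  # 1024 * 1024
NEW_MAX_MODEL_LEN = 876_544    # 856 * 1024


def kv_cache_fraction(new_len: int, old_len: int) -> float:
    """Fraction of the original per-sequence KV cache still required.

    Assumption: cache size scales linearly with sequence length.
    """
    return new_len / old_len


fraction = kv_cache_fraction(NEW_MAX_MODEL_LEN, OLD_MAX_MODEL_LEN)
tokens_removed = OLD_MAX_MODEL_LEN - NEW_MAX_MODEL_LEN

print(f"new limit: {NEW_MAX_MODEL_LEN // 1024} Ki tokens "
      f"(was {OLD_MAX_MODEL_LEN // 1024} Ki)")
print(f"fraction of KV cache kept: {fraction:.4f}")
print(f"tokens removed from the limit: {tokens_removed}")
```

Under that linearity assumption, the new limit needs about 84% of the KV cache that the 1048576-token limit would have required per sequence.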
2026-05-19 07:35:43 +00:00