nvfp4-megamoe-kernel

biondizzle a569612df5 feat: add load progress heartbeats to prevent k8s health check kills

The 5-minute gap after the safetensors load is the GPU weight upload,
which prints nothing, so k8s marks the pod unhealthy. The loader now
prints a heartbeat every 256 weight loads during the expert loading
phase.

Also adds checkpoint-ready and model-ready prints around finalize:
  Checkpoint loaded. Transferring weights to GPU & preparing NVFP4...
  (JIT compile)NVFP4 MoE layers: 50%|██████████░░░░░░░░░░| 31/61
  NVFP4 model ready ✓
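
The heartbeat described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function name `load_with_heartbeat`, the `load_one` callback, and the text of the progress line are all hypothetical. Only the interval of 256 weight loads comes from the commit message.

```python
import sys
import time

# Interval taken from the commit message: one heartbeat per 256 weight loads.
HEARTBEAT_EVERY = 256


def load_with_heartbeat(weights, load_one, every=HEARTBEAT_EVERY, out=sys.stdout):
    """Call load_one on each weight and print a progress line every `every` loads.

    `weights` and `load_one` are hypothetical stand-ins for the real loader's
    iterable of expert weights and its per-weight upload routine.
    """
    start = time.monotonic()
    count = 0
    for count, item in enumerate(weights, start=1):
        load_one(item)
        if count % every == 0:
            elapsed = time.monotonic() - start
            # flush=True so the line reaches the pod log immediately instead
            # of sitting in a buffer during the long upload.
            print(f"[load] {count} weights loaded ({elapsed:.1f}s)",
                  file=out, flush=True)
    return count
```

Flushing on every heartbeat matters here: a buffered line that never reaches the log leaves the same silent gap that the heartbeat is meant to close.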

2026-05-16 05:51:35 +00:00

patches

feat: add load progress heartbeats to prevent k8s health check kills

2026-05-16 05:51:35 +00:00

nvfp4_cutedsl.py

fix: L1 gate/up split — intermediate_size is per-projection, not fused

2026-05-16 04:04:40 +00:00
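
To illustrate the distinction the fix above names, here is a small sketch of splitting a fused gate/up weight when `intermediate_size` counts the rows of one projection, not of the fused tensor. It is written under assumptions and is not the code in nvfp4_cutedsl.py: the helper `split_gate_up`, the row-list representation, and the gate-first ordering are hypothetical.

```python
def split_gate_up(fused_rows, intermediate_size):
    """Split a fused gate/up weight, given as a list of rows, into (gate, up).

    Assumptions (not confirmed by the source): the fused weight stacks the
    gate rows first and the up rows second, and `intermediate_size` is the
    row count of ONE projection, so the fused weight has twice that many rows.
    """
    expected = 2 * intermediate_size
    if len(fused_rows) != expected:
        raise ValueError(
            f"expected {expected} fused rows (2 * intermediate_size), "
            f"got {len(fused_rows)}"
        )
    gate = fused_rows[:intermediate_size]
    up = fused_rows[intermediate_size:]
    return gate, up
```

Treating `intermediate_size` as the fused size would instead put the split point at half of it, so each half would mix or truncate projections. The length check makes that mismatch fail loudly.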