Files
nvfp4-megamoe-kernel/vllm
biondizzle 4d4cfa6b28 fix: tqdm over MoE layer warmup, compile every layer, no print spam
The outer loop tqdm now covers the full finalize_weights + warmup for
each MoE layer. CuTeDSL caches by (M,N,K) so every layer shape gets
compiled during warmup — no RPC timeouts during inference.

  (JIT compile)NVFP4 MoE layers:  50%|██████████░░░░░░░░░░| 31/61
2026-05-16 05:21:11 +00:00
..