nvfp4-megamoe-kernel

Files

biondizzle 014d647ba3 fix: sink bias domain correction — add attn_sink/scale to raw logits

The softmax scales by scale_log2 = scale * log2(e). Adding sink_val to
raw logits causes it to be scaled too. Fix: add sink_val/scale instead,
so after scaling: (sink_val/scale) * scale_log2 = sink_val * log2(e).
This correctly multiplies the unnormalized attention weights by exp(sink_val).
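
The arithmetic in this commit message can be checked numerically. The snippet below is a minimal sketch, not the kernel's code: it assumes the softmax computes exp2((raw_logit + bias) * scale_log2), and the function name `unnormalized_weight` and the sample values are hypothetical, chosen only for illustration.

```python
import math

LOG2_E = math.log2(math.e)


def unnormalized_weight(raw_logit, sink_val, scale, corrected=True):
    """Return exp2((raw_logit + bias) * scale_log2).

    Assumption: the softmax works in the log2 domain, as the commit
    message's scale_log2 = scale * log2(e) suggests.
    """
    scale_log2 = scale * LOG2_E
    # Corrected form adds sink_val / scale so the later scaling cancels out.
    bias = sink_val / scale if corrected else sink_val
    return 2.0 ** ((raw_logit + bias) * scale_log2)


# Hypothetical sample values.
raw_logit, sink_val, scale = 3.0, 0.5, 0.125

baseline = unnormalized_weight(raw_logit, 0.0, scale)
fixed = unnormalized_weight(raw_logit, sink_val, scale, corrected=True)
buggy = unnormalized_weight(raw_logit, sink_val, scale, corrected=False)

ratio_fixed = fixed / baseline  # equals exp(sink_val)
ratio_buggy = buggy / baseline  # equals exp(sink_val * scale)
```

With the correction, the weight changes by a factor of exp(sink_val). Without it, the factor is exp(sink_val * scale), because the sink value is scaled along with the logit.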

2026-05-26 15:03:49 +00:00

attention

fix: sink bias domain correction — add attn_sink/scale to raw logits

2026-05-26 15:03:49 +00:00

cache

KV Cache: schema, allocator, pools, manager, append_swa kernel

2026-05-22 00:08:38 +00:00

compressor

Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00

cuda

fix quantize_nvfp4 kernel: use proven single-thread-per-CTA pattern from deinterleave_quantize.cu