nvfp4-megamoe-kernel/dsv4
biondizzle f57de06eb5 Fix grouped_linear GEMM output buffer shape and extraction
- _output_buf_padded: shape is (max_tokens * n_groups, o_lora_rank), which matches the GEMM output
- Extraction: groups are stacked vertically (along rows), not horizontally (along columns)
- Each group's output block is (padded_rows, o_lora_rank), so every block is o_lora_rank columns wide
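
The layout described in the commit notes above can be illustrated with a small sketch. This is not the repository's code: it uses plain Python lists in place of tensors, the helper names `make_stacked_buffer` and `extract_group` are hypothetical, and it assumes that each group's `padded_rows` equals `max_tokens` and that only the first `n_valid` rows of a group hold real tokens.

```python
def make_stacked_buffer(n_groups, padded_rows, o_lora_rank):
    """Build a stand-in GEMM output of shape (padded_rows * n_groups, o_lora_rank).

    Each cell stores (group, row_in_group, col) so the layout is easy to inspect.
    """
    return [
        [(g, r, c) for c in range(o_lora_rank)]
        for g in range(n_groups)      # groups are stacked vertically ...
        for r in range(padded_rows)   # ... so each group owns a contiguous run of rows
    ]


def extract_group(buf, group, padded_rows, n_valid):
    """Return the first n_valid rows of `group` from a vertically stacked buffer.

    Assumption: padding rows sit at the end of each group's block.
    """
    if not 0 <= n_valid <= padded_rows:
        raise ValueError("n_valid must be between 0 and padded_rows")
    start = group * padded_rows       # row offset of this group's block
    return buf[start:start + n_valid]


buf = make_stacked_buffer(n_groups=3, padded_rows=4, o_lora_rank=2)
group1 = extract_group(buf, group=1, padded_rows=4, n_valid=3)
```

With vertical stacking, a group is selected by a row offset of `group * padded_rows` and every extracted row keeps all `o_lora_rank` columns. Slicing along columns instead would only be correct for a horizontally stacked buffer of shape (padded_rows, n_groups * o_lora_rank), which is not the shape the commit describes.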
2026-06-03 22:26:40 +00:00