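The shape-based check (x_fp4.shape[0] != num_slots) silently fails when num_tokens == num_slots in L1 (topk=1): the row counts match, so the gather is skipped even when slots are ordered differently from tokens. The check now tests whether slot_token is the identity mapping, and gathers only when slot ordering differs from token ordering.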
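A minimal sketch of the difference, in plain Python rather than on tensors. The names needs_gather_by_shape, needs_gather and gather_rows are hypothetical. The sketch assumes that slot_token[s] holds the token index that slot s reads from, and that x stands in for x_fp4 as a list of rows. With topk=1 and a permuted routing, the row counts match, so the size comparison skips the gather. The identity test catches the reordering.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def needs_gather_by_shape(num_tokens: int, num_slots: int) -> bool:
    # Old heuristic (hypothetical name): gather only when the row counts differ.
    # With topk=1, num_slots == num_tokens, so this always answers "no gather",
    # even if the slots are a permutation of the tokens.
    return num_tokens != num_slots


def needs_gather(slot_token: Sequence[int], num_tokens: int) -> bool:
    # Assumed layout: slot_token[s] is the token row that slot s reads from.
    # Skipping the gather is only safe when this is the identity mapping
    # 0, 1, ..., num_tokens - 1.
    if len(slot_token) != num_tokens:
        return True
    return any(tok != slot for slot, tok in enumerate(slot_token))


def gather_rows(x: List[T], slot_token: Sequence[int]) -> List[T]:
    # Reorder token rows into slot order (and duplicate them when topk > 1).
    if not needs_gather(slot_token, len(x)):
        return x  # slots already match token order; reuse x unchanged
    return [x[tok] for tok in slot_token]


# topk=1 with a permuted routing: 4 tokens, 4 slots.
x = ["t0", "t1", "t2", "t3"]
slot_token = [2, 0, 3, 1]
print(needs_gather_by_shape(len(x), len(slot_token)))  # False: the old check skips the gather
print(needs_gather(slot_token, len(x)))                # True
print(gather_rows(x, slot_token))                      # ['t2', 't0', 't3', 't1']
```

On tensors, the same identity test could plausibly be expressed by comparing slot_token against torch.arange(num_tokens) with torch.equal. This is an assumption about how the sketch maps to tensor code, not the committed implementation.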