nvfp4-megamoe-kernel/docs
biondizzle 6421f7c3f3 P4 RESOLVED: TMA hang was GMEM misalignment, not descriptor/driver issue
Evidence: TMA loads succeed with 128B-aligned GMEM on all descriptor configs.
The bit-21 workaround was NOT needed. The 'misaligned address' crashes were
caused by passing non-128B-aligned GMEM pointers to cp.async.bulk.tensor.
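
A minimal host-side sketch of the kind of check that would catch this before launch. The 128-byte figure is taken from the commit text above, not from the CUDA documentation, and the names `kTmaGmemAlign`, `is_aligned`, `align_up`, and `require_tma_aligned` are hypothetical helpers, not part of this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Alignment the commit reports as required for GMEM pointers
// (assumption taken from the commit message).
constexpr std::size_t kTmaGmemAlign = 128;

// Hypothetical helper: true when ptr is a multiple of `align` bytes.
// `align` must be a power of two.
inline bool is_aligned(const void* ptr, std::size_t align = kTmaGmemAlign) {
    return (reinterpret_cast<std::uintptr_t>(ptr) & (align - 1)) == 0;
}

// Hypothetical helper: round a byte offset up to the next multiple of `align`,
// e.g. when carving sub-buffers out of one large allocation.
constexpr std::size_t align_up(std::size_t offset, std::size_t align = kTmaGmemAlign) {
    return (offset + align - 1) & ~(align - 1);
}

// Hypothetical guard to call before building a descriptor or launching a kernel.
inline void require_tma_aligned(const void* gmem_ptr) {
    assert(is_aligned(gmem_ptr) && "GMEM pointer is not 128B-aligned");
}
```

Sub-buffer offsets computed with `align_up` stay 128B-aligned only if the base allocation is itself aligned, so the guard is most useful on the final pointer passed to the copy.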

Added docs/p4_tma_hang_resolution.md with root cause and fix.
Cleaned up stale P4 test files.
2026-05-30 08:42:18 +00:00