828 Commits

SHA1 Message Date
3fd302e7a0 Fix nvcc goto-bypasses-init errors in multi-head test 2026-05-28 19:33:04 +00:00
aa41cfa2e5 Multi-head FMHA kernel (Milestone 5): grid launch with MHA/MQA/batch support
- fmha_6warp_multihead.cuh: grid=(1, n_h, batch) kernel with FmhaParams
- MQA support via k_head_stride=0 / v_head_stride=0
- LSE output for multi-segment KV merge composition
- test_fmha_6warp_multihead.cu: MHA (4+8 heads), MQA, batched tests
- HD-specific wrappers for hd=16/64/128/256
- Marked E2M1 dequant bug as FIXED in consultant issue file
2026-05-28 19:32:35 +00:00
6af2feb42a TMA 5D test: element stride decomposition 2026-05-28 19:18:01 +00:00
96f2f0bb90 auto: pre-test commit 2026-05-28 19:12:23 +00:00
015435b1ab auto: pre-test commit 2026-05-28 19:09:50 +00:00
41343fdc6b auto: pre-test commit 2026-05-28 19:08:04 +00:00
a723b524f7 TMA alignment test 2026-05-28 17:00:20 +00:00
c54a83960d TMA debug: fix globalStrides to tensorRank-1 elements 2026-05-28 16:58:30 +00:00
944e567b6c TMA debug: test various CUtensorMap configs 2026-05-28 16:55:25 +00:00
55d289c65b Fix TMA: use CU_TENSOR_MAP_DATA_TYPE_BFLOAT16 not UINT16 2026-05-28 16:51:40 +00:00
0fd3e12a52 Fix TMA test: globalStrides in bytes not elements 2026-05-28 16:46:56 +00:00
ad8050bbad WIP: TMA load test infrastructure (manual compile needed) 2026-05-28 16:45:04 +00:00
d9df1e6486 auto: pre-test commit 2026-05-28 16:42:24 +00:00
a4211559cf auto: pre-test commit 2026-05-28 16:40:51 +00:00
3b8fdcc823 auto: pre-test commit 2026-05-28 16:39:45 +00:00
072fbf0b5d auto: pre-test commit 2026-05-28 16:36:53 +00:00
2a6d72912a auto: pre-test commit 2026-05-28 16:28:58 +00:00
01319d7247 auto: pre-test commit 2026-05-28 15:59:22 +00:00
43516ed4ec auto: pre-test commit 2026-05-28 15:55:59 +00:00
1ec3e1ed2c auto: pre-test commit 2026-05-28 15:55:18 +00:00
babff1f402 auto: pre-test commit 2026-05-28 15:54:05 +00:00
2b007d2008 auto: pre-test commit 2026-05-28 15:53:39 +00:00
84b997881f auto: pre-test commit 2026-05-28 15:53:04 +00:00
6e5401df3b auto: pre-test commit 2026-05-28 15:51:55 +00:00
102174fade auto: pre-test commit 2026-05-28 15:50:52 +00:00
2dcfc0089f auto: pre-test commit 2026-05-28 15:49:47 +00:00
1cdb90462f auto: pre-test commit 2026-05-28 15:48:15 +00:00
80fd612132 auto: pre-test commit 2026-05-28 15:47:58 +00:00
9583cbc67a auto: pre-test commit 2026-05-28 15:46:53 +00:00
1b86860c19 auto: pre-test commit 2026-05-28 15:46:16 +00:00
6249989cf6 Clean up HD=64 test, V layout verified correct 2026-05-28 15:21:33 +00:00
e1daad6955 Verify V SMEM values vs GMEM for HD=64 2026-05-28 15:19:31 +00:00
bafd26707b FMHA HD=64 with BLOCK_MN_B=16, 4 N-tiles per K-tile 2026-05-28 15:17:40 +00:00
6b9b06647a Clean up HD=64 debug prints, keep register-math PV check 2026-05-28 15:15:22 +00:00
5c9d471162 Add register-math PV reference for HD=64 debug 2026-05-28 15:13:47 +00:00
43e9efbc2b Fix string literal 2026-05-28 15:12:20 +00:00
906be7ce50 Add filtered cosine (exclude near-zero) 2026-05-28 15:11:14 +00:00
40c83c769a Fix: remove ×2 QK scale correction (MMA scale is 1.0, not 0.5) 2026-05-28 15:09:57 +00:00
6ea7356fdd Debug: print P values for HD=64 2026-05-28 15:07:55 +00:00
4b052f22a5 Fix: opt into >48KB shared memory for HD=64 2026-05-28 15:06:37 +00:00
7becbfc07e Fix: printf after var declarations 2026-05-28 15:03:25 +00:00
2d44f8e356 Debug: check if HD=64 kernel starts 2026-05-28 15:02:00 +00:00
46e4d07c71 Test PV SS MMA with B=(64,16) BLOCK_MN=64 2026-05-28 14:58:10 +00:00
465e089a2b Add launch error check for HD=64 2026-05-28 14:56:07 +00:00
2fd64c464d FMHA HD=64 with BLOCK_MN_B=64 for V, proper output dimensions 2026-05-28 14:54:10 +00:00
15ecc1f616 Full FMHA HD=64 with PV SS MMA (SMEM-P) 2026-05-28 14:52:29 +00:00
5b2e690936 Milestone: Full FMHA HD=16 with PV SS MMA (SMEM-P) — cosine 0.9997 2026-05-28 14:50:43 +00:00
78026839b7 Fix V canonical layout: swap g_mn/g_k indices (d=MN, lr=K) 2026-05-28 14:49:17 +00:00
9a3b43c42b Fix reference to also use uniform P 2026-05-28 14:47:10 +00:00
75bdcbf728 Debug: override P with uniform 1/128 2026-05-28 14:46:21 +00:00