Files
nvfp4-megamoe-kernel/tests/unit
biondizzle df05289d6f CUDA graph: Fix remaining sync violations from B200 detector run 2
1. grouped_linear.py: Remove conditional host read of GPU tensor
   - 'if group_offsets[0] != 0' reads GPU value on host → sync
   - Fix: unconditionally update offsets every call (GPU-only multiply)

2. test_cuda_graph_readiness.py: Use pinned CPU buffers for token transfer
   - dec_tid_buf[0] = python_int → CPU→GPU sync
   - Fix: write to pinned CPU buffer, then copy_ (async, graph-capturable)

3. Add dsv4/decode/cuda_graph_decoder.py (skeleton)
2026-06-03 17:20:34 +00:00
..
2026-05-28 16:28:58 +00:00
2026-05-28 16:28:58 +00:00
2026-05-28 16:28:58 +00:00
2026-05-28 16:28:58 +00:00
2026-05-30 03:46:38 +00:00
2026-05-28 16:28:58 +00:00
2026-05-28 15:59:22 +00:00
2026-05-28 15:59:22 +00:00
2026-05-28 15:59:22 +00:00
2026-05-28 15:59:22 +00:00
2026-05-28 15:46:53 +00:00
2026-05-28 15:59:22 +00:00
2026-05-28 15:55:59 +00:00
2026-05-28 15:59:22 +00:00
2026-05-28 15:59:22 +00:00
2026-05-28 14:38:03 +00:00
2026-06-02 09:43:45 +00:00
2026-05-28 14:40:55 +00:00
2026-05-28 14:33:31 +00:00
2026-05-28 16:36:53 +00:00
2026-05-28 17:00:20 +00:00
2026-05-28 16:39:45 +00:00
2026-05-28 16:42:24 +00:00
2026-05-28 15:51:55 +00:00
2026-05-28 15:49:47 +00:00
2026-05-28 15:48:15 +00:00
2026-05-28 15:54:05 +00:00
2026-05-28 11:39:15 +00:00