Commit Graph

8 Commits

Author SHA1 Message Date
6692166d0f Update CURRENT_BUG.md: Bug 25 (swiglu_limit), shared expert path verification, variable padded offsets 2026-05-17 17:56:04 +00:00
87a223f1ac Update CURRENT_BUG.md: current status, outstanding garbage output issue, hypotheses 2026-05-17 16:52:40 +00:00
3d0b1408b4 Update CURRENT_BUG.md: Bug 21 (shared buffers), clean up status 2026-05-17 15:52:06 +00:00
e2f33596a2 Update CURRENT_BUG.md: status through Bug 20, fixed-layout padding architecture 2026-05-17 15:46:13 +00:00
0d3c928ff2 Update CURRENT_BUG.md: full status through Bug 14, vLLM integration status, architecture docs 2026-05-17 13:32:41 +00:00
eb7d4f099b Update CURRENT_BUG.md with Bug 8 (global→local expert ID) and Bug 8b (.cpu() sync) 2026-05-17 09:01:24 +00:00
ca3cba5bbd Fix global→local expert ID remapping for EP and remove .cpu() sync
Root cause of CUDA_ERROR_ASSERT index out of bounds:
- topk_ids contains GLOBAL expert IDs (0-255) but runner treated them
  as local IDs (0-31 with EP=8). Tokens for non-local experts got
  wrong expert assignments, causing out-of-bounds scatter indices
  in _assemble_scales_cudagraph_safe.

Fixes:
1. Add experts_start_idx param to CuTeDSLMoERunner
2. In run(), remap global→local IDs and zero weights for non-local experts
3. Move _token_indices from CPU to GPU (remove sort_idx.cpu() sync)
4. Add _fill_token_indices() and _needs_token_refill to handle CuTeDSL
   JIT GPU memory corruption (refill after first GEMM call)
2026-05-17 08:58:43 +00:00
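
Fix 2 in commit ca3cba5bbd (remap global→local IDs and zero the weights of non-local experts) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the code from the commit: the function name `remap_global_to_local` and the parameter `num_local_experts` are hypothetical, nested lists stand in for GPU tensors, and mapping non-local slots to local ID 0 is an assumed way to keep every index in bounds. Only `topk_ids` and `experts_start_idx` are names taken from the commit message.

```python
def remap_global_to_local(topk_ids, topk_weights, experts_start_idx, num_local_experts):
    """Convert global expert IDs to rank-local IDs.

    Slots routed to experts owned by another rank get weight 0.0 and
    a placeholder local ID of 0 (an assumption made for this sketch),
    so every returned ID is a valid index into the local expert table.
    """
    local_ids, local_weights = [], []
    for ids_row, weights_row in zip(topk_ids, topk_weights):
        out_ids, out_weights = [], []
        for global_id, weight in zip(ids_row, weights_row):
            local_id = global_id - experts_start_idx
            if 0 <= local_id < num_local_experts:
                # Expert lives on this rank: keep its weight.
                out_ids.append(local_id)
                out_weights.append(weight)
            else:
                # Expert lives on another rank: drop its contribution.
                out_ids.append(0)
                out_weights.append(0.0)
        local_ids.append(out_ids)
        local_weights.append(out_weights)
    return local_ids, local_weights


# 256 global experts with EP=8 gives 32 experts per rank.
# In this example the rank owns global IDs 32..63.
ids, weights = remap_global_to_local(
    topk_ids=[[33, 5], [63, 64]],
    topk_weights=[[0.7, 0.3], [0.6, 0.4]],
    experts_start_idx=32,
    num_local_experts=32,
)
print(ids)      # [[1, 0], [31, 0]]
print(weights)  # [[0.7, 0.0], [0.6, 0.0]]
```

Without the subtraction and range check, a global ID such as 64 would be used directly as an index into a 32-entry local table, which matches the out-of-bounds failure mode the commit message describes.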
ddffb7d8df docs: current bug analysis — scale_a layout vs expert_offsets mismatch 2026-05-17 07:53:58 +00:00