-
5d975d00d9
feat: tqdm progress bar for expert weight loading
biondizzle
2026-05-16 06:09:22 +00:00
-
2e4ff6b8d4
fix: increase vLLM RPC timeout to 10 min for first-request JIT
biondizzle
2026-05-16 06:02:11 +00:00
-
a569612df5
feat: add load progress heartbeats to prevent k8s health check kills
biondizzle
2026-05-16 05:51:35 +00:00
-
e5370140cb
docs: update README with full NVFP4 coverage, dequant anti-pattern, v2 status
biondizzle
2026-05-16 05:43:33 +00:00
-
3445bd24c1
feat: keep attention weights native NVFP4 — stop dequantizing to BF16
biondizzle
2026-05-16 05:36:34 +00:00
-
4d4cfa6b28
fix: tqdm over MoE layer warmup, compile every layer, no print spam
biondizzle
2026-05-16 05:21:11 +00:00
-
3838561c19
fix: only suppress compile message, still warmup all layers
biondizzle
2026-05-16 05:18:10 +00:00
-
f19932d8db
fix: compile CuTeDSL kernel once per process, not per MoE layer
biondizzle
2026-05-16 05:16:53 +00:00
-
936982c5aa
fix: add layer-level tqdm for expert finalization, remove inner expert tqdm
biondizzle
2026-05-16 05:01:20 +00:00
-
cf0731cf4b
fix: warmup with 128 tokens (fills MMA tile), better error handling
biondizzle
2026-05-16 04:56:45 +00:00
-
a70d2d3984
fix: clearer warmup message — 'Compiling CuTeDSL NVFP4 MegaMoE kernel'
biondizzle
2026-05-16 04:40:31 +00:00
-
f191af7e29
feat: warm up CuTeDSL kernel during model loading
biondizzle
2026-05-16 04:39:05 +00:00
-
4d67b570b9
fix: descriptive tqdm labels — uint8→NVFP4 and NVFP4→FP8/BF16
biondizzle
2026-05-16 04:28:25 +00:00
-
8efdd165da
fix: use tqdm for progress bars — single line, live updating
biondizzle
2026-05-16 04:26:43 +00:00
-
830f042443
fix: PYTHONUNBUFFERED=1 so progress bars stream in real-time
biondizzle
2026-05-16 04:18:07 +00:00
-
00b766af60
feat: add progress bars for expert quantization and post-load conversion
biondizzle
2026-05-16 04:14:07 +00:00
-
b465579a02
cleanup: nuke all debug prints and env var gates from vLLM patch
biondizzle
2026-05-16 04:10:42 +00:00
-
174ad70dca
fix: same gate/up split fix in moe_pipeline.py
biondizzle
2026-05-16 04:04:53 +00:00
-
6d17988b51
fix: L1 gate/up split — intermediate_size is per-projection, not fused
biondizzle
2026-05-16 04:04:40 +00:00
-
37aa0cbeab
debug: add try/except with shape logging to _run_mega_moe
biondizzle
2026-05-16 04:02:01 +00:00
-
b04bff7e8b
feat: clean Dockerfile, docker-compose, import fixes for CuTeDSL build
biondizzle
2026-05-16 03:50:07 +00:00
-
a0ff8a3278
fix: transpose checkpoint block scales (N,K_sf)→(K_sf,N) for bridge
biondizzle
2026-05-16 03:43:30 +00:00
-
389453fbf4
feat: direct NVFP4 path — no BF16 round-trip on weights
biondizzle
2026-05-16 03:41:23 +00:00
-
8fd9579127
feat: vLLM integration — replace C++ kernel with CuTeDSL
biondizzle
2026-05-16 03:36:12 +00:00
-
3ec9c3074b
docs: rewrite README, nuke DEBUG_LOG, add vLLM integration stub
biondizzle
2026-05-16 03:33:16 +00:00
-
b685112c92
fix: lower cosine threshold to 0.98 for double-quantization loss
biondizzle
2026-05-16 03:24:13 +00:00
-
6139cd6ff5
fix: rewrite layertest cleanly, test full MoE pipeline
biondizzle
2026-05-16 03:23:33 +00:00
-
09ff5c5b98
feat: full NVFP4 MoE pipeline (L1→SiLU→L2→scatter)
biondizzle
2026-05-16 03:22:43 +00:00
-
0359215ab4
fix: compare kernel vs BF16 in slot-major layout
biondizzle
2026-05-16 03:18:41 +00:00
-
ed18638a3c
fix: slot-major token layout for grouped GEMM
biondizzle
2026-05-16 03:17:19 +00:00
-
5385de3142
fix: layertest tests L1 GEMM only with correct output size
biondizzle
2026-05-16 03:15:29 +00:00
-
0cdcc4144a
refactor: add cutedsl/bridge.py, rewrite layertest to use it
biondizzle
2026-05-16 03:13:54 +00:00
-
2ef71dc21a
fix: B tensor K-major strides, scale_b axis swap
biondizzle
2026-05-16 03:04:31 +00:00
-
6294b84213
fix: B tensor must be K-major (transpose last 2 dims)
biondizzle
2026-05-16 03:03:00 +00:00
-
7c882fe2e0
fix: correct weight quantization for CuTeDSL kernel
biondizzle
2026-05-16 02:58:55 +00:00
-
ca28f1335d
refactor: copy CuTeDSL kernel into repo with local imports
biondizzle
2026-05-16 02:57:54 +00:00
-
a3aa2d201e
fix: clarify import path setup for CuTeDSL
biondizzle
2026-05-16 02:55:25 +00:00
-
f951d284e7
test: add CuTeDSL NVFP4 GEMM test using reference ScaledGroupedGemmKernel
biondizzle
2026-05-16 02:55:04 +00:00
-
a2ea836c74
docs: add CuTeDSL rewrite plan + reference files
biondizzle
2026-05-16 02:41:51 +00:00
-
c4a262bd54
test: streamline layertest — kernel vs BF16 ref only, exit on fail
biondizzle
2026-05-16 02:29:41 +00:00
-
de9b50cbe7
fix: use setup.py install for CUTLASS extension build
biondizzle
2026-05-16 02:21:17 +00:00
-
882bff8fb7
fix: also build CUTLASS C++ extension in run_test.sh
biondizzle
2026-05-16 02:19:40 +00:00
-
55d9a24bf6
fix: handle model. prefix normalization in checkpoint keys
biondizzle
2026-05-16 02:18:52 +00:00
-
bdf9f31ae2
fix: checkpoint keys don't have 'model.' prefix
biondizzle
2026-05-16 02:17:13 +00:00
-
ea5ee7c1f7
fix: remove prefix_filter from layer tensor loading
biondizzle
2026-05-16 02:15:55 +00:00
-
303b6a8993
cleanup: move useful tests to tests/, nuke stale debug tests
biondizzle
2026-05-16 02:14:37 +00:00
-
2114bd11be
test: add standalone layer 0 comparison test (no vLLM, no Docker)
biondizzle
2026-05-16 02:13:18 +00:00
-
294e9f98f2
cleanup: rename _ue8m0_to_float32 → _block_scale_to_float32, remove dead code
biondizzle
2026-05-16 01:55:56 +00:00
-
4a624879ca
docs: update DEBUG_LOG — input_scale red herring, current state, next steps
biondizzle
2026-05-16 01:15:49 +00:00
-
79b9becf9c
revert: don't use checkpoint input_scale for activation normalization
biondizzle
2026-05-16 00:12:41 +00:00
-
a7eae10ef4
fix: use checkpoint input_scale for activation quantization
biondizzle
2026-05-15 23:57:08 +00:00
-
af50e98fe9
test: B layout test with N=128 K=256
biondizzle
2026-05-15 23:52:22 +00:00
-
efd7a2c56d
test: B matrix weight layout verification via one-hot A
biondizzle
2026-05-15 23:52:00 +00:00
-
bb5a1ba4c8
cleanup: remove unused slot_token from nvfp4_moe_l2
biondizzle
2026-05-15 23:50:39 +00:00
-
887360281e
docs: major update — SF remap verified correct, BF16 ref is the red herring
biondizzle
2026-05-15 23:38:34 +00:00
-
eb26d291cb
test: uniform FP4 + uniform SF sanity check
biondizzle
2026-05-15 23:36:08 +00:00
-
1f09b51168
test: check SF signed vs unsigned interpretation
biondizzle
2026-05-15 23:35:06 +00:00
-
4f857d5f99
docs: major DEBUG_LOG update — forward mapping, verifier, full debug timeline
biondizzle
2026-05-15 23:02:30 +00:00
-
aa209ddd21
debug: add SF remap roundtrip verifier
biondizzle
2026-05-15 22:59:44 +00:00
-
6626b75a2f
fix: use filter_zeros for SF allocation + no-branch forward mapping
biondizzle
2026-05-15 22:58:51 +00:00
-
6fc8fa61e0
fix: use flat logical coordinate layout_sf(make_coord(mn, k_elem, 0))
biondizzle
2026-05-15 22:53:57 +00:00
-
a48717ccf5
fix: remove duplicate dst_idx declaration
biondizzle
2026-05-15 22:31:05 +00:00
-
5ff1b9e401
fix: use hierarchical coordinates for layout_sf forward mapping
biondizzle
2026-05-15 22:11:14 +00:00
-
3b4a7b591f
test: verify forward mapping with prepack vs live SFB
biondizzle
2026-05-15 22:09:56 +00:00
-
a1fd4d6233
revert: back to layout_sf(make_coord(...)) — crd2idx was unnecessary
biondizzle
2026-05-15 21:55:00 +00:00
-
ea678ece64
fix: remove duplicate template declaration
biondizzle
2026-05-15 21:54:10 +00:00
-
59dad8e2fb
fix: use crd2idx instead of layout operator() for SF forward mapping
biondizzle
2026-05-15 21:52:02 +00:00
-
a09d8e477e
fix: remove static_assert in constexpr else (build fix)
biondizzle
2026-05-15 21:27:27 +00:00
-
7285331395
fix: replace col_major_src with explicit source strides
biondizzle
2026-05-15 21:23:21 +00:00
-
f6fd549800
fix: restore col_major_src handling for SFB source layout
biondizzle
2026-05-15 21:19:58 +00:00
-
63e67e1025
fix: rewrite SF remap as forward mapping (source→dst)
biondizzle
2026-05-15 20:51:30 +00:00
-
30b6c89424
fix: correct SF remap coordinate extraction
biondizzle
2026-05-15 20:44:46 +00:00
-
ff5a0843dc
fix: divide K element index by SFVecSize to get k_sf
biondizzle
2026-05-15 20:17:24 +00:00
-
a09b9b53a3
cleanup: remove printf and diag function from CUDA kernel (build fix)
biondizzle
2026-05-15 20:11:40 +00:00
-
e7c3341317
docs: update DEBUG_LOG with M/K swap root cause
biondizzle
2026-05-15 20:03:20 +00:00
-
deb6b3231a
debug: swap M/K in SF remap + add printf diagnostics
biondizzle
2026-05-15 20:01:47 +00:00
-
22f0457ccf
test: isolate SFA vs SFB remap bug
biondizzle
2026-05-15 19:59:39 +00:00
-
9eaf6d07e8
test: quick random test
biondizzle
2026-05-15 19:58:57 +00:00
-
fa7b394571
docs: update DEBUG_LOG with root cause (size→cosize) and full debug timeline
biondizzle
2026-05-15 18:56:09 +00:00
-
c3841983a0
fix: SF remap uses cute::cosize() instead of cute::size()
biondizzle
2026-05-15 18:52:23 +00:00
-
67dcfa83f5
test: random data at small dims + alpha sweep
biondizzle
2026-05-15 18:51:52 +00:00
-
60f7f60818
test: ultra-minimal GEMM with all-ones
biondizzle
2026-05-15 18:51:31 +00:00
-
363dd893f0
test: dimension sweep to isolate GEMM bug
biondizzle
2026-05-15 18:51:09 +00:00
-
fee5a97ebb
fix: cosine_similarity dim for M>0
biondizzle
2026-05-15 18:50:45 +00:00
-
f9330a1777
test: standalone M=1 GEMM test with deterministic data
biondizzle
2026-05-15 18:47:26 +00:00
-
1b63a46168
docs: update DEBUG_LOG with cosine≈0 finding + new hypotheses
biondizzle
2026-05-15 18:35:00 +00:00
-
773967452f
debug: fix gs scalar conversion + add traceback
biondizzle
2026-05-15 18:27:44 +00:00
-
df916b87eb
debug: fix gs.item() for multi-element tensor
biondizzle
2026-05-15 18:09:41 +00:00
-
755f9ad567
debug: fix per_expert_alpha ref + clean up BF16 reference scaling
biondizzle
2026-05-15 17:55:11 +00:00
-
de8acc7965
debug: dump raw GEMM inputs + first 8 output values
biondizzle
2026-05-15 17:02:40 +00:00
-
9159cb6bb3
docs: add debug log — current state, hypotheses, fixes
biondizzle
2026-05-15 15:48:57 +00:00
-
2fd55a94c6
fix: weight reshape bug + igs double-count in BF16 reference
biondizzle
2026-05-15 15:46:16 +00:00
-
c421a668f3
debug: BF16 reference GEMM + cosine comparison for L1
biondizzle
2026-05-15 14:16:24 +00:00
-
995589ac8a
debug: add FP4 quantization round-trip diagnostic
biondizzle
2026-05-15 13:41:09 +00:00
-
d0ed3d84a8
debug: add L2, SiLU, and scatter pipeline prints
biondizzle
2026-05-15 13:21:25 +00:00
-
da5572f497
clean: remove diagnostic scripts from repo
biondizzle
2026-05-15 12:50:14 +00:00
-
fd59222fc0
fix: stop folding global scale into float8 block scales
biondizzle
2026-05-15 12:42:53 +00:00
-
56e62e916d
revert: idx2crd remap approach — source-first needs hierarchical coords
biondizzle
2026-05-15 11:44:38 +00:00
-
d5949a23b4
fix: use cute::crd2idx for SF remap — layout_sf() not directly callable
biondizzle
2026-05-15 11:39:57 +00:00
-
9908fd64d9
feat: CUTLASS NVFP4 mega_moe kernel — slot-based L1/L2, source-first SF remap
biondizzle
2026-05-15 11:38:18 +00:00