-
0c77a88757
sync: latest Dockerfile + nvfp4_linear.py patch from B200
modelopt-nvfp4
biondizzle
2026-05-14 16:47:27 +00:00
-
f2656dcf6d
sync B200 deployment files: Dockerfile, docker-compose, patches
biondizzle
2026-05-14 14:12:52 +00:00
-
f08bcd456b
tweax n shit
mega-moe-nvfp4
biondizzle
2026-05-12 23:16:33 +00:00
-
2bdda36bb7
fucken aye
biondizzle
2026-05-12 22:33:55 +00:00
-
fa825c16b9
fucken ay
biondizzle
2026-05-12 21:58:08 +00:00
-
dbf1d11f9f
ayee
biondizzle
2026-05-12 21:52:48 +00:00
-
ef3edb3481
ba fongol again4
biondizzle
2026-05-12 21:49:43 +00:00
-
b74dc7121a
ba fongol again3
biondizzle
2026-05-12 21:48:00 +00:00
-
7bbbdbcc79
ba fongol again
biondizzle
2026-05-12 21:47:37 +00:00
-
2e674f87c1
ba fongol again
biondizzle
2026-05-12 21:47:14 +00:00
-
5d127d8294
ba fongol again
biondizzle
2026-05-12 21:47:01 +00:00
-
52cf3f2e25
ba fongol again
biondizzle
2026-05-12 21:46:42 +00:00
-
02decb486e
ba fongol
biondizzle
2026-05-12 21:34:08 +00:00
-
48f1f9dc5e
clanker nonsense again
biondizzle
2026-05-12 21:30:36 +00:00
-
5cabc1f7d9
clanker nonsense again
biondizzle
2026-05-12 21:29:59 +00:00
-
25a2d4e6ad
clanker nonsense
biondizzle
2026-05-12 21:28:50 +00:00
-
d88ea9842b
fix: add missing staging_kernel.py to Dockerfile — BF16→E2M1+UE4M3 quantization was never in container
biondizzle
2026-05-12 21:21:24 +00:00
-
91d7d9bad7
fucken a
biondizzle
2026-05-12 21:18:48 +00:00
-
d68e113af1
remove spammy shit
biondizzle
2026-05-12 20:57:04 +00:00
-
f0652693a6
dangit again
biondizzle
2026-05-12 19:13:01 +00:00
-
054792c84e
dangit
biondizzle
2026-05-12 18:42:39 +00:00
-
de055b1e77
syupid clankers
biondizzle
2026-05-12 18:26:37 +00:00
-
307574bc91
test: signal alarm timeout for kernel hang
biondizzle
2026-05-12 15:14:39 +00:00
-
fcd6de0a60
test: simplify SF fill to avoid shape mismatch
biondizzle
2026-05-12 15:13:16 +00:00
-
d4c557fddc
test: fix float8 randn + SF int32 packing
biondizzle
2026-05-12 15:12:35 +00:00
-
28afc2406b
test: add random FP4 data and kernel timeout
biondizzle
2026-05-12 15:11:41 +00:00
-
787d427847
test: fix NVFP4 mega_moe test dimensions for SMEM alignment
biondizzle
2026-05-12 15:07:35 +00:00
-
8737fd57c0
remove crap
biondizzle
2026-05-12 14:53:42 +00:00
-
52c3aefe73
bump cache busters to 33 for debug build
biondizzle
2026-05-12 13:10:37 +00:00
-
ca1d306890
fix: use torch.int8 for packed FP4 tensors (kPackedFP4=kInt8, not uint8)
biondizzle
2026-05-12 12:23:43 +00:00
-
b8f95ffad3
docker: add OMP_NUM_THREADS=64, remove --tool initcheck, mount cubin cache
biondizzle
2026-05-12 11:15:06 +00:00
-
5840291ea3
fix staging kernel packed_k_mask double-count
biondizzle
2026-05-12 08:08:24 +00:00
-
5ea5b579c3
Trim banner, no code changes
biondizzle
2026-05-12 07:24:36 +00:00
-
74af9984f6
Bug fixes: UE4M3 scale conversion, staging kernel SF/E2M1 packing, wo_a UE4M3, README overhaul
biondizzle
2026-05-12 05:52:30 +00:00
-
a36bf47f11
fix: use tl.split instead of indexing for E2M1 pair packing
biondizzle
2026-05-11 22:39:38 +00:00
-
27dbf2850f
fix: replace nested tl.where with sum-of-comparisons for E2M1 quantization
biondizzle
2026-05-11 22:23:05 +00:00
-
3d1f3de190
fix: syntax error — move triton imports before docstring, remove orphan @triton.jit
biondizzle
2026-05-11 22:08:50 +00:00
-
79d866995f
bump cache buster 32 for packed FP4 mxf4nvf4 fix
biondizzle
2026-05-11 21:59:56 +00:00
-
c85b84b0fe
fix: staging kernel outputs unpacked E2M1 (1 byte/element, not packed 2/byte)
biondizzle
2026-05-11 21:29:33 +00:00
-
01cfd02759
fix: same reshape fix in main patch file
biondizzle
2026-05-11 21:05:54 +00:00
-
076d325c97
fix: use reshape instead of risky [0::2] slicing for E2M1 packing
biondizzle
2026-05-11 21:04:53 +00:00
-
8dc917c498
fix: topk_weights_out store missing topk_offsets multiplier
biondizzle
2026-05-11 21:02:19 +00:00
-
17ba5a9d7b
bump cache buster 30 for FP4 staging + DeepGEMM FP4 activations
biondizzle
2026-05-11 20:30:14 +00:00
-
7a4403fa98
feat: FP4 staging kernel - BF16 → E2M1 packed + UE4M3 block16 scales
biondizzle
2026-05-11 20:29:36 +00:00
-
0fd2d4f078
diag: add weight_scale uint8 histogram to verify E8M0 vs E4M3 format
biondizzle
2026-05-11 19:55:41 +00:00
-
50a945bde4
bump cache buster 29
biondizzle
2026-05-11 19:51:48 +00:00
-
48b905406a
diag: add CUDA sync after mega_moe finalize + forward to catch errors
biondizzle
2026-05-11 19:51:44 +00:00
-
35f6b66678
fix: UE8M0 reinterpret in DeepGEMM fold_global_scale + bump cache
biondizzle
2026-05-11 19:40:08 +00:00
-
f32d6b5b48
bump cache buster to 27
biondizzle
2026-05-11 19:26:21 +00:00
-
cd24182e36
diag: add NaN/Inf + FP8-dtype checks after NVFP4 dequant
biondizzle
2026-05-11 19:26:12 +00:00
-
8ae2214bad
fix: reorder Dockerfile ARG before COPY for proper cache busting
biondizzle
2026-05-11 18:48:07 +00:00
-
c4891e9ee2
fix: manual FP32→UE4M3 quant in Triton staging kernel
biondizzle
2026-05-11 16:38:49 +00:00
-
436109081c
bump cache buster to 24
biondizzle
2026-05-11 16:12:56 +00:00
-
5faf9916eb
fix: UE4M3 activation scales + group_size=16 for NVFP4 mega_moe
biondizzle
2026-05-11 16:12:36 +00:00
-
220649c188
docs: CORRECTED — mxf4nvf4 IS supported on sm_100a (B200)
biondizzle
2026-05-11 14:24:13 +00:00
-
cfead0012d
docs: comprehensive README update through build 22
biondizzle
2026-05-11 13:53:41 +00:00
-
8cb23bdb78
fix: import NVFP4 SymmBuffer from deep_gemm.mega
biondizzle
2026-05-11 08:05:50 +00:00
-
ff579c9767
fix: use NVFP4 SymmBuffer (2x SF size for group_size=16)
biondizzle
2026-05-11 07:49:11 +00:00
-
1da40c53da
fix: add patch cache buster to Dockerfile
biondizzle
2026-05-11 07:19:10 +00:00
-
b532742530
debug: add shape/dtype logging to finalize_weights
biondizzle
2026-05-11 07:13:44 +00:00
-
b1cf4232ee
feat: wire DeepGEMM NVFP4 mega_moe kernel into vLLM patch
biondizzle
2026-05-11 06:22:11 +00:00
-
a2e9b5f17f
fix: add --enable-expert-parallel to compose command
biondizzle
2026-05-11 06:15:11 +00:00
-
c8564caf9d
fix: patch vLLM deepseek_v4.py directly in image
biondizzle
2026-05-11 06:09:40 +00:00
-
7c8c6cd67f
fix: add PYTHONPATH for deep_gemm import
biondizzle
2026-05-11 06:06:52 +00:00
-
cffb373759
fix: symlink NVRTC lib into cuda/lib64 for linker
biondizzle
2026-05-11 06:04:24 +00:00
-
983ba02c5b
fix: add CUDA/NVRTC lib paths to Dockerfile
biondizzle
2026-05-11 06:02:13 +00:00
-
f0471ed1c2
fix: correct CR URL to atl.vultrcr.com
biondizzle
2026-05-11 05:59:06 +00:00
-
c234190a80
feat: add Dockerfile + build/push script for NVFP4 container
biondizzle
2026-05-11 05:57:49 +00:00
-
e963325b61
WIP: MegaMoE NVFP4 kernel + diagnostics
biondizzle
2026-05-11 05:19:49 +00:00
-
7e2f219259
fix: banner uses _os instead of os (not yet imported)
biondizzle
2026-05-11 04:57:24 +00:00
-
cf54b4755a
fix CRITICAL #7: UE8M0 block scale misinterpreted as E4M3
biondizzle
2026-05-11 04:37:33 +00:00
-
7febeaeb71
README: document bugs #5 (input_scale) and #6 (fused_skip_regex), add version banner section, update status
biondizzle
2026-05-11 04:28:38 +00:00
-
26aaaba4a2
Add version banner to patch — prints commit, arch, bugs fixed at startup
biondizzle
2026-05-11 04:28:10 +00:00
-
67f9086a26
Fix critical dequantization bug: remove input_scale from weight dequant
biondizzle
2026-05-11 02:23:18 +00:00
-
02b8ea536f
Update MEMORY.md and memory files with vLLM NVFP4 serving progress
biondizzle
2026-05-11 02:02:14 +00:00
-
653e2d7a50
vLLM NVFP4 serving: full end-to-end pipeline working
biondizzle
2026-05-11 02:01:46 +00:00
-
db16be8e5d
S11: Fixed substr mapping, stacking, suffix, and o_a_proj - loads weights but attention forward uses FP8 einsum incompatible with NVFP4
biondizzle
2026-05-10 17:45:53 +00:00
-
6fd03a0aa0
vLLM serving: patched deepseek_v4.py, disabled mega_moe, updated docs
biondizzle
2026-05-10 16:14:17 +00:00
-
d88793dee6
Add vllm weight mapper patch and docker-compose
biondizzle
2026-05-10 09:33:48 +00:00
-
30608e3834
Config patches: document modelopt↔vllm gaps with NVIDIA reference
biondizzle
2026-05-10 08:59:28 +00:00
-
0d74b97fb2
Config patches doc + compress_ratios runtime patch in serve script
biondizzle
2026-05-10 08:23:11 +00:00
-
f65d4ab99f
Run 11 SUCCESS: 881GB NVFP4 exported, add vLLM serve script
biondizzle
2026-05-10 07:54:34 +00:00
-
eb80bd6f80
README + memory: Run 10 result (export crash in get_weight_scaling_factor), Run 11 running
biondizzle
2026-05-09 23:00:17 +00:00
-
07cd50e823
8 patches covering full export chain — no more whack-a-mole
biondizzle
2026-05-09 22:50:58 +00:00
-
efc111a11f
Add Patch 4+5: get_weight_scaling_factor and get_weight_scaling_factor_2 CPU safety
biondizzle
2026-05-09 22:43:48 +00:00
-
ce9056d259
README overhaul: reflect current architecture (hf_main, run history through Run 10)
biondizzle
2026-05-09 16:09:09 +00:00
-
5a72da7193
Fix: apply hf_ptq __main__ post-parse conversions (dataset split, calib_size int list)
biondizzle
2026-05-09 15:58:36 +00:00
-
8612914169
Update run history: Runs 7-8, Run 9 running on a300302
biondizzle
2026-05-09 15:00:23 +00:00
-
a300302486
Fix: use hf_ptq.py arg names (--pyt_ckpt_path, --qformat, --inference_tensor_parallel)
biondizzle
2026-05-09 14:57:28 +00:00
-
1a36a655ea
Fix: use full argparse flag names (--calib_size, --kv_cache_qformat)
biondizzle
2026-05-09 14:54:51 +00:00
-
b2849a8944
Fundamental rewrite: call hf_main() instead of rewriting the pipeline
biondizzle
2026-05-09 14:52:02 +00:00
-
a70593d886
Update run history: Run 6 (dataloader crash), Run 7 running on 25b4d8d
biondizzle
2026-05-09 13:40:00 +00:00
-
25b4d8da06
Fix: add missing args for make_calib_dataloader (dataset, calib_with_images, auto_quantize, specdec)
biondizzle
2026-05-09 13:37:24 +00:00
-
d1e15178b2
Update run history: Runs 4-5 (import bugs), Run 6 running on 6c1bff6
biondizzle
2026-05-09 09:29:20 +00:00
-
6c1bff6997
Clean rewrite: verified all imports against runtime, removed dead code
biondizzle
2026-05-09 09:26:23 +00:00
-
86dd8df302
Fix: KV_QUANT_CFG_CHOICES is in hf_ptq, not mtq
biondizzle
2026-05-09 09:17:12 +00:00
-
99f861f48a
Update README and memory: Run 3 OOM crash, Run 4 running on f9bbef8
biondizzle
2026-05-09 08:10:04 +00:00
-
f9bbef8e91
Fix: patch load_calib_amax instead of amax property setter (can't patch readonly descriptor)
biondizzle
2026-05-09 08:04:03 +00:00
-
94179ed9d0
Fix typo: store_only → store_true
biondizzle
2026-05-09 08:02:09 +00:00
-
03c10ab3b6
Fix model loading: use modelopt get_model() instead of raw AutoModelForCausalLM
biondizzle
2026-05-09 08:00:50 +00:00