| 116933dcf6 | Fix: skip .cuda() when low_memory_mode; switch default to nvfp4 | 2026-05-07 03:06:33 +00:00 |
| b8bdd00d19 | Lower GPU max_memory to 100GiB, add CPU-only fallback for low_memory_mode | 2026-05-07 02:49:24 +00:00 |
| 717151b98c | Add CPU offloading and max_memory caps for FP8 model loading | 2026-05-07 02:40:48 +00:00 |
| aff12c6951 | Fix forward_loop: pass as callable, not via create_forward_loop | 2026-05-07 02:08:09 +00:00 |
| 492e44c0f6 | Fix dataloader API: max_sample_length not seq_len, proper create_forward_loop | 2026-05-07 02:04:54 +00:00 |
| b32bb2e84d | NVIDIA Model Optimizer branch: nvfp4_experts_only PTQ for DeepSeek V4 Pro | 2026-05-07 00:11:31 +00:00 |
| c40607053b | Fix remaining gate_proj/up_proj -> w1/w3 references in paired_names | 2026-05-07 00:05:55 +00:00 |
| 771e42cef3 | Fix expert pair dict keys: w1/w3 not gate_proj/up_proj | 2026-05-07 00:05:25 +00:00 |
| 5f35a5d2b3 | Gracefully handle missing scale tensors (BF16 weights with stale index entries) | 2026-05-07 00:04:29 +00:00 |
| 4470653e15 | Fix V4 tensor naming: .scale companions, w1/w3 expert pairs, ffn.gate, hc_* preserve | 2026-05-07 00:03:20 +00:00 |
| 2b7f063e39 | 7 commit | 2026-05-06 23:51:54 +00:00 |
| be16bd023e | sixth commit | 2026-05-06 23:50:51 +00:00 |
| 97e7638abc | sixth commit | 2026-05-06 23:49:34 +00:00 |
| 75503a1190 | fifth commit | 2026-05-06 23:49:02 +00:00 |
| 2eeeefcf8f | fourth commit | 2026-05-06 23:48:38 +00:00 |
| 31a4302ab6 | third commit | 2026-05-06 23:48:25 +00:00 |
| 18ba8e057f | second commit | 2026-05-06 23:47:38 +00:00 |
| 4708cdebb2 | init commit | 2026-05-06 23:47:07 +00:00 |