Michael Goin
|
0cdbf5e61c
|
[Kernel/Quant] Remove the original marlin format and qqq (#23204)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-20 15:13:36 -04:00 |
|
Michael Goin
|
8342e3abd1
|
[CI] Prune down lm-eval small tests (#17012)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-08 19:00:26 +00:00 |
|
Reid
|
3642c59aa8
|
[CI/Build] remove -t for run-lm-eval-gsm-hf-baseline.sh (#16271)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-26 18:25:05 +00:00 |
|
Michael Goin
|
71eda0bb76
|
Update Qwen1.5-MoE-W4A16-compressed-tensors.yaml (#16946)
|
2025-04-21 18:35:32 -06:00 |
|
Michael Goin
|
c70cf0fe06
|
[Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models (#16038)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-10 15:08:47 +08:00 |
|
Robert Shaw
|
d4d93db2c5
|
[V1] V1 Enablement Oracle (#13726)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2025-03-14 22:02:20 -07:00 |
|
Rahul Tuli
|
3b2005e1db
|
Add: Support for Sparse24Bitmask Compressed Models
|
2025-02-05 13:30:43 -08:00 |
|
Joe Runde
|
ef7faad1b8
|
🐛 Fixup more test failures from memory profiling (#9563)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-10-21 17:10:56 -07:00 |
|
Luka Govedič
|
172d1cd276
|
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271)
|
2024-09-27 14:25:10 -04:00 |
|
Michael Goin
|
af59df0a10
|
Remove faulty Meta-Llama-3-8B-Instruct-FP8.yaml lm-eval test (#7961)
|
2024-08-28 19:19:17 -04:00 |
|
Luka Govedič
|
7937009a7e
|
[Kernel] Replaced blockReduce[...] functions with cub::BlockReduce (#7233)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-08-21 20:18:00 -04:00 |
|
Michael Goin
|
44f26a9466
|
[Model] Align nemotron config with final HF state and fix lm-eval-small (#7611)
|
2024-08-16 15:56:34 -07:00 |
|
Dipika Sikka
|
a3bbbfa1d8
|
[BugFix] Fix DeepSeek remote code (#7178)
|
2024-08-06 08:16:53 -07:00 |
|
HandH1998
|
6512937de1
|
Support W4A8 quantization for vllm (#5218)
|
2024-07-31 07:55:21 -06:00 |
|
Michael Goin
|
07278c37dd
|
[Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611)
|
2024-07-26 14:33:42 -04:00 |
|
Robert Shaw
|
889da130e7
|
[ Misc ] fp8-marlin channelwise via compressed-tensors (#6524)
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-07-25 09:46:04 -07:00 |
|
Robert Shaw
|
9364f74eee
|
[ Kernel ] Enable fp8-marlin for fbgemm-fp8 models (#6606)
|
2024-07-20 18:50:10 +00:00 |
|
Robert Shaw
|
683e3cb9c4
|
[ Misc ] fbgemm checkpoints (#6559)
|
2024-07-20 09:36:57 -07:00 |
|
Robert Shaw
|
4cc24f01b1
|
[ Kernel ] Enable Dynamic Per Token fp8 (#6547)
|
2024-07-19 23:08:15 +00:00 |
|
Robert Shaw
|
dbe5588554
|
[ Misc ] non-uniform quantization via compressed-tensors for Llama (#6515)
|
2024-07-18 22:39:18 -04:00 |
|
Tyler Michael Smith
|
9dad5cc859
|
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace (#6384)
|
2024-07-14 13:37:19 +00:00 |
|
Robert Shaw
|
fb6af8bc08
|
[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417)
|
2024-07-13 20:03:58 -07:00 |
|
Robert Shaw
|
aea19f0989
|
[ Misc ] Support Models With Bias in compressed-tensors integration (#6356)
|
2024-07-12 11:11:29 -04:00 |
|
Robert Shaw
|
abfe705a02
|
[ Misc ] Support Fp8 via llm-compressor (#6110)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
|
2024-07-07 20:42:11 +00:00 |
|
Robert Shaw
|
7c008c51a9
|
[ Misc ] Refactor MoE to isolate Fp8 From Mixtral (#5970)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-07-02 21:54:35 +00:00 |
|
Robert Shaw
|
75aa1442db
|
[ CI/Build ] LM Eval Harness Based CI Testing (#5838)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
|
2024-06-29 13:04:30 -04:00 |
|