biondizzle/vllm - vllm

Author	SHA1	Message	Date
Michael Goin	8342e3abd1	[CI] Prune down lm-eval small tests (#17012) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-08 19:00:26 +00:00
Michael Goin	c70cf0fe06	[Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models (#16038) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-10 15:08:47 +08:00
Joe Runde	ef7faad1b8	🐛 Fixup more test failures from memory profiling (#9563) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-21 17:10:56 -07:00
Luka Govedič	172d1cd276	[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271)	2024-09-27 14:25:10 -04:00
Michael Goin	af59df0a10	Remove faulty Meta-Llama-3-8B-Instruct-FP8.yaml lm-eval test (#7961)	2024-08-28 19:19:17 -04:00
Michael Goin	44f26a9466	[Model] Align nemotron config with final HF state and fix lm-eval-small (#7611)	2024-08-16 15:56:34 -07:00
HandH1998	6512937de1	Support W4A8 quantization for vllm (#5218)	2024-07-31 07:55:21 -06:00
Michael Goin	07278c37dd	[Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611)	2024-07-26 14:33:42 -04:00
Robert Shaw	889da130e7	[ Misc ] `fp8-marlin` channelwise via `compressed-tensors` (#6524) Co-authored-by: mgoin <michael@neuralmagic.com>	2024-07-25 09:46:04 -07:00
Robert Shaw	4cc24f01b1	[ Kernel ] Enable Dynamic Per Token `fp8` (#6547)	2024-07-19 23:08:15 +00:00
Robert Shaw	dbe5588554	[ Misc ] non-uniform quantization via `compressed-tensors` for `Llama` (#6515)	2024-07-18 22:39:18 -04:00
Robert Shaw	aea19f0989	[ Misc ] Support Models With Bias in `compressed-tensors` integration (#6356)	2024-07-12 11:11:29 -04:00
Robert Shaw	abfe705a02	[ Misc ] Support Fp8 via `llm-compressor` (#6110) Co-authored-by: Robert Shaw <rshaw@neuralmagic>	2024-07-07 20:42:11 +00:00
Robert Shaw	75aa1442db	[ CI/Build ] LM Eval Harness Based CI Testing (#5838) Co-authored-by: Robert Shaw <rshaw@neuralmagic>	2024-06-29 13:04:30 -04:00

14 Commits