15 Commits

SHA1 Message Date
9b33c8145e experimental fix 2026-04-15 12:00:23 +00:00
b7eb473977 moar cuda 2026-04-15 08:07:14 +00:00
b5d39f2d1a cuda headers 2026-04-15 07:59:17 +00:00
28f9f4c172 force for blackwell 2026-04-15 07:56:50 +00:00
9cbc1e2777 we'll need cuda 2026-04-15 07:41:18 +00:00
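The run of CUDA commits above ("cuda headers", "force for blackwell") suggests forcing the build to target Blackwell-generation GPUs. A minimal sketch of what such a forced build might look like; the arch value and install command are assumptions, not taken from this repository:

```shell
# Hypothetical: pin the CUDA architecture list to Blackwell
# (compute capability 12.0 is an assumption; verify with nvcc --list-gpu-arch)
export TORCH_CUDA_ARCH_LIST="12.0"
pip install --no-build-isolation -e .
```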
fec79d93e5 and another one 2026-04-15 07:32:54 +00:00
0b81a87f71 and another 2026-04-15 07:28:47 +00:00
2cfd5f5027 fix git 2026-04-15 07:27:01 +00:00
64784741de fix lmcache 2026-04-15 07:25:23 +00:00
0b70c975bd feat: add pip install lmcache for KV cache offloading 2026-04-15 04:43:05 +00:00
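Commit 0b70c975bd adds LMCache for KV cache offloading. A hedged sketch of how LMCache is typically wired into a vLLM server; the connector name and model are assumptions, not confirmed by this repository:

```shell
# Hypothetical: install LMCache and register it as vLLM's KV connector
# so KV cache blocks can be offloaded outside GPU memory.
pip install lmcache
vllm serve some-glm-model \
  --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'
```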
139e617ed0 Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00
aa4f667ab8 Add hf.py patch to force string content format for GLM models
- Tool response content was being dropped because vLLM detected
  'openai' content format incorrectly for GLM templates
- Added _is_glm_model() detection to force 'string' format
- Updated Dockerfile to include hf.py patch
- Added debug tests for tool visibility
2026-04-09 05:20:47 +00:00
8d5da5750d patch parser 2026-04-09 04:28:22 +00:00
40159e865e init commit 2026-04-08 18:27:23 +00:00
bf66b8708c GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:24:36 +00:00
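Commit bf66b8708c introduces a tool parser with incremental streaming support, i.e. tool calls must be recovered from text deltas that may split markers across chunks. A minimal sketch of that technique; the `<tool_call>` tag format and class name are assumptions, since GLM's real wire format is not shown here:

```python
# Hypothetical incremental tool-call parser: buffers streamed deltas and
# emits a payload only once a complete <tool_call>...</tool_call> span
# has arrived, even if the tags are split across chunks.
class IncrementalToolParser:
    OPEN, CLOSE = "<tool_call>", "</tool_call>"

    def __init__(self) -> None:
        self.buffer = ""
        self.in_call = False

    def feed(self, delta: str) -> list[str]:
        """Consume one streamed text delta; return any completed payloads."""
        self.buffer += delta
        calls: list[str] = []
        while True:
            if not self.in_call:
                start = self.buffer.find(self.OPEN)
                if start == -1:
                    return calls  # opening tag not complete yet
                self.buffer = self.buffer[start + len(self.OPEN):]
                self.in_call = True
            end = self.buffer.find(self.CLOSE)
            if end == -1:
                return calls  # payload still streaming
            calls.append(self.buffer[:end].strip())
            self.buffer = self.buffer[end + len(self.CLOSE):]
            self.in_call = False
```

The key design point is that `feed()` never blocks on an incomplete marker: it simply returns and waits for the next delta.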