9b33c8145e
experimental fix
2026-04-15 12:00:23 +00:00
b7eb473977
more CUDA
2026-04-15 08:07:14 +00:00
b5d39f2d1a
CUDA headers
2026-04-15 07:59:17 +00:00
28f9f4c172
force for Blackwell
2026-04-15 07:56:50 +00:00
9cbc1e2777
we'll need CUDA
2026-04-15 07:41:18 +00:00
fec79d93e5
and another one
2026-04-15 07:32:54 +00:00
0b81a87f71
and another
2026-04-15 07:28:47 +00:00
2cfd5f5027
fix git
2026-04-15 07:27:01 +00:00
64784741de
fix lmcache
2026-04-15 07:25:23 +00:00
0b70c975bd
feat: add pip install lmcache for KV cache offloading
2026-04-15 04:43:05 +00:00
139e617ed0
Clean up README with full bug analysis for ZAI
2026-04-09 06:21:04 +00:00
aa4f667ab8
Add hf.py patch to force string content format for GLM models
...
- Tool response content was being dropped because vLLM detected
'openai' content format incorrectly for GLM templates
- Added _is_glm_model() detection to force 'string' format
- Updated Dockerfile to include hf.py patch
- Added debug tests for tool visibility
2026-04-09 05:20:47 +00:00
8d5da5750d
patch parser
2026-04-09 04:28:22 +00:00
40159e865e
initial commit
2026-04-08 18:27:23 +00:00
bf66b8708c
GLM-5.1 tool parser with incremental streaming support
2026-04-08 18:24:36 +00:00