15 Commits

SHA1 Message Date
9b33c8145e experimental fix 2026-04-15 12:00:23 +00:00
b7eb473977 moar cuda 2026-04-15 08:07:14 +00:00
b5d39f2d1a cuda headers 2026-04-15 07:59:17 +00:00
28f9f4c172 force for blackwell 2026-04-15 07:56:50 +00:00
9cbc1e2777 we'll need cuda 2026-04-15 07:41:18 +00:00
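The run of CUDA commits above ("cuda headers", "force for blackwell") suggests forcing the build to target Blackwell-generation GPUs. A minimal sketch of what such a forced build might look like; the arch value and install command are assumptions, not taken from this repository:

```shell
# Hypothetical: pin the CUDA architecture list to Blackwell
# (compute capability 12.0 is an assumption; verify with nvcc --list-gpu-arch)
export TORCH_CUDA_ARCH_LIST="12.0"
pip install --no-build-isolation -e .
```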
fec79d93e5 and another one 2026-04-15 07:32:54 +00:00
0b81a87f71 and another 2026-04-15 07:28:47 +00:00
2cfd5f5027 fix git 2026-04-15 07:27:01 +00:00
64784741de fix lmcache 2026-04-15 07:25:23 +00:00
0b70c975bd feat: add pip install lmcache for KV cache offloading 2026-04-15 04:43:05 +00:00
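Commit 0b70c975bd adds LMCache for KV cache offloading. A hedged sketch of how LMCache is typically wired into a vLLM server; the connector name and model are assumptions, not confirmed by this repository:

```shell
# Hypothetical: install LMCache and register it as vLLM's KV connector
# so KV cache blocks can be offloaded outside GPU memory.
pip install lmcache
vllm serve some-glm-model \
  --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'
```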
139e617ed0 Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00
aa4f667ab8 Add hf.py patch to force string content format for GLM models
- Tool response content was being dropped because vLLM detected
  'openai' content format incorrectly for GLM templates
- Added _is_glm_model() detection to force 'string' format
- Updated Dockerfile to include hf.py patch
- Added debug tests for tool visibility
2026-04-09 05:20:47 +00:00
8d5da5750d patch parser 2026-04-09 04:28:22 +00:00
40159e865e init commit 2026-04-08 18:27:23 +00:00
bf66b8708c GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:24:36 +00:00
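Commit bf66b8708c introduces a tool parser with incremental streaming support, i.e. tool calls must be recovered from text deltas that may split markers across chunks. A minimal sketch of that technique; the `<tool_call>` tag format and class name are assumptions, since GLM's real wire format is not shown here:

```python
# Hypothetical incremental tool-call parser: buffers streamed deltas and
# emits a payload only once a complete <tool_call>...</tool_call> span
# has arrived, even if the tags are split across chunks.
class IncrementalToolParser:
    OPEN, CLOSE = "<tool_call>", "</tool_call>"

    def __init__(self) -> None:
        self.buffer = ""
        self.in_call = False

    def feed(self, delta: str) -> list[str]:
        """Consume one streamed text delta; return any completed payloads."""
        self.buffer += delta
        calls: list[str] = []
        while True:
            if not self.in_call:
                start = self.buffer.find(self.OPEN)
                if start == -1:
                    return calls  # opening tag not complete yet
                self.buffer = self.buffer[start + len(self.OPEN):]
                self.in_call = True
            end = self.buffer.find(self.CLOSE)
            if end == -1:
                return calls  # payload still streaming
            calls.append(self.buffer[:end].strip())
            self.buffer = self.buffer[end + len(self.CLOSE):]
            self.in_call = False
```

The key design point is that `feed()` never blocks on an incomplete marker: it simply returns and waits for the next delta.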