Releases: li-plus/chatglm.cpp
v0.4.2
31 Jul 06:12
Apply flash attention in the vision encoder for lower first-token latency.
Fix Metal compilation error on Apple Silicon chips.
v0.4.1
25 Jul 07:04
Support GLM4V, the first vision-language model in the GLM series.
Fix NaN/inf logits by rescheduling attention scaling.
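One plausible reading of the attention-scaling fix, sketched with NumPy in fp16: applying the 1/sqrt(d) scale to Q before the QK^T matmul keeps intermediate values within half-precision range, while scaling the scores afterwards lets the unscaled matmul overflow first. The tensor values below are made up purely to force the overflow.

```python
import numpy as np

d = 64
q = np.full((1, d), 40.0, dtype=np.float16)
k = np.full((1, d), 40.0, dtype=np.float16)
scale = np.float16(1.0 / np.sqrt(d))

# Scale applied after the matmul: the dot product (64 * 1600 = 102400)
# exceeds the fp16 maximum (~65504), so the scores are already inf.
late = (q @ k.T) * scale

# Scale applied to Q before the matmul: the dot product is 12800, in range.
early = (q * scale) @ k.T
```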
v0.4.0
21 Jun 03:09
Allocate memory dynamically on demand to fully utilize device memory; no more preset scratch or memory sizes.
Drop Baichuan/InternLM support, since both have been integrated into llama.cpp.
API changes:
CMake CUDA option -DGGML_CUBLAS changed to -DGGML_CUDA.
CMake CUDA architecture option -DCUDA_ARCHITECTURES changed to -DCMAKE_CUDA_ARCHITECTURES.
num_threads in GenerationConfig was removed; optimal thread settings are now selected automatically.
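For reference, a minimal CUDA build with the renamed flags might look like the following; the `80` compute capability is only an example, so substitute the value matching your GPU.

```shell
# v0.4.0+: -DGGML_CUDA replaces -DGGML_CUBLAS, and
# -DCMAKE_CUDA_ARCHITECTURES replaces -DCUDA_ARCHITECTURES.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="80"
cmake --build build -j
```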
v0.3.4
14 Jun 12:52
Fix regex negative lookahead for code input tokenization.
Fix the OpenAI API server by using apply_chat_template to calculate tokens.
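The token-counting fix can be sketched as follows. `ToyTokenizer` is a hypothetical stand-in for a real transformers-style tokenizer so the example runs offline; the point is that the server counts tokens from the fully rendered chat template, not from the raw message text alone.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a transformers-style tokenizer."""

    def apply_chat_template(self, messages, tokenize=True, add_generation_prompt=True):
        # Render the conversation with role markers, then "tokenize" by whitespace.
        text = "".join(f"<|{m['role']}|>{m['content']}" for m in messages)
        if add_generation_prompt:
            text += "<|assistant|>"
        return text.split() if tokenize else text


def count_prompt_tokens(tokenizer, messages):
    # What the server reports as usage.prompt_tokens: the length of the
    # templated prompt, including role markers, not just the message bodies.
    return len(tokenizer.apply_chat_template(messages, tokenize=True,
                                             add_generation_prompt=True))


messages = [{"role": "user", "content": "hello world"}]
n_prompt = count_prompt_tokens(ToyTokenizer(), messages)
```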
v0.3.3
13 Jun 06:36
Support ChatGLM4 conversation mode.
v0.3.2
24 Apr 08:20
Support P-Tuning v2 finetuned models for the ChatGLM family.
Fix convert.py for LoRA models and chatglm3-6b-128k.
Fix RoPE theta config for 32k/128k sequence lengths.
Better CUDA CMake script, respecting the nvcc version.
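To illustrate what the RoPE theta (base) controls, here is a small sketch; the theta values are illustrative, not the exact ones chatglm.cpp uses. Raising the base lowers every non-trivial rotary frequency, which is how long-context (32k/128k) variants stretch their positional encoding.

```python
def rope_inv_freq(dim, theta=10000.0):
    # Rotary inverse frequencies: theta^(-2i/dim) for each dimension pair.
    return [theta ** (-2.0 * i / dim) for i in range(dim // 2)]

base_freqs = rope_inv_freq(8, theta=10000.0)   # common default base
long_freqs = rope_inv_freq(8, theta=500000.0)  # illustrative long-context base
```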
v0.3.1
20 Jan 16:14
Support function calling in the OpenAI API server.
Faster repetition penalty sampling.
Support the max_new_tokens generation option.
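For context, the standard (CTRL-style) repetition penalty that such samplers implement can be sketched as below. The release's actual speedup is not shown; iterating over only the set of already-generated tokens, rather than the full vocabulary, is one common optimization.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    # Penalize tokens that already appeared: positive logits shrink
    # (divide), negative logits grow more negative (multiply), so
    # repeated tokens become less likely either way.
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

penalized = apply_repetition_penalty([2.0, -1.0, 0.5], generated_ids=[0, 1])
```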
v0.3.0
22 Nov 03:08
Full ChatGLM3 functionality, including system prompts, function calling, and the code interpreter.
Brand-new OpenAI-style chat API.
Add token usage information to the OpenAI API server for compatibility with the LangChain frontend.
Fix conversion error for chatglm3-6b-32k.
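The added usage field follows the OpenAI response schema. A minimal sketch of what the server now reports per response (field names per the OpenAI spec; the helper function itself is hypothetical):

```python
def make_usage(prompt_tokens, completion_tokens):
    # OpenAI-compatible usage block, as LangChain and other clients expect.
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }

usage = make_usage(12, 34)
```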
v0.2.10
30 Oct 06:35
Support ChatGLM3 in conversation mode.
Coming soon: a new prompt format for system messages and function calls.
v0.2.9
22 Oct 03:03
Support InternLM 7B & 20B model architectures.