WIP: Infer with LLaVA-RLHF by monatis · Pull Request #2 · monatis/lmm.cpp · GitHub

@monatis monatis commented Oct 1, 2023

This is still a work in progress.

Now that GGUF support is implemented in clip.cpp, it's time to combine clip.cpp + llama.cpp = llava.cpp (the first model to be supported in this repo).

For now, I am copying the CLIP conversion, model loading, and inference code from clip.cpp and making the necessary changes. In the future, these changes may be merged upstream, and clip.cpp may become a submodule of this repo.

  • LLaVA surgery: merge base and LoRA weights, strip the multimodal projector.
  • Convert the LLaMA part with llama.cpp.
  • Update CLIP conversion script to save a LLaVA encoder model in GGUF.
  • Load CLIP vision model with LLaVA projector in clip.cpp.
  • Update the clip_image_encode function to return image hidden states from layers[-2] (the second-to-last layer).
  • Write a simple example for end-to-end LLaVA inference.
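
As a rough illustration of two of the steps above (the surgery step and the layers[-2] selection), here is a minimal sketch in plain Python. It is not the code in this PR: the function names, the list-of-rows matrix representation, and the "mm_projector" name marker are all assumptions made for the example. The merge uses the standard LoRA formula W' = W + (alpha / rank) * B @ A.

```python
# Illustrative sketch only. All names here are hypothetical and are not
# taken from this PR, clip.cpp, or llama.cpp.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def merge_lora(base, lora_a, lora_b, alpha, rank):
    """Return base + (alpha / rank) * (lora_b @ lora_a), the usual LoRA merge.

    base:   out x in
    lora_a: rank x in
    lora_b: out x rank
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]


def split_projector(state_dict, marker="mm_projector"):
    """Separate projector tensors from the rest by a substring in the name.

    The marker value is an assumption about how projector tensors are named.
    """
    projector = {k: v for k, v in state_dict.items() if marker in k}
    language_model = {k: v for k, v in state_dict.items() if marker not in k}
    return language_model, projector


def penultimate_hidden_state(layer_outputs):
    """Pick the output of the second-to-last layer, i.e. layers[-2]."""
    if len(layer_outputs) < 2:
        raise ValueError("need at least two layer outputs")
    return layer_outputs[-2]
```

In this sketch, the merged weights minus the projector tensors would go on to the llama.cpp conversion, while the projector tensors would be saved alongside the CLIP vision encoder.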

I think this is enough for the initial release. I will streamline the implementation afterwards.
