[Mistral] Mistral-7B-v0.1 support by Bam4d · Pull Request #1196 · vllm-project/vllm · GitHub

Conversation

@Bam4d (Contributor) commented Sep 27, 2023

No description provided.


import torch
from torch import nn
from transformers import MistralConfig

Collaborator, commenting on the diff above:

This does not work because MistralConfig is not included in HF transformers at the moment (v4.33.3). Could you define this config class yourself, just like this?
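
For reference, a minimal sketch of what such a locally defined config class could look like (the field defaults here are assumed from Mistral-7B-v0.1's published hyperparameters; the class the PR actually adds may differ):

from transformers import PretrainedConfig

class MistralConfig(PretrainedConfig):
    # Sketch only: defaults assumed from Mistral-7B-v0.1, not necessarily
    # the exact class added by this PR.
    model_type = "mistral"

    def __init__(
        self,
        vocab_size=32000,
        hidden_size=4096,
        intermediate_size=14336,
        num_hidden_layers=32,
        num_attention_heads=32,
        num_key_value_heads=8,
        hidden_act="silu",
        max_position_embeddings=32768,
        rms_norm_eps=1e-5,
        rope_theta=10000.0,
        sliding_window=4096,
        tie_word_embeddings=False,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.intermediate_size = intermediate_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.num_key_value_heads = num_key_value_heads
        self.hidden_act = hidden_act
        self.max_position_embeddings = max_position_embeddings
        self.rms_norm_eps = rms_norm_eps
        self.rope_theta = rope_theta
        self.sliding_window = sliding_window
        super().__init__(tie_word_embeddings=tie_word_embeddings, **kwargs)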

Collaborator:

@timlacroix Besides this, it seems everything works fine!

Contributor:

OK, addressed. Will we need to change this back after the next release?

Collaborator:

@timlacroix Yes. Once a new version of HF transformers is released, we will remove it.
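
Until then, one possible way to keep both paths working is a guarded import (a sketch only; the module path below is an assumption for illustration, and the PR may simply vendor the class and swap it out later):

try:
    # Available once a transformers release ships Mistral support.
    from transformers import MistralConfig
except ImportError:
    # Fall back to a locally defined copy until then.
    # NOTE: hypothetical module path, shown only for illustration.
    from vllm.transformers_utils.configs.mistral import MistralConfig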

@WoosukKwon mentioned this pull request Sep 27, 2023
@casper-hansen (Contributor) commented Sep 27, 2023

The Mistral model is almost equivalent to Llama in terms of quantization, so it would be super easy to extend support; I have already added Mistral to AutoAWQ. If you can modify the part below, you will enable AWQ-quantized models:

_MODEL_CLASSES_SUPPORT_QUANTIZATION = [
    LlamaForCausalLM,
    MistralForCausalLM,
]

After that, you should be able to run inference with the quantized model that is already available: https://huggingface.co/casperhansen/mistral-7b-instruct-v0.1-awq

from vllm import LLM, SamplingParams

prompts = [
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="casperhansen/mistral-7b-instruct-v0.1-awq", quantization="awq", dtype="half")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

@WoosukKwon linked an issue Sep 27, 2023 that may be closed by this pull request
@WoosukKwon (Collaborator) left a review comment:

LGTM. As this PR is not modifiable, I will fix some miscellaneous issues right after merging it.

@WoosukKwon mentioned this pull request Sep 28, 2023
@WoosukKwon merged commit bb1ba58 into vllm-project:main Sep 28, 2023
Successfully merging this pull request may close these issues: Support for Mistral 7B