Top Hugging Face models for NLP, vision, and audio tasks — links, descriptions, and demos included.
A curated list of the most powerful and popular models on the Hugging Face Hub, organized by task and domain. Each entry includes a direct link, a concise description, and information about available demos.
| Model | Description | Size | Demo |
|---|---|---|---|
| microsoft/DialoGPT-large | Conversational AI model fine-tuned on Reddit conversations | 762M | 🚀 Demo |
| EleutherAI/gpt-j-6B | Open-source GPT-style autoregressive language model | 6B | 🚀 Demo |
| bigscience/bloom-7b1 | Multilingual autoregressive language model trained on 46 languages | 7.1B | 🚀 Demo |
| meta-llama/Llama-2-7b-chat-hf | Fine-tuned Llama 2 model optimized for dialogue use cases | 7B | 🚀 Demo |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | Mixture of experts model with excellent instruction following | 46.7B | 🚀 Demo |
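
Any of the generation models above can be loaded through the `transformers` pipeline API. A minimal sketch, assuming you have enough memory for the checkpoint you pick (the prompt is just an illustration):

```python
from transformers import pipeline

# gpt-j-6B needs tens of GB of memory; substitute a smaller checkpoint to test the call.
generator = pipeline("text-generation", model="EleutherAI/gpt-j-6B")
print(generator("The Hugging Face Hub is", max_new_tokens=40)[0]["generated_text"])
```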
| Model | Description | Use Case | Demo |
|---|---|---|---|
| cardiffnlp/twitter-roberta-base-sentiment-latest | Sentiment analysis model trained on Twitter data | Social media sentiment | 🚀 Demo |
| facebook/bart-large-mnli | Zero-shot classification using natural language inference | General classification | 🚀 Demo |
| martin-ha/toxic-comment-model | Detects toxic comments and hate speech | Content moderation | 🚀 Demo |
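
For zero-shot classification with `facebook/bart-large-mnli`, a minimal sketch (the input text and candidate labels are hypothetical):

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier("The new GPU drivers finally fixed my rendering crashes.",
                    candidate_labels=["technology", "sports", "politics"])
print(result["labels"][0], result["scores"][0])  # highest-scoring label first
```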
| Model | Description | Languages | Demo |
|---|---|---|---|
| dbmdz/bert-large-cased-finetuned-conll03-english | BERT model fine-tuned for NER on CoNLL-03 | English | 🚀 Demo |
| xlm-roberta-large-finetuned-conll03-english | Multilingual RoBERTa for cross-lingual NER | 100+ languages | 🚀 Demo |
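
A minimal sketch of running either NER model through the token-classification pipeline (the example sentence is made up):

```python
from transformers import pipeline

ner = pipeline("ner",
               model="dbmdz/bert-large-cased-finetuned-conll03-english",
               aggregation_strategy="simple")  # merge word pieces into whole entities
print(ner("Hugging Face is based in New York City."))
```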
| Model | Description | Type | Demo |
|---|---|---|---|
| deepset/roberta-base-squad2 | RoBERTa fine-tuned on SQuAD 2.0 for extractive QA | Extractive | 🚀 Demo |
| microsoft/DialoGPT-medium | Conversational QA with context understanding | Generative | 🚀 Demo |
| Model | Description | Type | Demo |
|---|---|---|---|
| facebook/bart-large-cnn | BART fine-tuned on CNN/DailyMail for news summarization | Abstractive | 🚀 Demo |
| google/pegasus-xsum | PEGASUS model fine-tuned on XSum dataset | Abstractive | 🚀 Demo |
| sshleifer/distilbart-cnn-12-6 | Distilled BART model for efficient summarization | Abstractive | 🚀 Demo |
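
A minimal summarization sketch with `facebook/bart-large-cnn`; the article text and length limits are placeholders to adjust for your input:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = ("Replace this placeholder with the article or document you want to condense. "
           "BART produces an abstractive summary rather than copying sentences verbatim.")
print(summarizer(article, max_length=50, min_length=10, do_sample=False)[0]["summary_text"])
```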
| Model | Description | Languages | Demo |
|---|---|---|---|
| Helsinki-NLP/opus-mt-en-de | English to German translation | EN → DE | 🚀 Demo |
| facebook/m2m100_418M | Multilingual machine translation without English pivoting | 100 languages | 🚀 Demo |
| google/mt5-large | Multilingual T5 for various text-to-text tasks | 101 languages | 🚀 Demo |
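
The Helsinki-NLP OPUS-MT checkpoints plug directly into the translation pipeline; a minimal sketch (the sentence is arbitrary):

```python
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
print(translator("Machine translation quality has improved rapidly.")[0]["translation_text"])
```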
| Model | Description | Languages | Demo |
|---|---|---|---|
| Salesforce/codegen-350M-mono | Code generation model trained on Python | Python | 🚀 Demo |
| microsoft/CodeBERT-base | Pre-trained model for programming languages | Multiple | 🚀 Demo |
| codeparrot/codeparrot | GPT-2 model fine-tuned on Python code | Python | 🚀 Demo |
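
Code generation models such as `Salesforce/codegen-350M-mono` are ordinary causal language models, so the text-generation pipeline works for them too. A minimal sketch with a hypothetical prompt:

```python
from transformers import pipeline

codegen = pipeline("text-generation", model="Salesforce/codegen-350M-mono")
print(codegen("def fibonacci(n):", max_new_tokens=64)[0]["generated_text"])
```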
| Model | Description | Dataset | Demo |
|---|---|---|---|
| google/vit-base-patch16-224 | Vision Transformer for image classification | ImageNet-21k | 🚀 Demo |
| microsoft/resnet-50 | ResNet-50 model for general image classification | ImageNet-1k | 🚀 Demo |
| google/efficientnet-b7 | EfficientNet model with excellent accuracy/efficiency trade-off | ImageNet | 🚀 Demo |
| Model | Description | Framework | Demo |
|---|---|---|---|
| facebook/detr-resnet-50 | End-to-end object detection with Transformers | PyTorch | 🚀 Demo |
| hustvl/yolos-tiny | You Only Look at One Sequence for object detection | PyTorch | 🚀 Demo |
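
A minimal object-detection sketch with DETR; the image path is a placeholder, and the DETR checkpoint may additionally require the `timm` package for its ResNet backbone:

```python
from transformers import pipeline

detector = pipeline("object-detection", model="facebook/detr-resnet-50")
for det in detector("path/to/image.jpg"):
    print(det["label"], round(det["score"], 3), det["box"])
```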
| Model | Description | Type | Demo |
|---|---|---|---|
| facebook/detr-resnet-50-panoptic | DETR model for panoptic segmentation | Panoptic | 🚀 Demo |
| nvidia/segformer-b5-finetuned-ade-640-640 | SegFormer for semantic segmentation | Semantic | 🚀 Demo |
| Model | Description | Type | Demo |
|---|---|---|---|
| runwayml/stable-diffusion-v1-5 | Stable Diffusion model for text-to-image generation | Text-to-Image | 🚀 Demo |
| CompVis/stable-diffusion-v1-4 | Earlier version of Stable Diffusion | Text-to-Image | 🚀 Demo |
| prompthero/openjourney | Fine-tuned Stable Diffusion with artistic style | Text-to-Image | 🚀 Demo |
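
Image generation goes through the `diffusers` library rather than `transformers`. A minimal sketch, assuming `diffusers` and `torch` are installed and a CUDA GPU is available (the prompt is arbitrary):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5",
                                               torch_dtype=torch.float16)
pipe = pipe.to("cuda")  # drop this (and use float32) to run on CPU, much more slowly
image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```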
| Model | Description | Languages | Demo |
|---|---|---|---|
| microsoft/trocr-base-printed | Transformer-based OCR for printed text | English | 🚀 Demo |
| microsoft/trocr-large-handwritten | OCR model specifically for handwritten text | English | 🚀 Demo |
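
TrOCR uses a vision encoder-decoder rather than a single pipeline call. A minimal sketch following the model card's pattern; the image path is a placeholder for a cropped line of text:

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

image = Image.open("path/to/text_line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```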
| Model | Description | Languages | Demo |
|---|---|---|---|
| openai/whisper-large-v2 | State-of-the-art speech recognition and translation | 99 languages | 🚀 Demo |
| facebook/wav2vec2-large-960h-lv60-self | Self-supervised speech recognition model | English | 🚀 Demo |
| jonatasgrosman/wav2vec2-large-xlsr-53-english | Cross-lingual speech recognition model | English | 🚀 Demo |
| Model | Description | Quality | Demo |
|---|---|---|---|
| microsoft/speecht5_tts | SpeechT5 model for text-to-speech synthesis | High | 🚀 Demo |
| espnet/kan-bayashi_ljspeech_vits | VITS model for high-quality speech synthesis | High | 🚀 Demo |
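
SpeechT5 needs a vocoder and a speaker embedding in addition to the TTS model itself. A minimal sketch following the model card's example; the x-vector dataset and index come from that example and are just one choice of voice:

```python
import torch
import soundfile as sf
from datasets import load_dataset
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Hello from the model catalog.", return_tensors="pt")
speakers = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(speakers[7306]["xvector"]).unsqueeze(0)

speech = model.generate_speech(inputs["input_ids"], speaker_embedding, vocoder=vocoder)
sf.write("speech.wav", speech.numpy(), samplerate=16000)
```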
| Model | Description | Classes | Demo |
|---|---|---|---|
| MIT/ast-finetuned-audioset-10-10-0.4593 | Audio Spectrogram Transformer for sound classification | 527 classes | 🚀 Demo |
| superb/wav2vec2-base-superb-er | Emotion recognition from speech | Emotions | 🚀 Demo |
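
A minimal audio-classification sketch with the AST checkpoint; the audio path is a placeholder:

```python
from transformers import pipeline

audio_clf = pipeline("audio-classification", model="MIT/ast-finetuned-audioset-10-10-0.4593")
print(audio_clf("path/to/audio.wav"))  # top AudioSet labels with scores
```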
| Model | Description | Type | Demo |
|---|---|---|---|
| facebook/musicgen-small | MusicGen model for controllable music generation | Conditional | 🚀 Demo |
| facebook/musicgen-melody | MusicGen with melody conditioning capabilities | Melody-conditioned | 🚀 Demo |
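
A minimal MusicGen sketch following the model card's pattern; the text prompt is arbitrary, and `max_new_tokens=256` corresponds to roughly five seconds of audio:

```python
import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

inputs = processor(text=["lo-fi hip hop beat with mellow piano"], padding=True, return_tensors="pt")
audio_values = model.generate(**inputs, max_new_tokens=256)

sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())
```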
| Model | Description | Capabilities | Demo |
|---|---|---|---|
| Salesforce/blip-image-captioning-large | BLIP model for image captioning | Image → Text | 🚀 Demo |
| Salesforce/blip-vqa-base | Visual question answering with BLIP | VQA | 🚀 Demo |
| microsoft/git-large-coco | Generative Image-to-text Transformer | Image captioning | 🚀 Demo |
| openai/clip-vit-large-patch14 | CLIP model for image-text similarity | Zero-shot classification | 🚀 Demo |
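
CLIP-style models can be driven through the zero-shot image classification pipeline; a minimal sketch where the image path and candidate labels are placeholders:

```python
from transformers import pipeline

clip = pipeline("zero-shot-image-classification", model="openai/clip-vit-large-patch14")
print(clip("path/to/image.jpg",
           candidate_labels=["a photo of a cat", "a photo of a dog", "a diagram"]))
```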
| Model | Description | Capabilities | Demo |
|---|---|---|---|
| laion/clap-htsat-unfused | Contrastive Language-Audio Pre-training | Audio-text similarity | 🚀 Demo |
The top models by downloads and community adoption:
- bert-base-uncased - The foundational BERT model
- gpt2 - OpenAI's GPT-2 model
- distilbert-base-uncased - Distilled BERT for efficiency
- openai/clip-vit-base-patch32 - CLIP for vision-language tasks
Cutting-edge models pushing the boundaries:
- openai/whisper-large-v2 - Best speech recognition
- runwayml/stable-diffusion-v1-5 - Leading image generation
- facebook/musicgen-large - Advanced music generation
- mistralai/Mixtral-8x7B-Instruct-v0.1 - Top instruction-following LLM
| Model | Parameters | Open Source | Commercial Use | Best For |
|---|---|---|---|---|
| GPT-J 6B | 6B | ✅ | ✅ | General text generation |
| BLOOM 7B | 7.1B | ✅ | ✅ | Multilingual tasks |
| Llama 2 Chat 7B | 7B | ✅ | ✅ | Conversational AI |
| Mixtral 8x7B | 46.7B | ✅ | ✅ | Complex reasoning |
| Model | Type | Accuracy (ImageNet) | Parameters | Best For |
|---|---|---|---|---|
| ResNet-50 | CNN | 76.1% | 25.6M | General classification |
| ViT-Base | Transformer | 81.8% | 86M | Image understanding |
| EfficientNet-B7 | CNN | 84.4% | 66M | Efficient classification |
| DETR | Transformer | - | 41M | Object detection |
```python
# Text Classification
from transformers import pipeline

classifier = pipeline("sentiment-analysis",
                      model="cardiffnlp/twitter-roberta-base-sentiment-latest")
result = classifier("I love this awesome repository!")

# Image Classification
classifier = pipeline("image-classification",
                      model="google/vit-base-patch16-224")
result = classifier("path/to/image.jpg")

# Speech Recognition
transcriber = pipeline("automatic-speech-recognition",
                       model="openai/whisper-large-v2")
result = transcriber("path/to/audio.wav")
```

```python
# Question Answering
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")
```
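
The same checkpoint can also be used end to end through the question-answering pipeline; a minimal sketch with a made-up question and context:

```python
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
result = qa(question="What does SQuAD 2.0 add over SQuAD 1.1?",
            context="SQuAD 2.0 combines the SQuAD 1.1 questions with over 50,000 "
                    "unanswerable questions written to look similar to answerable ones.")
print(result["answer"], result["score"])
```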
```python
# Image Captioning
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
```

- Text Classification: Start with RoBERTa-based models for accuracy
- Generation: Use GPT-style models (GPT-J, BLOOM, Llama 2)
- Question Answering: BERT/RoBERTa fine-tuned on SQuAD
- Multilingual: XLM-R, mBERT, or BLOOM for cross-lingual tasks
- Image Classification: ViT for state-of-the-art, ResNet for efficiency
- Object Detection: DETR for end-to-end, YOLO for speed
- Image Generation: Stable Diffusion for quality and control
- Speech Recognition: Whisper for multilingual, Wav2Vec2 for English
- TTS: SpeechT5 for quality, Tacotron for customization
- Audio Classification: AST for general audio understanding
- GLUE: General Language Understanding benchmark
- SuperGLUE: More challenging language understanding tasks
- SQuAD: Reading comprehension benchmark
- HellaSwag: Commonsense reasoning
- ImageNet: Image classification accuracy
- COCO: Object detection and segmentation
- Open Images: Large-scale object detection
- LibriSpeech: Speech recognition accuracy
- CommonVoice: Multilingual speech recognition
- AudioSet: Audio classification tasks
- Text: GPT-J 6B, BLOOM 7B
- Images: Stable Diffusion v1.5
- Music: MusicGen Large
- Customer Support: DialoGPT, RoBERTa sentiment analysis
- Document Processing: LayoutLM, TrOCR
- Search & Retrieval: CLIP, Sentence Transformers
- Multimodal: CLIP, BLIP, ALIGN
- Code: CodeT5, CodeBERT, CodeGen
- Scientific: SciBERT, BioBERT, DistilRoBERTa
We welcome contributions! Here's how you can help:
- Add new models: Submit a PR with model details following our format
- Update benchmarks: Share performance results and comparisons
- Fix issues: Report broken links or outdated information
- Improve descriptions: Make model descriptions more accurate and helpful
- Use the existing format for consistency
- Include direct Hugging Face Hub links
- Verify that demos work before submitting
- Add appropriate tags and categories
- Keep descriptions concise but informative
To be included, models should:
- Be publicly available on the Hugging Face Hub
- Be well-documented with clear use cases
- Be actively maintained or widely adopted
- Demonstrate strong performance on relevant benchmarks
This repository is licensed under the Creative Commons Attribution 4.0 International Public License. See LICENSE for details.
© Jehoshua 2025
⭐ Star this repository if you find it helpful!
📫 Questions or suggestions? Open an issue or start a discussion.
🔗 Share with others who might benefit from this curated list.
Last updated: August 2025