Top Hugging Face models for NLP, vision, and audio tasks — links, descriptions, and demos included.
A curated list of the most powerful and popular models on the Hugging Face Hub, organized by task and domain. Each entry includes a direct link, a concise description, and information about available demos.
| Model | Description | Size | Demo |
|---|---|---|---|
| microsoft/DialoGPT-large | Conversational AI model fine-tuned on Reddit conversations | 762M | 🚀 Demo |
| EleutherAI/gpt-j-6B | Open-source GPT-style autoregressive language model | 6B | 🚀 Demo |
| bigscience/bloom-7b1 | Multilingual autoregressive language model trained on 46 languages | 7.1B | 🚀 Demo |
| meta-llama/Llama-2-7b-chat-hf | Fine-tuned Llama 2 model optimized for dialogue use cases | 7B | 🚀 Demo |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | Mixture of experts model with excellent instruction following | 46.7B | 🚀 Demo |
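
Any of the generation models above can be loaded through the `transformers` pipeline API. A minimal sketch, assuming you have enough memory for the checkpoint you pick (the prompt is just an illustration):

```python
from transformers import pipeline

# gpt-j-6B needs tens of GB of memory; substitute a smaller checkpoint to test the call.
generator = pipeline("text-generation", model="EleutherAI/gpt-j-6B")
print(generator("The Hugging Face Hub is", max_new_tokens=40)[0]["generated_text"])
```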
| Model | Description | Use Case | Demo |
|---|---|---|---|
| cardiffnlp/twitter-roberta-base-sentiment-latest | Sentiment analysis model trained on Twitter data | Social media sentiment | 🚀 Demo |
| facebook/bart-large-mnli | Zero-shot classification using natural language inference | General classification | 🚀 Demo |
| martin-ha/toxic-comment-model | Detects toxic comments and hate speech | Content moderation | 🚀 Demo |
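
For zero-shot classification with `facebook/bart-large-mnli`, a minimal sketch (the input text and candidate labels are hypothetical):

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier("The new GPU drivers finally fixed my rendering crashes.",
                    candidate_labels=["technology", "sports", "politics"])
print(result["labels"][0], result["scores"][0])  # highest-scoring label first
```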
| Model | Description | Languages | Demo |
|---|---|---|---|
| dbmdz/bert-large-cased-finetuned-conll03-english | BERT model fine-tuned for NER on CoNLL-03 | English | 🚀 Demo |
| xlm-roberta-large-finetuned-conll03-english | Multilingual RoBERTa for cross-lingual NER | 100+ languages | 🚀 Demo |
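
A minimal sketch of running either NER model through the token-classification pipeline (the example sentence is made up):

```python
from transformers import pipeline

ner = pipeline("ner",
               model="dbmdz/bert-large-cased-finetuned-conll03-english",
               aggregation_strategy="simple")  # merge word pieces into whole entities
print(ner("Hugging Face is based in New York City."))
```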
| Model | Description | Type | Demo |
|---|---|---|---|
| deepset/roberta-base-squad2 | RoBERTa fine-tuned on SQuAD 2.0 for extractive QA | Extractive | 🚀 Demo |
| microsoft/DialoGPT-medium | Conversational QA with context understanding | Generative | 🚀 Demo |
| Model | Description | Type | Demo |
|---|---|---|---|
| facebook/bart-large-cnn | BART fine-tuned on CNN/DailyMail for news summarization | Abstractive | 🚀 Demo |
| google/pegasus-xsum | PEGASUS model fine-tuned on XSum dataset | Abstractive | 🚀 Demo |
| sshleifer/distilbart-cnn-12-6 | Distilled BART model for efficient summarization | Abstractive | 🚀 Demo |
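
A minimal summarization sketch with `facebook/bart-large-cnn`; the article text and length limits are placeholders to adjust for your input:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = ("Replace this placeholder with the article or document you want to condense. "
           "BART produces an abstractive summary rather than copying sentences verbatim.")
print(summarizer(article, max_length=50, min_length=10, do_sample=False)[0]["summary_text"])
```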
| Model | Description | Languages | Demo |
|---|---|---|---|
| Helsinki-NLP/opus-mt-en-de | English to German translation | EN → DE | 🚀 Demo |
| facebook/m2m100_418M | Multilingual machine translation without English pivoting | 100 languages | 🚀 Demo |
| google/mt5-large | Multilingual T5 for various text-to-text tasks | 101 languages | 🚀 Demo |
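
The Helsinki-NLP OPUS-MT checkpoints plug directly into the translation pipeline; a minimal sketch (the sentence is arbitrary):

```python
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
print(translator("Machine translation quality has improved rapidly.")[0]["translation_text"])
```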
| Model | Description | Languages | Demo |
|---|---|---|---|
| Salesforce/codegen-350M-mono | Code generation model trained on Python | Python | 🚀 Demo |
| microsoft/CodeBERT-base | Pre-trained model for programming languages | Multiple | 🚀 Demo |
| codeparrot/codeparrot | GPT-2 model fine-tuned on Python code | Python | 🚀 Demo |
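
Code generation models such as `Salesforce/codegen-350M-mono` are ordinary causal language models, so the text-generation pipeline works for them too. A minimal sketch with a hypothetical prompt:

```python
from transformers import pipeline

codegen = pipeline("text-generation", model="Salesforce/codegen-350M-mono")
print(codegen("def fibonacci(n):", max_new_tokens=64)[0]["generated_text"])
```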
| Model | Description | Dataset | Demo |
|---|---|---|---|
| google/vit-base-patch16-224 | Vision Transformer for image classification | ImageNet-21k | 🚀 Demo |
| microsoft/resnet-50 | ResNet-50 model for general image classification | ImageNet-1k | 🚀 Demo |
| google/efficientnet-b7 | EfficientNet model with excellent accuracy/efficiency trade-off | ImageNet | 🚀 Demo |
| Model | Description | Framework | Demo |
|---|---|---|---|
| facebook/detr-resnet-50 | End-to-end object detection with Transformers | PyTorch | 🚀 Demo |
| hustvl/yolos-tiny | You Only Look at One Sequence for object detection | PyTorch | 🚀 Demo |
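
A minimal object-detection sketch with DETR; the image path is a placeholder, and the DETR checkpoint may additionally require the `timm` package for its ResNet backbone:

```python
from transformers import pipeline

detector = pipeline("object-detection", model="facebook/detr-resnet-50")
for det in detector("path/to/image.jpg"):
    print(det["label"], round(det["score"], 3), det["box"])
```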
| Model | Description | Type | Demo |
|---|---|---|---|
| facebook/detr-resnet-50-panoptic | DETR model for panoptic segmentation | Panoptic | 🚀 Demo |
| nvidia/segformer-b5-finetuned-ade-640-640 | SegFormer for semantic segmentation | Semantic | 🚀 Demo |
| Model | Description | Type | Demo |
|---|---|---|---|
| runwayml/stable-diffusion-v1-5 | Stable Diffusion model for text-to-image generation | Text-to-Image | 🚀 Demo |
| CompVis/stable-diffusion-v1-4 | Earlier version of Stable Diffusion | Text-to-Image | 🚀 Demo |
| prompthero/openjourney | Fine-tuned Stable Diffusion with artistic style | Text-to-Image | 🚀 Demo |
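
Image generation goes through the `diffusers` library rather than `transformers`. A minimal sketch, assuming `diffusers` and `torch` are installed and a CUDA GPU is available (the prompt is arbitrary):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5",
                                               torch_dtype=torch.float16)
pipe = pipe.to("cuda")  # drop this (and use float32) to run on CPU, much more slowly
image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```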
| Model | Description | Languages | Demo |
|---|---|---|---|
| microsoft/trocr-base-printed | Transformer-based OCR for printed text | English | 🚀 Demo |
| microsoft/trocr-large-handwritten | OCR model specifically for handwritten text | English | 🚀 Demo |
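
TrOCR uses a vision encoder-decoder rather than a single pipeline call. A minimal sketch following the model card's pattern; the image path is a placeholder for a cropped line of text:

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

image = Image.open("path/to/text_line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```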
| Model | Description | Languages | Demo |
|---|---|---|---|
| openai/whisper-large-v2 | State-of-the-art speech recognition and translation | 99 languages | 🚀 Demo |
| facebook/wav2vec2-large-960h-lv60-self | Self-supervised speech recognition model | English | 🚀 Demo |
| jonatasgrosman/wav2vec2-large-xlsr-53-english | Cross-lingual speech recognition model | English | 🚀 Demo |
| Model | Description | Quality | Demo |
|---|---|---|---|
| microsoft/speecht5_tts | SpeechT5 model for text-to-speech synthesis | High | 🚀 Demo |
| espnet/kan-bayashi_ljspeech_vits | VITS model for high-quality speech synthesis | High | 🚀 Demo |
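
SpeechT5 needs a vocoder and a speaker embedding in addition to the TTS model itself. A minimal sketch following the model card's example; the x-vector dataset and index come from that example and are just one choice of voice:

```python
import torch
import soundfile as sf
from datasets import load_dataset
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Hello from the model catalog.", return_tensors="pt")
speakers = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(speakers[7306]["xvector"]).unsqueeze(0)

speech = model.generate_speech(inputs["input_ids"], speaker_embedding, vocoder=vocoder)
sf.write("speech.wav", speech.numpy(), samplerate=16000)
```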
| Model | Description | Classes | Demo |
|---|---|---|---|
| MIT/ast-finetuned-audioset-10-10-0.4593 | Audio Spectrogram Transformer for sound classification | 527 classes | 🚀 Demo |
| superb/wav2vec2-base-superb-er | Emotion recognition from speech | Emotions | 🚀 Demo |
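
A minimal audio-classification sketch with the AST checkpoint; the audio path is a placeholder:

```python
from transformers import pipeline

audio_clf = pipeline("audio-classification", model="MIT/ast-finetuned-audioset-10-10-0.4593")
print(audio_clf("path/to/audio.wav"))  # top AudioSet labels with scores
```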
| Model | Description | Type | Demo |
|---|---|---|---|
| facebook/musicgen-small | MusicGen model for controllable music generation | Conditional | 🚀 Demo |
| facebook/musicgen-melody | MusicGen with melody conditioning capabilities | Melody-conditioned | 🚀 Demo |
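
A minimal MusicGen sketch following the model card's pattern; the text prompt is arbitrary, and `max_new_tokens=256` corresponds to roughly five seconds of audio:

```python
import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

inputs = processor(text=["lo-fi hip hop beat with mellow piano"], padding=True, return_tensors="pt")
audio_values = model.generate(**inputs, max_new_tokens=256)

sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())
```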
| Model | Description | Capabilities | Demo |
|---|---|---|---|
| Salesforce/blip-image-captioning-large | BLIP model for image captioning | Image → Text | 🚀 Demo |
| Salesforce/blip-vqa-base | Visual question answering with BLIP | VQA | 🚀 Demo |
| microsoft/git-large-coco | Generative Image-to-text Transformer | Image captioning | 🚀 Demo |
| openai/clip-vit-large-patch14 | CLIP model for image-text similarity | Zero-shot classification | 🚀 Demo |
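
CLIP-style models can be driven through the zero-shot image classification pipeline; a minimal sketch where the image path and candidate labels are placeholders:

```python
from transformers import pipeline

clip = pipeline("zero-shot-image-classification", model="openai/clip-vit-large-patch14")
print(clip("path/to/image.jpg",
           candidate_labels=["a photo of a cat", "a photo of a dog", "a diagram"]))
```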
| Model | Description | Capabilities | Demo |
|---|---|---|---|
| laion/clap-htsat-unfused | Contrastive Language-Audio Pre-training | Audio-text similarity | 🚀 Demo |
The top models by downloads and community adoption:
- bert-base-uncased - The foundational BERT model
- gpt2 - OpenAI's GPT-2 model
- distilbert-base-uncased - Distilled BERT for efficiency
- openai/clip-vit-base-patch32 - CLIP for vision-language tasks
Cutting-edge models pushing the boundaries:
- openai/whisper-large-v2 - Best speech recognition
- runwayml/stable-diffusion-v1-5 - Leading image generation
- facebook/musicgen-large - Advanced music generation
- mistralai/Mixtral-8x7B-Instruct-v0.1 - Top instruction-following LLM
| Model | Parameters | Open Source | Commercial Use | Best For |
|---|---|---|---|---|
| GPT-J 6B | 6B | ✅ | ✅ | General text generation |
| BLOOM 7B | 7.1B | ✅ | ✅ | Multilingual tasks |
| Llama 2 Chat 7B | 7B | ✅ | ✅ | Conversational AI |
| Mixtral 8x7B | 46.7B | ✅ | ✅ | Complex reasoning |
| Model | Type | Accuracy (ImageNet) | Parameters | Best For |
|---|---|---|---|---|
| ResNet-50 | CNN | 76.1% | 25.6M | General classification |
| ViT-Base | Transformer | 81.8% | 86M | Image understanding |
| EfficientNet-B7 | CNN | 84.4% | 66M | Efficient classification |
| DETR | Transformer | - | 41M | Object detection |
```python
# Text Classification
from transformers import pipeline

classifier = pipeline("sentiment-analysis",
                      model="cardiffnlp/twitter-roberta-base-sentiment-latest")
result = classifier("I love this awesome repository!")

# Image Classification
classifier = pipeline("image-classification",
                      model="google/vit-base-patch16-224")
result = classifier("path/to/image.jpg")

# Speech Recognition
transcriber = pipeline("automatic-speech-recognition",
                       model="openai/whisper-large-v2")
result = transcriber("path/to/audio.wav")
```

```python
# Question Answering
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")
```
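
The same checkpoint can also be used end to end through the question-answering pipeline; a minimal sketch with a made-up question and context:

```python
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
result = qa(question="What does SQuAD 2.0 add over SQuAD 1.1?",
            context="SQuAD 2.0 combines the SQuAD 1.1 questions with over 50,000 "
                    "unanswerable questions written to look similar to answerable ones.")
print(result["answer"], result["score"])
```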
```python
# Image Captioning
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
```

- Text Classification: Start with RoBERTa-based models for accuracy
- Generation: Use GPT-style models (GPT-J, BLOOM, Llama 2)
- Question Answering: BERT/RoBERTa fine-tuned on SQuAD
- Multilingual: XLM-R, mBERT, or BLOOM for cross-lingual tasks
- Image Classification: ViT for state-of-the-art, ResNet for efficiency
- Object Detection: DETR for end-to-end, YOLO for speed
- Image Generation: Stable Diffusion for quality and control
- Speech Recognition: Whisper for multilingual, Wav2Vec2 for English
- TTS: SpeechT5 for quality, Tacotron for customization
- Audio Classification: AST for general audio understanding
- GLUE: General Language Understanding benchmark
- SuperGLUE: More challenging language understanding tasks
- SQuAD: Reading comprehension benchmark
- HellaSwag: Commonsense reasoning
- ImageNet: Image classification accuracy
- COCO: Object detection and segmentation
- Open Images: Large-scale object detection
- LibriSpeech: Speech recognition accuracy
- CommonVoice: Multilingual speech recognition
- AudioSet: Audio classification tasks
- Text: GPT-J 6B, BLOOM 7B
- Images: Stable Diffusion v1.5
- Music: MusicGen Large
- Customer Support: DialoGPT, RoBERTa sentiment analysis
- Document Processing: LayoutLM, TrOCR
- Search & Retrieval: CLIP, Sentence Transformers
- Multimodal: CLIP, BLIP, ALIGN
- Code: CodeT5, CodeBERT, CodeGen
- Scientific: SciBERT, BioBERT, DistilRoBERTa
We welcome contributions! Here's how you can help:
- Add new models: Submit a PR with model details following our format
- Update benchmarks: Share performance results and comparisons
- Fix issues: Report broken links or outdated information
- Improve descriptions: Make model descriptions more accurate and helpful
- Use the existing format for consistency
- Include direct Hugging Face Hub links
- Verify that demos work before submitting
- Add appropriate tags and categories
- Keep descriptions concise but informative
To be included, models should:
- Be publicly available on the Hugging Face Hub
- Be well-documented with clear use cases
- Be actively maintained or widely adopted
- Demonstrate strong performance on relevant benchmarks
This repository is licensed under the Creative Commons Attribution 4.0 International Public License. See LICENSE for details.
© Jehoshua 2025
⭐ Star this repository if you find it helpful!
📫 Questions or suggestions? Open an issue or start a discussion.
🔗 Share with others who might benefit from this curated list.
Last updated: August 2025