JehoshuaM/awesome-huggingface-models

🤗 Awesome Hugging Face Models

Top Hugging Face models for NLP, vision, and audio tasks — links, descriptions, and demos included.

A curated list of widely used, high-performing models on the Hugging Face Hub, organized by task and domain. Each entry includes a direct link, a short description, and a demo where one is available.

🔤 Natural Language Processing

Large Language Models

| Model | Description | Size | Demo |
| --- | --- | --- | --- |
| microsoft/DialoGPT-large | Conversational response generation model trained on Reddit discussion threads | 762M | 🚀 Demo |
| EleutherAI/gpt-j-6B | Open-source GPT-style autoregressive language model | 6B | 🚀 Demo |
| bigscience/bloom-7b1 | Multilingual autoregressive language model trained on 46 natural languages | 7.1B | 🚀 Demo |
| meta-llama/Llama-2-7b-chat-hf | Fine-tuned Llama 2 model optimized for dialogue use cases | 7B | 🚀 Demo |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | Sparse mixture-of-experts model tuned for instruction following | 46.7B | 🚀 Demo |

Text Classification

| Model | Description | Use Case | Demo |
| --- | --- | --- | --- |
| cardiffnlp/twitter-roberta-base-sentiment-latest | Sentiment analysis model trained on Twitter data | Social media sentiment | 🚀 Demo |
| facebook/bart-large-mnli | Zero-shot classification using natural language inference | General classification | 🚀 Demo |
| martin-ha/toxic-comment-model | Detects toxic comments and hate speech | Content moderation | 🚀 Demo |
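
Zero-shot classification with an NLI model such as facebook/bart-large-mnli works by turning each candidate label into a hypothesis (for example "This example is technology.") and comparing how strongly the input entails each one. The sketch below shows only that scoring step, assuming the entailment logits have already been produced by the model. The logit values are made up for illustration, and `rank_labels` is a hypothetical helper, not part of Transformers.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def rank_labels(entailment_logits):
    """Turn per-label entailment logits into normalized scores, best first.

    entailment_logits maps each candidate label to the entailment logit the
    NLI model assigned to the hypothesis built from that label.
    """
    labels = list(entailment_logits)
    scores = softmax([entailment_logits[label] for label in labels])
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)

# Illustrative logits (not real model output) for the input
# "The new GPU doubles training speed."
logits = {"technology": 3.2, "sports": -1.5, "politics": -0.7}
ranking = rank_labels(logits)
print(ranking[0][0])
```

This corresponds to the single-label case, where scores across labels sum to 1. When several labels may apply at once, each label is usually scored independently instead.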

Named Entity Recognition

| Model | Description | Languages | Demo |
| --- | --- | --- | --- |
| dbmdz/bert-large-cased-finetuned-conll03-english | BERT model fine-tuned for NER on CoNLL-03 | English | 🚀 Demo |
| xlm-roberta-large-finetuned-conll03-english | Multilingual XLM-RoBERTa fine-tuned on English CoNLL-03, usable for cross-lingual NER | 100 languages (pretraining) | 🚀 Demo |

Question Answering

| Model | Description | Type | Demo |
| --- | --- | --- | --- |
| deepset/roberta-base-squad2 | RoBERTa fine-tuned on SQuAD 2.0 for extractive QA | Extractive | 🚀 Demo |
| microsoft/DialoGPT-medium | Dialogue model that answers conversationally from context; not trained on a QA dataset | Generative | 🚀 Demo |

Text Summarization

| Model | Description | Type | Demo |
| --- | --- | --- | --- |
| facebook/bart-large-cnn | BART fine-tuned on CNN/DailyMail for news summarization | Abstractive | 🚀 Demo |
| google/pegasus-xsum | PEGASUS model fine-tuned on XSum dataset | Abstractive | 🚀 Demo |
| sshleifer/distilbart-cnn-12-6 | Distilled BART model for efficient summarization | Abstractive | 🚀 Demo |

Translation

| Model | Description | Languages | Demo |
| --- | --- | --- | --- |
| Helsinki-NLP/opus-mt-en-de | English to German translation | EN → DE | 🚀 Demo |
| facebook/m2m100_418M | Multilingual machine translation without English pivoting | 100 languages | 🚀 Demo |
| google/mt5-large | Multilingual T5 for various text-to-text tasks | 101 languages | 🚀 Demo |

Code Generation

| Model | Description | Languages | Demo |
| --- | --- | --- | --- |
| Salesforce/codegen-350M-mono | Code generation model trained on Python | Python | 🚀 Demo |
| microsoft/codebert-base | Pre-trained encoder for programming and natural languages | Multiple | 🚀 Demo |
| codeparrot/codeparrot | GPT-2 model trained from scratch on Python code | Python | 🚀 Demo |

🖼️ Computer Vision

Image Classification

| Model | Description | Dataset | Demo |
| --- | --- | --- | --- |
| google/vit-base-patch16-224 | Vision Transformer for image classification | ImageNet-21k (pre-training), ImageNet-1k (fine-tuning) | 🚀 Demo |
| microsoft/resnet-50 | ResNet-50 model for general image classification | ImageNet-1k | 🚀 Demo |
| google/efficientnet-b7 | EfficientNet model with a strong accuracy/efficiency trade-off | ImageNet | 🚀 Demo |

Object Detection

| Model | Description | Framework | Demo |
| --- | --- | --- | --- |
| facebook/detr-resnet-50 | End-to-end object detection with Transformers | PyTorch | 🚀 Demo |
| hustvl/yolos-tiny | You Only Look at One Sequence for object detection | PyTorch | 🚀 Demo |

Image Segmentation

| Model | Description | Type | Demo |
| --- | --- | --- | --- |
| facebook/detr-resnet-50-panoptic | DETR model for panoptic segmentation | Panoptic | 🚀 Demo |
| nvidia/segformer-b5-finetuned-ade-640-640 | SegFormer for semantic segmentation | Semantic | 🚀 Demo |

Image Generation

| Model | Description | Type | Demo |
| --- | --- | --- | --- |
| runwayml/stable-diffusion-v1-5 | Stable Diffusion model for text-to-image generation | Text-to-Image | 🚀 Demo |
| CompVis/stable-diffusion-v1-4 | Earlier version of Stable Diffusion | Text-to-Image | 🚀 Demo |
| prompthero/openjourney | Fine-tuned Stable Diffusion with artistic style | Text-to-Image | 🚀 Demo |

Optical Character Recognition

| Model | Description | Languages | Demo |
| --- | --- | --- | --- |
| microsoft/trocr-base-printed | Transformer-based OCR for printed text | English | 🚀 Demo |
| microsoft/trocr-large-handwritten | OCR model specifically for handwritten text | English | 🚀 Demo |

🎵 Audio Processing

Speech Recognition

| Model | Description | Languages | Demo |
| --- | --- | --- | --- |
| openai/whisper-large-v2 | Multilingual speech recognition and speech-to-English translation | 99 languages | 🚀 Demo |
| facebook/wav2vec2-large-960h-lv60-self | Self-supervised speech recognition model | English | 🚀 Demo |
| jonatasgrosman/wav2vec2-large-xlsr-53-english | Cross-lingual XLSR-53 model fine-tuned for English speech recognition | English | 🚀 Demo |

Text-to-Speech

| Model | Description | Quality | Demo |
| --- | --- | --- | --- |
| microsoft/speecht5_tts | SpeechT5 model for text-to-speech synthesis | High | 🚀 Demo |
| espnet/kan-bayashi_ljspeech_vits | VITS model for high-quality speech synthesis | High | 🚀 Demo |

Audio Classification

| Model | Description | Classes | Demo |
| --- | --- | --- | --- |
| MIT/ast-finetuned-audioset-10-10-0.4593 | Audio Spectrogram Transformer for sound classification | 527 classes | 🚀 Demo |
| superb/wav2vec2-base-superb-er | Emotion recognition from speech | Emotions | 🚀 Demo |

Music Generation

| Model | Description | Type | Demo |
| --- | --- | --- | --- |
| facebook/musicgen-small | MusicGen model for controllable music generation | Conditional | 🚀 Demo |
| facebook/musicgen-melody | MusicGen with melody conditioning capabilities | Melody-conditioned | 🚀 Demo |

🔀 Multimodal

Vision-Language

| Model | Description | Capabilities | Demo |
| --- | --- | --- | --- |
| Salesforce/blip-image-captioning-large | BLIP model for image captioning | Image → Text | 🚀 Demo |
| Salesforce/blip-vqa-base | Visual question answering with BLIP | VQA | 🚀 Demo |
| microsoft/git-large-coco | Generative Image-to-text Transformer | Image captioning | 🚀 Demo |
| openai/clip-vit-large-patch14 | CLIP model for image-text similarity | Zero-shot classification | 🚀 Demo |
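
CLIP-style zero-shot classification compares one image embedding against the text embeddings of candidate captions: the vectors are L2-normalized, their dot products (cosine similarities) are scaled, and a softmax turns them into probabilities. The sketch below reproduces that arithmetic on tiny hand-written vectors. The embeddings are made up, `zero_shot_probs` is a hypothetical helper, and the `logit_scale` default of 100 is an assumption standing in for the model's learned temperature.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between image and texts."""
    image = normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        text = normalize(text_emb)
        logits.append(logit_scale * sum(a * b for a, b in zip(image, text)))
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 3-dimensional embeddings; real CLIP embeddings are far larger.
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "a photo of a dog": [0.1, 0.9, 0.3],
}

probs = zero_shot_probs(image_embedding, list(label_embeddings.values()))
best_label = max(zip(label_embeddings, probs), key=lambda pair: pair[1])[0]
print(best_label)
```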

Audio-Text

| Model | Description | Capabilities | Demo |
| --- | --- | --- | --- |
| laion/clap-htsat-unfused | Contrastive Language-Audio Pre-training | Audio-text similarity | 🚀 Demo |

🏆 Featured Collections

🌟 Most Popular Models

The top models by downloads and community adoption:

🚀 State-of-the-Art Models

Cutting-edge models pushing the boundaries:

📊 Model Comparison

Large Language Models Comparison

| Model | Parameters | Open Source | Commercial Use | Best For |
| --- | --- | --- | --- | --- |
| GPT-J 6B | 6B | | | General text generation |
| BLOOM 7B | 7.1B | | | Multilingual tasks |
| Llama 2 Chat 7B | 7B | | | Conversational AI |
| Mixtral 8x7B | 46.7B | | | Complex reasoning |
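
Parameter counts give a rough lower bound on the memory needed to load a model: weights take about (number of parameters) × (bytes per parameter). The sketch below applies that arithmetic to the models compared here. It is an estimate under simplifying assumptions: it ignores activations, the KV cache, and framework overhead, and `weight_memory_gb` is a hypothetical helper name.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params, dtype="float16"):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Parameter counts taken from the comparison table.
models = {
    "GPT-J 6B": 6e9,
    "BLOOM 7B": 7.1e9,
    "Llama 2 Chat 7B": 7e9,
    "Mixtral 8x7B": 46.7e9,
}

for name, params in models.items():
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB in float16")
```

Real usage is higher than these figures, so treat them as a first filter when matching a model to available hardware.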

Vision Models Comparison

| Model | Type | Accuracy (ImageNet) | Parameters | Best For |
| --- | --- | --- | --- | --- |
| ResNet-50 | CNN | 76.1% | 25.6M | General classification |
| ViT-Base | Transformer | 81.8% | 86M | Image understanding |
| EfficientNet-B7 | CNN | 84.4% | 66M | Efficient classification |
| DETR | Transformer | - | 41M | Object detection |

🛠️ Usage Examples

Quick Start with Transformers

# Text classification: returns a list of {"label", "score"} dicts
from transformers import pipeline

sentiment_classifier = pipeline("sentiment-analysis",
                                model="cardiffnlp/twitter-roberta-base-sentiment-latest")
sentiment = sentiment_classifier("I love this awesome repository!")

# Image classification: accepts a local path, a URL, or a PIL image
image_classifier = pipeline("image-classification",
                            model="google/vit-base-patch16-224")
image_labels = image_classifier("path/to/image.jpg")

# Speech recognition: returns a dict with the transcript under "text"
transcriber = pipeline("automatic-speech-recognition",
                       model="openai/whisper-large-v2")
transcript = transcriber("path/to/audio.wav")
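
Classification pipelines like the ones above return a list of dicts with `label` and `score` keys, so downstream code usually needs a small step to pick a label and decide whether to trust it. A minimal sketch of that step follows. `top_prediction` and its `min_score` threshold are hypothetical, and the sample scores are illustrative rather than real model output.

```python
def top_prediction(predictions, min_score=0.5):
    """Return the highest-scoring label, or None if it falls below min_score."""
    best = max(predictions, key=lambda pred: pred["score"])
    return best["label"] if best["score"] >= min_score else None

# Sample in the shape returned by a text-classification pipeline.
sample = [
    {"label": "positive", "score": 0.98},
    {"label": "neutral", "score": 0.015},
    {"label": "negative", "score": 0.005},
]

label = top_prediction(sample)
print(label)
```

Returning `None` for low-confidence predictions lets the caller route uncertain inputs to a fallback, such as manual review.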

Using Specific Models

# Question Answering: load the tokenizer and the extractive QA model
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
qa_tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
qa_model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")

# Image Captioning: load the processor (image preprocessing + tokenizer) and the model
from transformers import BlipProcessor, BlipForConditionalGeneration
caption_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
caption_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

🔍 How to Choose a Model

For NLP Tasks

  • Text Classification: Start with RoBERTa-based models for accuracy
  • Generation: Use GPT-style models (GPT-J, BLOOM, Llama 2)
  • Question Answering: BERT/RoBERTa fine-tuned on SQuAD
  • Multilingual: XLM-R, mBERT, or BLOOM for cross-lingual tasks

For Computer Vision

  • Image Classification: ViT for higher accuracy than ResNet-50, ResNet for a smaller footprint
  • Object Detection: DETR for end-to-end, YOLO for speed
  • Image Generation: Stable Diffusion for quality and control

For Audio

  • Speech Recognition: Whisper for multilingual, Wav2Vec2 for English
  • TTS: SpeechT5 for quality, Tacotron for customization
  • Audio Classification: AST for general audio understanding

📈 Performance Benchmarks

NLP Benchmarks

  • GLUE: General Language Understanding benchmark
  • SuperGLUE: More challenging language understanding tasks
  • SQuAD: Reading comprehension benchmark
  • HellaSwag: Commonsense reasoning
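
SQuAD results are usually reported as exact match (EM) and token-level F1 between the predicted and reference answers, after light text normalization. The sketch below follows that general recipe. It is a simplified illustration rather than the official evaluation script, and it scores against a single reference answer.

```python
import re
import string
from collections import Counter

def normalize_answer(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))

def f1_score(prediction, truth):
    """Harmonic mean of token precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(truth).split()
    if not pred_tokens or not true_tokens:
        # Both empty counts as a match (the "no answer" case).
        return float(pred_tokens == true_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(f1_score("in Paris France", "Paris"))
```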

Vision Benchmarks

  • ImageNet: Image classification accuracy
  • COCO: Object detection and segmentation
  • Open Images: Large-scale object detection
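
Detection benchmarks such as COCO decide whether a predicted box matches a ground-truth box using intersection over union (IoU), then average precision over a range of IoU thresholds. A minimal sketch of the IoU computation is shown below, assuming boxes are given as `(x_min, y_min, x_max, y_max)` tuples. `box_iou` is a hypothetical helper name.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_x_min = max(box_a[0], box_b[0])
    inter_y_min = max(box_a[1], box_b[1])
    inter_x_max = min(box_a[2], box_b[2])
    inter_y_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter_w = max(0.0, inter_x_max - inter_x_min)
    inter_h = max(0.0, inter_y_max - inter_y_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
print(box_iou(predicted, ground_truth))
```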

Audio Benchmarks

  • LibriSpeech: Speech recognition accuracy
  • CommonVoice: Multilingual speech recognition
  • AudioSet: Audio classification tasks
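
Speech recognition accuracy on benchmarks like LibriSpeech is typically reported as word error rate (WER): the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration that splits on whitespace and applies no text normalization. `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref_words)

wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.2%}")
```

Published WER figures usually normalize casing and punctuation first, so raw numbers from this sketch can be higher than reported ones.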

🎯 Use Case Recommendations

Content Creation

  • Text: GPT-J 6B, BLOOM 7B
  • Images: Stable Diffusion v1.5
  • Music: MusicGen Large

Business Applications

  • Customer Support: DialoGPT, RoBERTa sentiment analysis
  • Document Processing: LayoutLM, TrOCR
  • Search & Retrieval: CLIP, Sentence Transformers

Research & Development

  • Multimodal: CLIP, BLIP, ALIGN
  • Code: CodeT5, CodeBERT, CodeGen
  • Scientific: SciBERT, BioBERT, or DistilRoBERTa as a lightweight general-purpose baseline

🤝 Contributing

We welcome contributions! Here's how you can help:

  1. Add new models: Submit a PR with model details following our format
  2. Update benchmarks: Share performance results and comparisons
  3. Fix issues: Report broken links or outdated information
  4. Improve descriptions: Make model descriptions more accurate and helpful

Contribution Guidelines

  • Use the existing format for consistency
  • Include direct Hugging Face Hub links
  • Verify that demos work before submitting
  • Add appropriate tags and categories
  • Keep descriptions concise but informative
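
Some of these guidelines can be checked mechanically before a PR is opened. The sketch below builds the direct Hub link for a model ID and flags obviously malformed entries. The ID pattern and the 120-character description limit are assumptions made for this example, not rules of this repository or of the Hub, and `hub_url` and `validate_entry` are hypothetical helpers.

```python
import re

HUB_BASE_URL = "https://huggingface.co"

# Assumed shape of a model ID: "name" or "namespace/name".
REPO_ID_PATTERN = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(/[A-Za-z0-9][A-Za-z0-9._-]*)?$"
)

def hub_url(repo_id):
    """Return the direct Hub link for a model ID."""
    if not REPO_ID_PATTERN.match(repo_id):
        raise ValueError(f"not a valid-looking model ID: {repo_id!r}")
    return f"{HUB_BASE_URL}/{repo_id}"

def validate_entry(repo_id, description, max_description_len=120):
    """Return a list of problems found in a proposed entry (empty if none)."""
    problems = []
    if not REPO_ID_PATTERN.match(repo_id):
        problems.append("model ID is malformed")
    if not description.strip():
        problems.append("description is empty")
    elif len(description) > max_description_len:
        problems.append("description is too long")
    return problems

print(hub_url("facebook/bart-large-cnn"))
print(validate_entry("facebook/bart-large-cnn", ""))
```

A check like this does not replace opening the link and testing the demo by hand, which the guidelines still ask for.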

Model Criteria

To be included, models should be:

  • Publicly available on Hugging Face Hub
  • Well-documented with clear use cases
  • Actively maintained or widely adopted
  • Strong performers on relevant benchmarks

📝 License

This repository is licensed under the Creative Commons Attribution 4.0 International Public License. See LICENSE for details.

© Jehoshua 2025


Star this repository if you find it helpful!

📫 Questions or suggestions? Open an issue or start a discussion.

🔗 Share with others who might benefit from this curated list.


Last updated: August 2025
