TEXT-TO-IMAGE-GENERATOR USING STABLE
DIFFUSION MODEL
Minor Project Report
Submitted in Partial Fulfillment of the Requirement for the Award of the Degree
of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted by
RITIK SHARMA(01-CSE-2022)
AWAIS AHMED(30-CSE-2022)
AMREEN SAYED(34-CSE-2022)
MOHD HABIB(40-CSE-2022)
Under the Guidance of
Mr. Amit Dogra
SCHOOL OF ENGINEERING & TECHNOLOGY
BABA GHULAM SHAH BADSHAH UNIVERSITY
RAJOURI (J & K) - 185234
July 2025
DECLARATION
I hereby declare that the project entitled “Text-To-Image-Generator Using Stable Diffusion
Model” submitted for the B. Tech. (CSE) degree is my original work completed under the
supervision of Mr. Amit Dogra. To the best of my knowledge, the project has not formed the basis for
the award of any other degree, diploma, fellowship or any other similar title.
Place: Baba Ghulam Shah Badshah Signature of the Student
University Rajouri
Date:
CERTIFICATE
This is to certify that the project titled “Text-To-Image-Generator Using Stable Diffusion Model” is the
bonafide work carried out by Ritik Sharma (01-CSE-2022), a student of B. Tech. (CSE) in the School of
Engineering and Technology, Baba Ghulam Shah Badshah University, Rajouri, J&K, during the academic
year 2025, in partial fulfillment of the requirements for the award of the degree of Bachelor of
Technology (Computer Science and Engineering), and that, to the best of my knowledge, the project has
not formed the basis for the award of any other degree.
Place: Baba Ghulam Shah Badshah University, Rajouri
Date:
Signature of the Guide Signature of HoD
Signature of External Examiner
ACKNOWLEDGEMENT
I have taken efforts in this project. However, it would not have been possible without the kind support
and help of many individuals and organizations. I would like to extend my sincere thanks to all of them.
I am highly indebted to Mr. Amit Dogra (H.O.D., CSE Department, and Project Head and Guide) for his
guidance and constant supervision, for providing the necessary information regarding the project, and for
his support in completing the project. I would like to express my gratitude towards my parents and the
members of the Department of Computer Science and Engineering for their kind co-operation and
encouragement, which helped me complete this project. I would like to express my special gratitude and
thanks to my team members, who worked hard with me to complete this project on time. My thanks and
appreciation also go to my colleagues who willingly helped me out with their abilities in developing the
project.
ABSTRACT
Text-to-image generation using the Stable Diffusion model represents a significant
advancement in artificial intelligence, enabling the synthesis of high-quality, photorealistic
images from textual descriptions. Stable Diffusion, a latent diffusion model developed by
Stability AI and collaborators, leverages a computationally efficient approach by operating in the
latent space of pretrained autoencoders, rather than directly in pixel space. This innovation
reduces hardware requirements and accelerates both training and inference, making high-
fidelity image generation accessible on consumer GPUs. The proposed approach seeks to
further enhance Stable Diffusion by introducing adaptive prompt engineering, hybrid
conditioning (combining textual and visual inputs), latent space optimization, and lightweight
model variants suitable for edge computing. An ethical content moderation system is also
integrated to address biases and ensure responsible image generation. Extensive
experimentation demonstrates that Stable Diffusion achieves a strong balance between
creativity, realism, and prompt adherence, with quantitative evaluation using metrics like
Fréchet Inception Distance (FID) and CLIP Score, as well as qualitative human assessments.
The system's modular design and efficient workflow, encompassing text processing, latent
transformation, iterative denoising, and ethical filtering, make it a robust framework for diverse
creative and research applications. In summary, Stable Diffusion democratizes AI-driven
creativity by providing scalable, high-fidelity text-to-image generation, with ongoing research
focused on improving efficiency, controllability, and ethical safeguards.
TABLE OF CONTENTS
Chapter Number Title Page Number
1 Introduction 1
1. Overview of the Project
2. Scope and Objective
2 Literature Survey 2-3
1. Introduction
2. Literature Survey
3 System Design 4-6
1. Natural Language Processing
2. Advantages of NLP
3. Disadvantages of NLP
4. Architecture Diagram
5. Hardware Requirement
6. Software Requirement
4 Analysis 7-8
1. Python Library
2. Data
3. Software Description
3.1 Python
3.2 Google Colab Notebook For Python
5 Creation of Project 9-11
1. File Structure
2. Step-by-Step implementation
6 Implementation of Code 12-16
1. Sample Codes
7 Conclusion 17
8 References 18
CHAPTER 1. INTRODUCTION
1.1.OVERVIEW OF THE PROJECT
The Text-to-Image Generator Using Stable Diffusion project focuses on developing an AI-
driven system that can create high-quality, realistic images from natural language descriptions
provided by users. Utilizing the Stable Diffusion model, a cutting-edge latent diffusion
architecture, the project streamlines the image generation process by operating in a
compressed latent space, making it both computationally efficient and accessible on standard
consumer hardware. The workflow involves encoding user text prompts with a powerful
language-image model (CLIP), which guides the iterative denoising of random noise in
the latent space to synthesize images that closely match the given descriptions. The generated
images are then decoded into high-resolution visuals using a variational autoencoder. The
project highlights the model's ability to produce a wide variety of images, from simple
objects to complex scenes.
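In practice, this whole workflow is exposed through a single pipeline in the Hugging Face diffusers library. The following is a minimal sketch (the checkpoint name runwayml/stable-diffusion-v1-5 and the availability of a CUDA GPU are assumptions; Chapter 5 walks through the full setup step by step):

from diffusers import StableDiffusionPipeline
import torch

# Load a pretrained latent diffusion checkpoint (assumed identifier) onto the GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# The prompt is encoded by CLIP, a random latent is iteratively denoised by the U-Net,
# and the VAE decodes the final latent into a high-resolution image
image = pipe("a red bicycle leaning against a brick wall").images[0]
image.save("example.png")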
1.2. SCOPE AND OBJECTIVE
SCOPE
To create an efficient and accessible AI system that generates high-quality images from text
prompts using the Stable Diffusion model.
OBJECTIVE:
Enable users to visualize ideas and concepts through natural language input.
Support diverse image generation tasks, such as text-to-image, inpainting, and image editing.
Provide a user-friendly, scalable, and open-source platform for creative and educational
applications.
CHAPTER 2. LITERATURE SURVEY
1. INTRODUCTION
Text-to-image generation is a rapidly advancing field in artificial intelligence that focuses on creating
images from natural language descriptions. This technology has the potential to revolutionize creative
industries, education, and digital content creation by enabling users to visualize their ideas and
concepts directly from text. The evolution of this field has seen a transition from early generative
models such as GANs and VAEs to more advanced diffusion models, which have significantly
improved the quality, diversity, and semantic accuracy of generated images. Among these, the Stable
Diffusion model stands out for its efficiency, scalability, and ability to generate high-fidelity images on
consumer hardware.
2. Literature Survey
2.1 Early Generative Models.
The initial approaches to text-to-image generation were based on Generative Adversarial Networks
(GANs) and Variational Autoencoders (VAEs). GANs, such as those introduced by Goodfellow et al.
(2014), use a generator and discriminator in a competitive framework to produce realistic images.
Models like StackGAN and AttnGAN extended GANs for text-to-image tasks, but often struggled with
training instability and limited prompt fidelity. VAEs offered a probabilistic approach, generating diverse
samples but typically at the cost of lower image clarity.
2.2 Diffusion Models.
Diffusion models have recently emerged as a powerful alternative for generative tasks. These models,
including Denoising Diffusion Probabilistic Models (DDPMs), generate images by gradually denoising
random noise, resulting in stable training and high-quality outputs. Notable systems like DALL-E 2
(OpenAI), Imagen (Google), and GLIDE (OpenAI) have demonstrated the effectiveness of diffusion
models in producing photorealistic images from text prompts.
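To make the idea of gradual denoising concrete, the following is a deliberately simplified, illustrative sketch of a reverse diffusion sampling loop. Here predict_noise stands in for a trained denoising network, and the update rule omits the exact DDPM coefficients and variance schedule used by real implementations:

import torch

def sample(predict_noise, steps=50, shape=(1, 3, 64, 64)):
    x = torch.randn(shape)                      # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = predict_noise(x, t)               # network's estimate of the noise at step t
        x = x - eps / steps                     # remove a small fraction of the predicted noise
        if t > 0:
            x = x + 0.01 * torch.randn_like(x)  # re-inject a little noise (stochastic step)
    return x                                    # an approximate sample from the data distribution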
2.3 Stable Diffusion.
Stable Diffusion, introduced by Stability AI in 2022, represents a breakthrough in text-to-image
synthesis. Unlike previous models that operate directly in pixel space, Stable Diffusion utilizes a latent
diffusion approach, working in a compressed latent space to achieve computational efficiency without
sacrificing image quality. The model employs a CLIP-based text encoder to translate prompts into
embeddings that guide the diffusion process, and a variational autoencoder (VAE) to decode the final
latent representation into a high-resolution image. This architecture allows Stable Diffusion to run on
standard GPUs, making advanced generative AI accessible to a wider audience.
2.4 Recent Advancements.
Recent research has focused on improving prompt adherence, image resolution, and the accurate
rendering of complex concepts. Stable Diffusion 3, for example, incorporates a Multimodal Diffusion
Transformer (MMDiT) and Rectified Flow (RF) to enhance efficiency and prompt alignment.
Techniques such as Classifier-Free Guidance (CFG) and large-scale training on datasets like
LAION-5B have further improved the model’s versatility and output quality.
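Classifier-Free Guidance, mentioned above, is conceptually simple: at every denoising step the model produces one noise prediction conditioned on the prompt and one unconditional prediction, and the two are combined with a guidance weight. A hedged sketch of that combination (variable names are illustrative, not a library API):

def apply_cfg(eps_uncond, eps_text, guidance_scale=7.5):
    # Push the prediction away from the unconditional estimate and towards
    # the text-conditioned one; a larger guidance_scale means stricter prompt adherence
    return eps_uncond + guidance_scale * (eps_text - eps_uncond)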
2.5 Applications and Challenges.
Text-to-image diffusion models are now widely used in creative design, advertising, gaming, and
education. However, challenges remain, including generating fine details in complex scenes,
handling ambiguous or abstract prompts, and ensuring ethical use by filtering inappropriate content.
Ongoing research aims to address these issues through improved architectures, better training data,
and integrated content moderation systems.
Current research is focused on:
•Improving model efficiency and reducing computational requirements.
•Enhancing multimodal capabilities (e.g., integrating audio or video).
•Increasing the model’s ability to interpret complex and nuanced text prompts.
•Expanding real-time image generation and interactivity.
References:
•Goodfellow, I. et al. (2014). Generative Adversarial Networks.
•Ramesh, A. et al. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents
(DALL-E 2).
•Rombach, R. et al. (2022). High-Resolution Image Synthesis with Latent Diffusion Models (Stable
Diffusion).
•Saharia, C. et al. (2022). Photorealistic Text-to-Image Diffusion Models with Deep Language
Understanding (Imagen).
Chapter 3. System Design
3.1. Natural Language Processing (NLP)
Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction
between computers and human language. In this project, NLP is used to interpret and encode user-
provided text prompts, which describe the images to be generated. The system utilizes advanced
NLP models, such as CLIP (Contrastive Language-Image Pretraining), to convert textual
descriptions into high-dimensional vector representations. These vectors serve as the conditioning
input for the image generation process, ensuring that the output image aligns closely with the user’s
intent.
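As an illustration of this encoding step, the sketch below uses the transformers library to turn a prompt into CLIP text embeddings. The checkpoint openai/clip-vit-large-patch14 is the text encoder commonly associated with Stable Diffusion v1.x; the shape noted in the comment is for reference:

from transformers import CLIPTokenizer, CLIPTextModel
import torch

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Tokenize the prompt to a fixed length of 77 tokens, as the diffusion model expects
tokens = tokenizer("a watercolor painting of a lighthouse at dusk",
                   padding="max_length", max_length=77, return_tensors="pt")

with torch.no_grad():
    embeddings = text_encoder(**tokens).last_hidden_state  # shape: (1, 77, 768)

print(embeddings.shape)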
3.2. Advantages of Natural Language Processing.
•Semantic Understanding: NLP models can interpret complex and nuanced text, allowing
users to provide detailed and creative prompts.
•Flexibility: Users can describe a wide variety of scenes, objects, and styles in natural
language, making the system highly versatile.
•User-Friendly: No technical expertise is required; users simply type what they want to see.
•Context Awareness: Advanced NLP models can understand context, relationships, and
attributes within the prompt, resulting in more accurate image generation.
3.3. Disadvantages of NLP.
•Ambiguity: Natural language can be ambiguous or vague, sometimes leading to unexpected
or inaccurate image outputs.
•Bias: NLP models may reflect biases present in their training data, which can influence the
generated images.
•Complexity Limitations: Extremely complex or abstract prompts may not always be fully
understood or visualized by the model.
•Language Limitations: Performance may vary across different languages or dialects,
depending on the training data.
3.4. Architecture Diagram.
The architecture of the text-to-image generator using Stable Diffusion consists of the following main
components, illustrated in the code sketch after this list:
1. User Interface: Accepts natural language prompts from the user.
2. Text Encoder (NLP Module): Uses a model like CLIP to convert text prompts into embeddings.
3. Latent Diffusion Model:
   1. Noise Initialization: Starts with random noise in the latent space.
   2. Denoising Network (UNet): Iteratively refines the noise, conditioned on the text embedding.
4. Variational Autoencoder (VAE) Decoder: Converts the final latent representation into a high-resolution image.
5. Output Module: Displays the generated image to the user.
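These components correspond directly to attributes of the diffusers pipeline object. The following sketch (checkpoint name assumed) simply prints them to show how the architecture maps onto code:

from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)

print(type(pipe.text_encoder))  # CLIP text encoder (component 2: NLP module)
print(type(pipe.unet))          # UNet denoising network (component 3.2)
print(type(pipe.vae))           # variational autoencoder (component 4: decoder)
print(type(pipe.scheduler))     # noise schedule driving the iterative denoising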
3.5. Hardware Requirement.
•Processor: Intel i5 or higher / AMD equivalent
•RAM: Minimum 8 GB (16 GB recommended for faster processing)
•GPU: NVIDIA GPU with at least 4 GB VRAM (e.g., GTX 1650 or
better) for efficient image generation; CPU-only operation is possible
but slower
•Storage: Minimum 10 GB free disk space for model weights and
dependencies
3.6. Software Requirements.
•Operating System: Windows 10/11, Linux, or macOS
•Programming Language: Python 3.8 or above
•Libraries/Frameworks:
• PyTorch or TensorFlow (for deep learning)
• Hugging Face Transformers (for NLP models)
• diffusers (for Stable Diffusion implementation)
• OpenCV, PIL (for image processing)
• Flask or Streamlit (for web interface, if applicable)
•Other Tools: CUDA drivers (for GPU acceleration), Git (for version
control), and Jupyter Notebook (for experimentation)
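The stack above can also be captured in a requirements file for reproducible setup. The following requirements.txt is only an illustrative sketch: the package selection mirrors the list above and versions are deliberately left unpinned:

# requirements.txt (illustrative; pin versions as appropriate for your environment)
torch
torchvision
diffusers
transformers
accelerate
opencv-python
Pillow
streamlit   # or flask, if a web interface is used
jupyter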
Conclusion-
In this chapter, we have discussed the introduction to NLP as well as the software and
hardware requirements of the text-to-image generator project; in the next chapter, we discuss the
various Python libraries used for the creation of the project.
CHAPTER 4. ANALYSIS
1. PYTHON LIBRARY
A Python library is a collection of related modules. It contains bundles of code that
can be reused across different programs, which makes Python programming simpler
and more convenient for the programmer. For implementing a text-to-image generator
using the Stable Diffusion model, several Python libraries are essential (a minimal import
set illustrating their roles follows this list):
•diffusers: The primary library for accessing and running Stable Diffusion
models, offering easy-to-use pipelines for text-to-image generation.
•transformers: Provides pre-trained models and tokenizers, especially useful for
handling text encoding with CLIP or similar models.
•torch (PyTorch): The core deep learning framework used for model inference,
tensor operations, and GPU acceleration.
•Pillow (PIL): For image processing tasks such as saving, displaying, or
converting generated images.
•requests: Useful for API calls, especially if interacting with cloud-based or
hosted Stable Diffusion services.
•IPython.display: For displaying images within Jupyter notebooks or interactive
Python environments.
•os, json, time: Standard Python libraries for file handling, configuration, and
time management during batch processing or automation.
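A minimal import set corresponding to these libraries (illustrative only):

import os, json, time                            # file handling, configuration, timing
import torch                                     # deep learning framework and GPU tensors
from diffusers import StableDiffusionPipeline    # Stable Diffusion text-to-image pipeline
from transformers import CLIPTokenizer           # text tokenization for CLIP-style encoders
from PIL import Image                            # saving, displaying, converting images
import requests                                  # optional: calls to hosted inference APIs
from IPython.display import display              # showing images inside notebooks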
2. Data
The data component in this project refers primarily to:
•Text Prompts: User-provided natural language descriptions that guide the
image generation process.
•Model Weights: Pre-trained Stable Diffusion model files, typically several
gigabytes in size, which contain the learned parameters for generating images.
•Generated Images: Output files (usually in PNG or JPEG format) created by
the model from the given text prompts.
•Optional Datasets: For advanced use cases (like fine-tuning or evaluation),
large datasets such as LAION-5B may be used, containing millions of image-text
pairs to enhance model performance or benchmarking.
3. Software Description
3.1 Python
Python is the primary programming language for this project due to its robust
ecosystem for AI development and ease of integration with machine learning
frameworks. Python’s flexibility allows for rapid prototyping, and its extensive
libraries support everything from data processing to deep learning model
deployment.
•Why Python?
• Industry-standard for machine learning and AI research.
• Extensive support for GPU acceleration and distributed computing.
• Active community and frequent updates for AI libraries.
Other Software Components:
•Operating System: Windows 10/11, Linux, or macOS, all compatible with Python
and required libraries
•CUDA Toolkit: For GPU acceleration if using an NVIDIA GPU.
•Jupyter Notebook: For interactive development and visualization.
•Web Frameworks (optional): Flask or Streamlit can be used to build user
interfaces for prompt input and image display
3.2 Google Colab Notebook For Python
Google Colab is an online platform that provides free access to powerful GPU
resources and a collaborative environment for running Python code, making it ideal
for implementing and experimenting with deep learning models such as Stable
Diffusion. For this Text-to-Image Generator project, Google Colab streamlines
development, testing, and demonstration by providing a cloud-based, GPU-enabled
Python environment that supports all the necessary libraries and collaborative
features.
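Before running the model in Colab, it is worth confirming that a GPU runtime is active; a small check of this kind is a common first cell (the device name printed will vary):

import torch

print(torch.cuda.is_available())          # True when a GPU runtime is selected
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. a Tesla T4 on a free Colab instance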
CHAPTER 5. CREATION OF THE PROJECT
5.1. FILE STRUCTURE
This project demonstrates how to generate high-quality images from text prompts
using the Stable Diffusion model and Python. The workflow leverages Hugging
Face’s Diffusers library and runs efficiently in a Google Colab or Jupyter
Notebook environment.
The project structure involves:
• Text_to_image_generator.ipynb # Main Jupyter Notebook (this file)
• Generated_images/ # Folder for saving generated images
• README.md # Project documentation (optional)
5.2. STEP-BY-STEP IMPLEMENTATION
Step 1: Install Required Libraries
!pip install diffusers transformers accelerate
!pip install torch torchvision torchaudio
!pip install pillow
Step 2: Import Libraries
from diffusers import StableDiffusionPipeline
import torch
from PIL import Image
import os
Step 3: Authenticate with Hugging Face (if required)
from huggingface_hub import notebook_login
notebook_login()
Follow the link, login, and paste your User Access Token if prompted .
Step 4: Load the Stable Diffusion Model
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")
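If no CUDA GPU is available, the same pipeline can fall back to the CPU; this is only a sketch, and generation will be much slower (float32 is used because half precision is poorly supported on CPU):

# Optional CPU fallback (slow); same model identifier as in Step 4
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32
).to("cpu")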
Step 5: Generate an Image from a Text Prompt
prompt = "A futuristic cityscape at sunset, highly detailed, digital art"
image = pipe(prompt, num_inference_steps=50).images[0]
# Display the image
image.show()
Step 6: Save the Generated Image
output_dir = "generated_images"
os.makedirs(output_dir, exist_ok=True)
image_path = os.path.join(output_dir, "cityscape.png")
image.save(image_path)
print(f"Image saved at {image_path}")
Step 7: Experiment with More Prompts
from IPython.display import display

prompts = [
    "A serene mountain landscape in the style of a watercolor painting",
    "A cyberpunk robot playing chess, digital art",
    "A cute cat wearing a wizard hat, cartoon style"
]

for i, prompt in enumerate(prompts):
    img = pipe(prompt, num_inference_steps=40).images[0]
    img.save(os.path.join(output_dir, f"sample_{i+1}.png"))
    display(img)
CHAPTER 6. IMPLEMENTATION OF CODE
6.1. SAMPLE CODES.
# TextToImageGenerator.ipynb

# Install necessary libraries
%pip install --upgrade diffusers transformers

# Imports
import os
from pathlib import Path
import tqdm
import torch
import pandas as pd
import numpy as np
from diffusers import StableDiffusionPipeline
from transformers import pipeline, set_seed
from torch.utils.data import Dataset
import matplotlib.pyplot as plt
import cv2

# Configuration for image and prompt generation
class CFG:
    device = "cuda"
    seed = 42
    generator = torch.Generator(device).manual_seed(seed)
    image_gen_steps = 35
    image_gen_model_id = "stabilityai/stable-diffusion-2"
    image_gen_size = (400, 400)
    image_gen_guidance_scale = 9
    prompt_gen_model_id = "gpt2"
    prompt_dataset_size = 6
    prompt_max_length = 12

# === Load Stable Diffusion Pipeline ===
# (Loaded before the generation loop below so that image_gen_model is defined when it is used.)
image_gen_model = StableDiffusionPipeline.from_pretrained(
    CFG.image_gen_model_id,
    torch_dtype=torch.float16,
    revision="fp16",
    use_auth_token="your_hugging_face_auth_token"  # <-- Replace with your token
)
image_gen_model = image_gen_model.to(CFG.device)

# Load prompts from CSV
df = pd.read_csv('prompts.csv')
df['prompt'] = df['prompt'].str.replace(r'^"|"$', '', regex=True)  # Remove surrounding quotes if present

# Sample prompts according to config
sampled_prompts = df['prompt'].sample(
    n=CFG.prompt_dataset_size,
    random_state=CFG.seed
).tolist()

# Optionally truncate prompts to a maximum length (in words)
def truncate_prompt(prompt, max_length):
    words = prompt.split()
    if len(words) > max_length:
        return ' '.join(words[:max_length])
    return prompt

truncated_prompts = [truncate_prompt(p, CFG.prompt_max_length) for p in sampled_prompts]

# Create a PyTorch Dataset
class PromptDataset(Dataset):
    def __init__(self, prompts):
        self.prompts = prompts
    def __len__(self):
        return len(self.prompts)
    def __getitem__(self, idx):
        return self.prompts[idx]

# Instantiate the dataset
prompt_dataset = PromptDataset(truncated_prompts)

# Example usage:
for i, prompt in enumerate(prompt_dataset):
    print(f"Prompt {i+1}: {prompt}")

# === Generate and Save Images ===
os.makedirs("generated_images", exist_ok=True)

for idx, prompt in enumerate(truncated_prompts):
    print(f"Generating image for prompt {idx+1}: {prompt!r}")
    with torch.autocast(CFG.device):
        image = image_gen_model(
            prompt,
            num_inference_steps=CFG.image_gen_steps,
            guidance_scale=CFG.image_gen_guidance_scale,
            height=CFG.image_gen_size[1],
            width=CFG.image_gen_size[0],
            generator=CFG.generator
        ).images[0]
    image.save(f"generated_images/image_{idx+1}.png")

print("All images generated and saved in the 'generated_images/' folder.")

# Record which prompt produced which image
with open("generated_images/prompts.txt", "w") as f:
    for idx, prompt in enumerate(prompt_dataset):
        f.write(f"image_{idx+1}.png: {prompt}\n")

# Helper: generate a single image for a given prompt
def generate_image(prompt, model):
    with torch.autocast(CFG.device):
        image = model(
            prompt,
            num_inference_steps=CFG.image_gen_steps,
            generator=CFG.generator,
            guidance_scale=CFG.image_gen_guidance_scale
        ).images[0]
    image = image.resize(CFG.image_gen_size)
    return image

# Sample Prompts
generate_image("A photorealistic image of a futuristic city with flying cars and towering skyscrapers, with a vivid sunset in the background", image_gen_model)
generate_image("A mystical enchanted forest with glowing plants and magical creatures, bathed in soft dawn light", image_gen_model)
6.1.0 OUTPUT OF TextToImageGenerator.ipynb
Fig 6.1.1: Sample outputs of TextToImageGenerator.ipynb.
CHAPTER 7. CONCLUSION
The Text-to-Image Generator project using the Stable Diffusion model
demonstrates a significant advancement in generative AI, enabling the creation
of high-quality, photorealistic images from natural language prompts with
remarkable efficiency and flexibility. By leveraging a latent diffusion process,
Stable Diffusion operates in a compressed latent space, which dramatically
reduces computational requirements without sacrificing image detail or semantic
alignment. The model's architecture, featuring a variational autoencoder for
encoding and decoding, a U-Net noise predictor for progressive denoising, and
powerful text conditioning via models like CLIP, ensures that generated images
closely match the user's descriptions while preserving intricate visual features.
Overall, the project highlights Stable Diffusion's ability to democratize AI-powered
creativity, offering an accessible, scalable, and versatile tool for transforming
ideas into compelling visual representations.
REFERENCES
1. CompVis/stable-diffusion (GitHub): The official repository for Stable Diffusion, providing the
model code, architecture details, and usage instructions.
2. “Visual Explanation for Text-to-image Stable Diffusion” (arXiv, 2023): This paper introduces
Diffusion Explainer, an interactive tool that helps users understand how Stable Diffusion transforms
text prompts into images, offering insights into the model's structure and operations.
3. “TEXT-IMAGE GENERATION-STABLE DIFFUSION” (IJSREM Journal): A project-focused
article that explores the Stable Diffusion framework for text-to-image synthesis, discussing its
workflow and performance.
4. “What's in a text-to-image prompt? The potential of stable diffusion in art” (ScienceDirect, 2024):
This study analyzes how prompt structure affects image generation quality and explores the creative
potential of Stable Diffusion in artistic applications.
5. Stable Diffusion 3: Research Paper (Stability AI, 2024): The latest research paper from Stability AI
detailing the advancements, architecture, and performance of Stable Diffusion 3, including
comparisons with other state-of-the-art models.
6. Official Stable Diffusion by Stability AI (GitHub): https://github.com/Stability-AI/stablediffusion
7. Generative Models by Stability AI (including Stable Video Diffusion): https://github.com/Stability-AI/generative-models