
Introduction to

LLMs in Python
by Inder P Singh

YouTube Channel: Software and Testing Training (341 Tutorials, 82,000 Subscribers)
Blog: Fourth Industrial Revolution
Copyright © 2025 All Rights Reserved.
1 Introduction to LLMs in Python
2 Setup & Installation
3 Basic Inference with Transformers
4 Calling OpenAI’s ChatGPT API in Python
5 Local Deployment with Hugging Face Models
6 Prompt Engineering in Python
7 Fine‑Tuning & Custom Training
8 Advanced Techniques: Streaming, Batching & Callbacks
9 Efficiency & Optimization
10 Integration & Deployment Workflows
11 Best Practices & Troubleshooting
12 Introduction to LLMs in Python Quiz

1 Introduction to LLMs in Python
Q: What does “Introduction to LLMs in Python” mean?
A: Introduction to LLMs in Python means the foundational knowledge and practical steps
necessary to utilize Large Language Models (LLMs) within the Python ecosystem. It covers
concepts such as loading pre‑trained models, tokenization, inference, and integration with
APIs or libraries. This course equips developers and technical users with the skills to
harness LLM capabilities (like text generation, summarization, and translation) directly
through Python code.
Note: If you are new to Python, you can view an introduction to Python in the tutorial here.
The full set of Python for beginners tutorials is available here.

Q: Why is Python the lingua franca (common language) for LLM development?
A: Python has extensive machine‑learning libraries (such as Transformers, PyTorch, and
TensorFlow), a vibrant ecosystem of wrapper utilities (like Hugging Face pipelines), and
API clients (for OpenAI’s GPT services). Its readable syntax and broad community support
enable fast experimentation and production deployment, making it the language of choice for
AI practitioners.

Q: What are typical use cases for LLMs in Python?


A: Developers and technical users can use LLMs in Python for tasks including automated
documentation generation, code completion, chatbots, data extraction from
unstructured text, and sentiment analysis. Python’s data‑processing capabilities allow
these models to integrate with web frameworks, data pipelines, and DevOps tools.
Example: A Python script using an LLM can parse customer reviews, summarize sentiment
trends, and output a JSON report for BI dashboards.
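A minimal sketch of that idea, assuming the Transformers sentiment-analysis pipeline (the sample reviews and report fields are illustrative):

import json
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
reviews = ["Great product, fast delivery!", "The app keeps crashing."]

# Classify each review and build a JSON-serializable report for a BI dashboard
results = classifier(reviews)
report = [{"review": r, "label": s["label"], "score": round(s["score"], 3)}
          for r, s in zip(reviews, results)]
print(json.dumps(report, indent=2))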

If you need video tutorials, please check out the Software and Testing Training playlists
here.

Quiz:

1. Which language is most used for LLM integration due to its rich AI libraries?
A. Java
B. Python (Correct)
C. C++
D. Ruby
2. Loading a pre‑trained model and performing tokenization in Python typically involves
which library?
A. NumPy
B. Transformers (Correct)
C. Requests
D. Matplotlib
3. An LLM use case that transforms raw text into structured key‑value pairs exemplifies:
A. Code completion
B. Data extraction (Correct)
C. Model quantization
D. Image segmentation
4. The phrase “lingua franca” in this context refers to Python’s role as:
A. A spoken language for data scientists
B. A common programming language for LLM tasks (Correct)
C. A legacy scripting language
D. A proprietary AI framework
5. In a chatbot application, Python’s role is to:
A. Train the LLM from scratch
B. Serve as the interface for prompt sending and response handling (Correct)

2 Setup & Installation
Q: How can you confirm that you have the correct Python 3.8+ environment for LLM
development?
A: You can verify your interpreter with the command python --version and, if
necessary, install a compatible distribution (such as Anaconda or the official Python
installer). To isolate dependencies, create a virtual environment using
python -m venv venv, then activate it (source venv/bin/activate on Linux/macOS or
venv\Scripts\activate on Windows). This ensures that the package versions for
LLM libraries remain consistent across projects.

Q: How is the Transformers library installed and initialized in Python?


A: Within the activated environment, run pip install transformers. After installation,
you can load a model and tokenizer with code such as:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

model = AutoModelForCausalLM.from_pretrained("gpt2")

This setup provides the core classes needed for tokenization and inference.

Q: What are the steps to install and configure PyTorch for GPU acceleration?
A: First, determine your CUDA version (e.g., with nvidia-smi). Then install PyTorch with the
matching CUDA toolkit using the command from the official site, for example:

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117

Verify GPU availability in Python with:

import torch

torch.cuda.is_available() # Returns True if GPU is ready

Q: How can you add the OpenAI client library and authenticate API access?
A: Install via pip install openai. Set your API key as an environment variable—export
OPENAI_API_KEY="sk‑..." on Linux/macOS or set OPENAI_API_KEY="sk‑..." on
Windows—so your scripts can call:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

This keeps the secret key out of your source code.


Q: What techniques help manage dependencies and their versions across different
projects?
A: Use a requirements.txt file generated with pip freeze > requirements.txt to
lock exact versions. This prevents version conflicts when multiple LLM projects coexist.

Connect with Inder P Singh (6 years' experience in AI and ML) on LinkedIn. You can
message Inder if you need personalized training or want to collaborate on projects.

Quiz:

1. Which command creates a virtual environment in Python?


A. python -m venv venv (Correct)
B. pip install venv
C. conda activate venv
D. python setup.py venv
2. To install the Transformers library, you use:
A. pip install torch
B. pip install transformers (Correct)
C. pip install openai
D. pip install numpy
3. How do you verify GPU availability for PyTorch?
A. torch.has_gpu()
B. torch.cuda.is_available() (Correct)
C. nvidia.check_gpu()
D. torch.device("cpu")
4. Securely setting your OpenAI API key involves:
A. Hard‑coding it in your script
B. Storing it in config.json
C. Exporting it as an environment variable (Correct)
D. Passing it as a URL parameter
5. A requirements.txt file is generated with:
A. pip list > requirements.txt
B. pip freeze > requirements.txt (Correct)
C. pip install requirements.txt
D. pip dependency-list > requirements.txt

3 Basic Inference with Transformers


Q: How can you initialize a pipeline for text generation using Hugging Face?
A: You import and call the pipeline constructor from the Transformers library,
specifying the task and model name. For example:

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

This generator object handles both tokenization and decoding internally, allowing you to
pass prompts and parameters.

Q: How does tokenization work before generating text?


A: The pipeline’s tokenizer splits input strings into discrete tokens (meaning subword
units) from the model’s vocabulary. It then converts those tokens into numeric IDs. For
instance:

tokens = generator.tokenizer("Hello, Inder!", return_tensors="pt")

Here, tokens.input_ids contains a tensor of IDs that the model consumes.

Q: How do you generate a simple completion once the pipeline is set up?
A: Call the generator with your prompt and configuration options like max_length and
num_return_sequences:

result = generator("The future of AI is", max_length=30, num_return_sequences=1)
print(result[0]["generated_text"])

This returns the prompt plus its generated continuation, up to 30 tokens in total.

Q: What parameters help control generation behavior in the pipeline?


A: Parameters such as temperature (for randomness), top_k or top_p (for sampling
diversity), and do_sample (to enable sampling) adjust creativity and variation:

generator("Explain quantum computing:", max_length=50,


temperature=0.7, top_p=0.9, do_sample=True)

# © Inder P Singh https://www.linkedin.com/in/inderpsingh

Quiz:

1. Which function creates a text-generation pipeline?
A. AutoModel.from_pretrained
B. pipeline("text-generation", ...) (Correct)
C. generate_text(...)
D. TextGenerator()
2. In tokenization, what does the tokenizer output?
A. Raw strings
B. Numeric token IDs (Correct)
C. Model weights
D. Attention scores
3. The max_length parameter controls:
A. The number of pipelines created
B. The maximum tokens in the generated sequence (Correct)
C. The tokenizer vocabulary size
D. The number of threads used
4. To enable probabilistic sampling diversity, which parameter is used?
A. force_cpu
B. do_sample (Correct)
C. use_cache
D. return_tensors

4 Calling OpenAI’s ChatGPT API in Python
Q: How can you authenticate when using OpenAI’s ChatGPT API in Python?
A: You can install the openai package and set your API key as an environment variable—
export OPENAI_API_KEY="sk-..." on Linux/macOS or set
OPENAI_API_KEY="sk-..." on Windows. In your script, you then invoke:

import os, openai


openai.api_key = os.getenv("OPENAI_API_KEY")

This keeps your secret key out of source code and loaded at runtime.

Q: What structure does the messages parameter use in ChatGPT calls?


A: The messages argument is a list of role‑tagged dictionaries defining the conversation.
Each entry has "role" set to "system", "user", or "assistant", and a "content"
string. For example:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the benefits of LLMs in Python."}
]

Q: How can you make a synchronous ChatGPT call using the SDK?
A: Use openai.ChatCompletion.create, passing the model name and messages. For
example:

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=messages,
    temperature=0.3,
    max_tokens=150
)
print(response.choices[0].message.content)

This blocks execution until the full response is received and available in
response.choices.

Q: How can you handle streaming responses to display partial results in real time?
A: Enable the stream=True flag and iterate over the response generator. Each chunk
contains delta segments that you can print as they arrive. This approach provides a more
responsive experience, rendering output token by token:

stream = openai.ChatCompletion.create(
    model="gpt-4",
    messages=messages,
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.get("content", ""), end="", flush=True)

Quiz:

1. Where should you store your OpenAI API key for secure access?
A. In your script as a constant
B. As an environment variable (Correct)
C. In a GitHub gist
D. In plain text logs
2. The messages list item with "role": "system" is used to:
A. Define user queries
B. Set high‑level instructions for the assistant (Correct)
C. Stream responses
D. Format JSON output
3. In a synchronous call, the generated text is retrieved from:
A. response.data
B. response.choices[0].message.content (Correct)
C. response.text
D. response.streaming
4. To receive partial content as it’s generated, you must set:
A. stream=True (Correct)
B. do_sample=True
C. echo=True
D. max_tokens=1

5 Local Deployment with Hugging Face Models
Q: How can you download model weights for local inference using Hugging Face?
A: You can call the from_pretrained method on both the model and tokenizer classes,
specifying the model identifier. For example:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

model = AutoModelForCausalLM.from_pretrained("gpt2")

This fetches the weights and vocabulary files into your local cache, making them available
for offline use.

Q: How do you prepare the AutoTokenizer and AutoModelForCausalLM for inference?


A: After loading, you set the model to evaluation mode and move it to the appropriate
device. For example:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.to(device)

model.eval()

Q: What steps enable running inference on CPU versus GPU?


A: The device assignment determines where tensors reside. On GPU:

input_ids = tokenizer("Hello, world!", return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids, max_length=50)

If device is CPU, the same code runs on the processor, though with higher latency. Always
move both model and tensors to device for consistent execution.
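To see the generated text, you can decode the output IDs (a small addition to the snippet above):

print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # outputs[0] is the first generated sequence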

Follow Inder P Singh (6 years' experience in AI and ML) on LinkedIn to get the new AI and ML
documents for FREE.

Quiz:

1. Which method fetches model weights and tokenizer files?


A. load_pretrained
B. from_pretrained (Correct)
C. download_model
D. init_pretrained
2. To switch the model to evaluation mode before inference, you call:
A. model.train()
B. model.eval() (Correct)
C. model.generate()
D. model.infer()
3. How do you determine whether to use GPU or CPU for inference?
A. By checking torch.device availability (Correct)
B. By reading a config file
C. By calling model.device()
D. By inspecting tokenizer attributes
4. To run inference on GPU, you must:
A. Only move the model to CUDA
B. Move both model and input tensors to CUDA (Correct)
C. Increase max_length
D. Enable do_train mode

6 Prompt Engineering in Python
Q: How can you build a prompt template programmatically in Python?
A: You can define a Python format string with placeholders for dynamic values. This
approach centralizes prompt structure and allows easy substitution for different inputs.
For example:
template = "Translate the following text to French:\n\n\"{text}\""

def render_prompt(text):
    return template.format(text=text)

prompt = render_prompt("Good morning!")

Q: What is prompt chaining and how is it implemented in Python?


A: Prompt chaining sequences multiple calls to the API, using each response to form the
next prompt. In Python:

# client.chat here stands for a thin wrapper around your preferred chat API (illustrative)
first = client.chat(["Summarize this article: ..."])
second = client.chat([f"Based on that summary, list three key takeaways:\n\n{first}"])

By passing first into the second call, you create a pipeline where each step builds on
prior output.

Q: How do you compare zero‑shot and few‑shot strategies in code?


A: For zero‑shot, send only the instruction:

resp0 = client.chat([{"role": "user", "content": "Explain quantum computing in simple terms."}])

For few‑shot, include inline examples:

examples = [
    {"role": "user", "content": "Q: What is 2+2?\nA: 4"},
    {"role": "user", "content": "Q: What is 10-3?\nA: 7"}
]
prompt = examples + [{"role": "user", "content": "Q: What is 5×3?\nA:"}]

resp1 = client.chat(prompt)

Comparing resp0 with resp1 typically reveals the improved accuracy and formatting
consistency that few‑shot prompting provides.

Quiz:

1. Which Python construct allows dynamic insertion of variables into a prompt template?
A. Lambda functions
B. f‑strings or str.format (Correct)
C. List comprehensions
D. Decorators
2. Prompt chaining in Python involves:
A. Encrypting prompts before sending
B. Passing the previous API response as part of the next prompt (Correct)
C. Using only system messages
D. Parallelizing API calls
3. A zero‑shot prompt differs from a few‑shot prompt by:
A. Including multiple examples
B. Relying solely on the instruction without examples (Correct)
C. Using a higher temperature
D. Always returning JSON
4. In a few‑shot strategy, embedding two Q&A pairs before a new question helps to:
A. Decrease model temperature
B. Guide the model’s response format and improve accuracy (Correct)
C. Increase token usage efficiency
D. Force the model into eval mode

7 Fine‑Tuning & Custom Training
Q: How do you prepare a dataset for fine‑tuning an LLM in Python?
A: You collect and clean target‑domain text, then format it into examples. This is typically
done with keys like {"prompt": "...", "completion": "..."}. You convert this list
into a datasets.Dataset object:

from datasets import Dataset

data = [{"prompt": "Hello, how are you?", "completion": "I am fine, thank you."}]
ds = Dataset.from_list(data)
ds = ds.train_test_split(test_size=0.1)

Q: How is the Hugging Face Trainer API used to fine‑tune a model?


A: You instantiate TrainingArguments to set parameters (such as
per_device_train_batch_size, num_train_epochs, and the output directory) and then
create a Trainer with your model, tokenizer, dataset, and a data collator:

from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

args = TrainingArguments(output_dir="out", per_device_train_batch_size=4, num_train_epochs=3)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
trainer = Trainer(model=model, args=args, train_dataset=ds["train"], data_collator=collator)
trainer.train()

Q: What is the advantage of LoRA for custom training?


A: LoRA adds low‑rank adapter matrices to each transformer layer, training only these
small modules. This reduces GPU memory usage and training time, enabling fine‑tuning
with limited resources. Integration uses the peft library:

from peft import get_peft_model, LoraConfig

config = LoraConfig(r=8, lora_alpha=16)
peft_model = get_peft_model(model, config)
trainer.model = peft_model
trainer.train()

Q: How do you execute a Python script to run fine‑tuning from the command line?
A: You wrap your code in a script fine_tune.py and use standard Python invocation,
passing hyperparameters via flags or environment variables:
python fine_tune.py --model_name gpt2 --train_file data.json --epochs 3
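A minimal sketch of how fine_tune.py might read those flags (the flag names mirror the command above; the wiring into TrainingArguments is illustrative):

import argparse

parser = argparse.ArgumentParser(description="Fine-tune a causal LM")
parser.add_argument("--model_name", default="gpt2")
parser.add_argument("--train_file", required=True)
parser.add_argument("--epochs", type=int, default=3)
args = parser.parse_args()
# args.model_name, args.train_file, and args.epochs then feed the TrainingArguments and Trainer shown earlier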

Quiz:

1. Which format is commonly used for fine‑tuning examples?


A. Plain text files
B. Prompt‑completion JSON objects (Correct)
C. XML documents
D. CSV without headers
2. The DataCollatorForLanguageModeling in a Trainer is responsible for:
A. Logging metrics
B. Preparing masked or causal batches (Correct)
C. Saving model checkpoints
D. Scheduling learning rate
3. The benefit of using LoRA over full fine‑tuning is:
A. Increased model size
B. Lower resource usage and faster training (Correct)
C. Eliminating need for tokenization
D. Automatic hyperparameter tuning
4. To pass hyperparameters to a Python fine‑tuning script, you typically use:
A. Hard‑coded constants
B. argparse flags or environment variables (Correct)
C. Direct modifications in library source
D. Comments in the script

8 Advanced Techniques: Streaming, Batching &
Callbacks
Q: How can you implement streaming generation with Hugging Face’s Transformers in
Python?
A: The generate method streams tokens through a streamer object (such as
transformers.TextIteratorStreamer), letting you iterate over tokens as they become available.
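A minimal sketch using TextIteratorStreamer, assuming the model, tokenizer, and device prepared in Chapter 5 (prompt and token counts are illustrative):

from threading import Thread
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
inputs = tokenizer("The future of AI is", return_tensors="pt").to(device)

# generate() blocks until completion, so run it in a thread and consume the streamer as tokens arrive
thread = Thread(target=model.generate, kwargs=dict(**inputs, max_new_tokens=50, streamer=streamer))
thread.start()

for text_piece in streamer:
    print(text_piece, end="", flush=True)
thread.join()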

Q: How can you batch multiple inputs to improve inference throughput?


A: You pass a list of prompts to the pipeline and specify batch_size. Transformers
groups these into a single forward pass:

prompts = ["Hello world", "Goodbye world"]
results = generator(prompts, max_length=20, batch_size=2)

for res in results:
    # each item is a list of generation dicts (one per returned sequence)
    print(res[0]["generated_text"])

Batching reduces per‑prompt overhead.

Q: What are callback hooks? How do they help with real‑time processing?
A: Callback hooks are functions you register to execute at key moments (such as before
generation, on each new token, or after completion). In Transformers you can use
LogitsProcessor or leverage custom callback frameworks:

def on_token(token_id, score):
    print(tokenizer.decode([token_id]), end="")

# Illustrative pattern only: pipelines do not expose a stream() method with a callback argument;
# in practice you would hook on_token into a streamer loop or a custom LogitsProcessor.
stream = generator.stream("Compute 2+2:", callback=on_token)

This design pattern allows custom logging, filtering, or early stopping based on token
content.

Quiz:

1. Enabling streaming in Transformers generation allows you to:


A. Pre‑load all outputs before printing
B. Receive tokens as they are generated for real‑time display (Correct)
C. Batch multiple prompts silently
D. Remove the need for tokenization
2. Batching inputs increases throughput by:
A. Sending each prompt one by one
B. Grouping multiple prompts into a single model call (Correct)
C. Reducing model size
D. Eliminating GPU memory usage
3. A callback hook in token streaming can be used to:
A. Modify the model architecture
B. Execute custom code upon each generated token, such as filtering or logging
(Correct)
C. Disable sampling
D. Increase max_length automatically
4. When using batching, you should set the batch_size parameter to:
A. The number of GPUs
B. The number of prompts you wish to process simultaneously (Correct)
C. The maximum token length
D. The callback frequency

9 Efficiency & Optimization
Q: How do you apply quantization to an LLM in Python for reduced memory footprint?
A: Using the bitsandbytes library, you load a model in an 8‑bit precision mode:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2", load_in_8bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

This reduces weight precision from 32‑bit to 8‑bit, shrinking model size and speeding
inference.

Q: What is distillation? How can you perform distillation in Python?


A: Distillation trains a smaller “student” model to mimic a larger “teacher.” You extract
soft labels from the teacher’s outputs, then fine‑tune the student using those probabilities
as targets. With the optimum library, a high‑level API might look like:

from optimum.intel import DistillationTrainer

trainer = DistillationTrainer(teacher=teacher_model, student=student_model, ...)
trainer.train()

This should produce a compact student model that retains most of the teacher’s
capabilities.

Q: How does parameter‑efficient tuning via LoRA work within Python?


A: The peft library implements LoRA by inserting trainable low‑rank adapters into each
transformer layer while freezing the base model:

from peft import get_peft_model, LoraConfig

config = LoraConfig(r=4, lora_alpha=16)
peft_model = get_peft_model(model, config)

You then fine‑tune peft_model normally: only the adapter parameters update, requiring
fewer resources than full‑model training.

Quiz:

1. Enabling load_in_8bit=True primarily achieves:
A. Increased generation diversity
B. Reduced model memory and faster inference (Correct)
C. Automatic fine‑tuning
D. Higher precision arithmetic
2. In distillation, the “student” model learns from:
A. Random noise
B. Hard labels only
C. Teacher model’s soft output probabilities (Correct)
D. Unrelated datasets
3. LoRA’s adapters are trained while the original model parameters remain:
A. Frozen (Correct)
B. Randomized
C. Quantized
D. Distilled
4. A key advantage of parameter‑efficient tuning is:
A. Full retraining of all weights
B. Minimal additional parameters and resource usage (Correct)
C. Elimination of tokenization
D. Automatic GPU selection

10 Integration & Deployment Workflows
Q: How can you embed an LLM in a FastAPI web service?
A: You can define an endpoint that receives a prompt, calls the model, and returns the
output. For example:
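A minimal sketch, assuming a local Hugging Face pipeline as the backend (the endpoint path and field names are illustrative):

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="gpt2")

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate_text(request: PromptRequest):
    # Run the model on the submitted prompt and return the generated text as JSON
    result = generator(request.prompt, max_length=50)
    return {"generated_text": result[0]["generated_text"]}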

This creates a REST API where clients send JSON with a prompt field and receive generated
text.

Q: How can you build an interactive Streamlit app for LLM inference?
A: Use Streamlit’s input/output widgets to capture user text and display responses. For
example:
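A minimal sketch, assuming the same gpt2 pipeline (widget labels are illustrative):

import streamlit as st
from transformers import pipeline

st.title("LLM Inference")
generator = pipeline("text-generation", model="gpt2")

prompt = st.text_area("Enter your prompt:")
if st.button("Generate"):
    result = generator(prompt, max_length=50)
    st.write(result[0]["generated_text"])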

This renders a web interface without manual HTML.

Q: How do you create a command‑line tool for LLM calls?


A: Use argparse to parse arguments and invoke the model accordingly:

import argparse
from transformers import pipeline

parser = argparse.ArgumentParser()
parser.add_argument("--prompt", required=True)
args = parser.parse_args()

generator = pipeline("text-generation", model="gpt2")
print(generator(args.prompt, max_length=50)[0]["generated_text"])

Running python tool.py --prompt "Hello from Inder" prints the generated text in
the terminal.

Q: What are the steps to Dockerize an LLM microservice?


A: Write a Dockerfile specifying a base image, install dependencies, copy your
application, and expose the port:

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 80
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "80"]

Quiz:

1. In FastAPI, you return model outputs from an endpoint defined with:


A. @app.get
B. @app.post (Correct)
C. @app.delete
D. @app.update
2. Streamlit’s widget to capture multi‑line text input is:
A. st.text_input
B. st.text_area (Correct)

C. st.button
D. st.slider
3. To parse CLI arguments in a Python tool, you typically use:
A. json module
B. argparse (Correct)
C. logging
D. threading
4. In a Dockerized LLM service, the CMD line often invokes:
A. python app.py
B. uvicorn with FastAPI application (Correct)
C. streamlit run
D. bash start.sh

Hope that you are finding this Introduction to LLMs in Python document useful! You can
message Inder P Singh (6 years' experience in AI and ML) on LinkedIn to get the new AI and
ML documents. You can follow Inder on Kaggle to get his public datasets and code.

11 Best Practices & Troubleshooting
Q: How should you implement logging in a Python LLM application?
A: Use the logging module to record prompts, responses, and errors with appropriate log
levels. For example:

import logging

# Include a timestamp in each record so the log forms an audit trail
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

logging.info(f"Prompt: {prompt}")
response = generator(prompt)
logging.info(f"Response: {response[0]['generated_text']}")

This creates a timestamped audit trail without clutter.

Q: What strategy handles errors gracefully during inference?


A: Wrap API or model calls in try/except blocks (learn about Python try except here) and
implement retries with exponential backoff. For instance:
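A minimal retry sketch, assuming the generator pipeline from earlier chapters (the retry count and delays are illustrative):

import time
import logging

def generate_with_retries(prompt, max_retries=3):
    # Retry transient failures with exponential backoff: 1s, 2s, 4s, ...
    for attempt in range(max_retries):
        try:
            return generator(prompt, max_length=50)[0]["generated_text"]
        except Exception as exc:
            wait = 2 ** attempt
            logging.warning(f"Attempt {attempt + 1} failed ({exc}); retrying in {wait}s")
            time.sleep(wait)
    raise RuntimeError("Inference failed after all retries")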

Q: How can you manage token‑limit issues when prompts exceed the model’s context
window?
A: Implement a sliding‑window or truncation strategy: retain the most recent tokens and
drop the earliest when the combined length exceeds max_length. For example:
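A minimal truncation sketch, assuming the tokenizer from earlier chapters (the context size shown is illustrative):

MAX_CONTEXT = 1024  # e.g., gpt2's context window

def truncate_prompt(prompt, max_tokens=MAX_CONTEXT):
    token_ids = tokenizer(prompt).input_ids
    if len(token_ids) > max_tokens:
        # Sliding window: keep only the most recent tokens
        token_ids = token_ids[-max_tokens:]
    return tokenizer.decode(token_ids)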

Q: What are best practices for version control of prompts?
A: Store prompts as template files in your code repository, tag changes with commit
messages, and reference versions in your code. Use modular functions to load specific
prompt versions:

with open(f"prompts/prompt_v{version}.txt") as f:
    template = f.read()

Quiz:

1. To record the prompts and responses that occurred during runtime, you should use:
A. print() statements
B. The logging module with INFO level (Correct)
C. File I/O only
D. Database inserts
2. Implementing retries with exponential backoff helps mitigate:
A. Token limit errors
B. Transient inference errors (Correct)
C. Version control conflicts
D. Docker build issues
3. A sliding‑window token strategy is used to:
A. Increase model batch size
B. Handle prompt lengths exceeding context limits (Correct)
C. Speed up tokenization
D. Compress model weights
4. Storing prompt templates in version control helps:
A. Faster inference
B. Reproducible prompt evolution and auditability (Correct)
C. Lower memory usage
D. Automatic API key rotation

12 Introduction to LLMs in Python Quiz
1. Which command creates an isolated Python environment for LLM projects?
A. python -m venv venv (Correct) – This avoids dependency conflicts across projects.
B. pip install venv
C. conda run venv
D. python setup.py venv
2. To generate text with Hugging Face’s pipeline, you must initialize it with:
A. pipeline("text-classification")
B. pipeline("text-generation", model="gpt2") (Correct) – This loads both the
model and tokenizer for generation.
C. AutoModel.from_pretrained("gpt2")
D. generate_text("gpt2")
3. The messages list passed to OpenAI’s ChatCompletion API requires roles such as:
A. "agent", "user", "system"
B. "system", "user", "assistant" (Correct) – These define instruction context, user
prompts, and model responses.
C. "prompt", "completion"
D. "input", "output"
4. When deploying a model locally with AutoModelForCausalLM, switching to GPU
involves:
A. Calling model.cuda() and moving input tensors with .to(device) (Correct)
B. Reinstalling the model with GPU flag
C. Only setting device="cuda" in the pipeline
D. Using model.enable_gpu()
5. A prompt template rendered via Python’s str.format is an example of:
A. Prompt chaining
B. Prompt templating (Correct) – It allows dynamic insertion of variables like {text}
into a fixed string structure.
C. Few‑shot prompting
D. Model quantization
6. In prompt chaining, the output of the first API call is used to:
A. Train the model
B. Serve as the next prompt’s input (Correct) – This builds complex, multi‑step
workflows.
C. Increase token limits
D. Generate random seeds
7. Fine‑tuning with LoRA via the peft library trains only:
A. The full model parameters
B. Low‑rank adapter modules (Correct) – This reduces training cost and memory usage.
C. Tokenizer vocabulary
D. GPU kernels
8. Streaming generation in Transformers enables you to:
A. Precompute entire responses before viewing
B. Receive tokens one by one in real time (Correct) – Useful for responsive UIs.
C. Batch multiple prompts
D. Skip tokenization
9. Quantizing a model to 8‑bit precision achieves:
A. Higher accuracy than 32‑bit
B. Reduced memory footprint and faster inference (Correct)
C. Full fine‑tuning without GPU
D. Automatic prompt optimization
10. Embedding prompt templates and tracking versions in Git ensures:
A. Rapid inference
B. Reproducible prompt evolution and auditability (Correct)
C. Automatic error handling
D. Unlimited token limits
11. A Dockerized LLM microservice typically uses a CMD instruction to launch:
A. python main.py
B. uvicorn app:app --host 0.0.0.0 --port 80 (Correct) – This starts the FastAPI app for
inference.
C. streamlit run app.py
D. bash start.sh
12. Implementing retries with exponential backoff in inference code helps mitigate:
A. Model quantization errors
B. Transient runtime or API call failures (Correct) – It enables continuity so that
temporary issues don’t crash the application.
C. Tokenization speed issues
D. Version control conflicts
