KEMBAR78
Intro To AI - Course Notes | PDF | Artificial Intelligence | Intelligence (AI) & Semantics
0% found this document useful (0 votes)
99 views26 pages

Intro To AI - Course Notes

Uploaded by

sakariyads1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
99 views26 pages

Intro To AI - Course Notes

Uploaded by

sakariyads1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 26

1

Ned Krastev

Intro to AI
2

Table of Contents

................................................................................................................................................................ 1
1. Getting started.................................................................................................................................... 3
2. Data is essential for building AI....................................................................................................... 7
3. Key AI techniques .......................................................................................................................... 10
4. Important AI branches .................................................................................................................. 14
5. Understanding Generative AI ....................................................................................................... 17

Abstract

1. This course explores fundamental and advanced topics in artificial

intelligence (AI), starting with comparisons between natural and artificial

intelligence and a historical overview of AI development. It distinguishes

between AI, data science, and machine learning, and discusses the

concepts of weak versus strong AI. The data section covers structured

versus unstructured data, data collection methods, labeling, and the

increasing volume and quality of data. Key AI techniques introduced

include various forms of machine learning—supervised, unsupervised, and

reinforcement—and deep learning via neural networks. Special focus is

given to AI branches like robotics, computer vision, predictive analytics, and

generative AI, including a detailed examination of ChatGPT and other

language models. The course also addresses practical tools and

technologies in AI development such as Python, APIs, and open-source

models, and concludes with discussions on AI job roles, ethics in AI, and the

future of AI.
3

1. Getting started

1.1 Natural vs. Artificial Intelligence

- Demonstrating natural intelligence


Natural intelligence is demonstrated through diverse activities like driving on
highways, solving complex math, crafting poetry, and mastering music. These
skills, which are learned rather than innate, showcase our brain's remarkable
ability to process vast amounts of information.
- The sophistication of Human Intelligence
The human brain is incredibly sophisticated, with its capacity to acquire and apply
knowledge leading to:
• Technological innovations that far surpass the tools of past centuries.
• Tool creation, highlighting our unique ability as humans to enhance our
productivity through innovation.
- Gutenberg’s printing press

Gutenberg’s Printing Press, invented in 1440, revolutionized the way knowledge is


disseminated, making it one of the most significant machines in history. Despite its
transformative impact, it operated under fixed parameters and did not possess the
capability to learn or adapt.

1.2 Brief history of AI

The evolution of Artificial Intelligence: Key milestones


Early Beginnings
• 1950 - Alan Turing's Seminal Question: Alan Turing publishes a paper
asking, "Can machines think?" and introduces the Turing Test, setting a
practical criterion for evaluating machine intelligence. If an interrogator
can't distinguish between responses from a machine and a human, the
machine is deemed to exhibit human-like intelligence.
Formal Recognition
• 1956 - Dartmouth Conference: The term "artificial intelligence" is coined,
marking the formal start of AI as a field of study. This conference brings
4

together experts from various disciplines to discuss the potential for


machines to simulate human intelligence.

Period of Stagnation
• 1960s and 70s - AI Winter: Challenges due to limited technology and data
availability lead to reduced funding and interest, slowing AI progress.

Technological Resurgence
• 1997 - IBM’s Deep Blue: Deep Blue defeats world chess champion Garry
Kasparov, reigniting interest in AI.
• Late 1990s and Early 2000s: A surge in computer power and the rapid
expansion of the Internet provide the necessary resources for advanced AI
research.

Advancements in Neural Networks


• 2006 - Geoffrey Hinton’s Deep Learning Paper: Revives interest in neural
networks by introducing deep learning techniques that mimic the human
brain's functions, requiring substantial data and computational power.
• 2011 - IBM’s Watson on Jeopardy!: Demonstrates significant advances in
natural language processing, as Watson competes and wins in the quiz
show "Jeopardy!".
• 2012 - Building High-Level Features Paper: Researchers from Stanford and
Google publish a significant paper on using unsupervised learning to train
deep neural networks, notably improving image recognition.

Breakthroughs in Language Processing


• 2017 - Introduction of Transformers by Google Brain: These models
transform natural language processing by efficiently handling data
sequences, such as text, through self-attention mechanisms.
• 2018 - OpenAI’s GPT: Launches the generative AI technology that uses
transformers to create large language models (LLMs), leading to the
development of ChatGPT in 2022.

1.3 Demystifying AI, Data science, Machine learning, and Deep learning
5

Artificial Intelligence (AI)


• Objective: AI aims to create machines that can mimic human intelligence by
learning and acquiring new skills.
• Scope: AI is a broad field that encompasses various subfields, including
machine learning, making it the overarching discipline focused on
intelligent behavior in machines.

Machine Learning (ML)


• Definition: Machine learning is a key subfield of AI that uses statistical
methods to enable machines to improve at tasks with experience.
• Functionality: Essentially, ML involves feeding input data into a model that
processes it and produces an output. For example:
o An algorithm might analyze a user's movie-watching history to
predict which movies they will likely enjoy next.
o Financial transaction data could be used to generate a credit score
predicting a customer's loan repayment likelihood.
• Importance: ML is significant within AI because it provides the methods and
technologies that allow computers to learn from and make predictions
based on data.

Data Science
• Relationship with AI and ML: While data science includes AI and machine
learning, it also encompasses a broader set of statistical methods.
• Tools and Methods: Beyond machine learning, data scientists employ
traditional statistical methods like data visualization and statistical inference
to extract insights from data.
• Applications:
o A data scientist might use ML algorithms to predict future client
orders based on historical data.
o Alternatively, they could perform an analysis correlating client orders
with store visits to derive actionable business insights.
• Scope: Data science is not only about creating predictive models but also
about understanding and visualizing data patterns to support decision-
making.
6

1.4 Weak vs Strong AI

Narrow AI
• Definition: Narrow AI refers to artificial intelligence systems that are
designed to handle specific tasks.
• Examples and Applications: An example we discussed is a machine-
learning algorithm that predicts movie recommendations based on a user's
viewing history. This type of AI is pervasive in our daily lives and beneficial
for businesses, handling defined and narrow tasks efficiently.

Semi-Strong AI
• Introduction of ChatGPT and GPT 3.5: OpenAI's release of ChatGPT and the
GPT 3.5 models in 2022 marked a significant advancement towards semi-
strong AI.
• Capabilities: Unlike narrow AI, ChatGPT can perform a broad range of tasks:
o Writing jokes
o Proofreading texts
o Recommending actions
o Creating visuals and formulas
o Solving mathematical problems
• Relation to Turing Test: ChatGPT's ability to generate human-like responses
aligns with Alan Turing's imitation game concept, suggesting it can pass the
Turing Test, thus classifying it as semi-strong AI.

Artificial General Intelligence (AGI)


• Definition and Goal: AGI, also known as strong AI, aims to create machines
that are generally more capable than humans across a variety of tasks.
• Current Research: While significant strides have been made, leading
institutions like OpenAI and Google have not yet achieved AGI.
• Future Prospects and Ethical Considerations: Sam Altman of OpenAI
suggests AGI could be developed in the near future. As we approach this
possibility, it's crucial to consider the potential and ethical implications of
7

machines that can surpass human intelligence and independently create


new scientific knowledge.

2. Data is essential for building AI

2.1 Structured vs unstructured data

Types of Data
• Structured Data:
o Definition: Organized into rows and columns, making it easy to
analyze.
o Example: A sales transactions spreadsheet with predefined fields in
Excel.
• Unstructured Data:
o Definition: Lacks a defined structure and cannot be organized into
rows and columns. Includes formats like text files, images, videos,
and audio.
o Prevalence: Represents 80-90% of the world’s data, making it the
dominant form of data.

Evolution of Data Value


• Past Perception: Structured data was historically valued more due to its
ease of analysis.
• Current Advancements: AI technologies have advanced to the point where
unstructured data can now be transformed into valuable insights.
Companies like Meta and Google are leading in this area, unlocking the
potential of unstructured data.
Opportunities from Unstructured Data
• Business Applications: Analyzing unstructured data (e.g., photographs,
videos, text messages, emails) is creating enormous opportunities for
businesses to gain insights that were previously inaccessible.

2.2 How we collect data


8

The MNIST Database


• Overview: Often referred to as the “Hello World” of machine learning, the
MNIST database consists of 70,000 images of handwritten digits, each
28x28 pixels in grayscale.
• Purpose: The goal is to train a machine learning algorithm to recognize
handwritten digits, despite variations in individual handwriting.

Understanding Digital Representation


• Pixel Values: Each pixel in the image has a value between 0 (white) and 255
(black), representing different shades of gray.
• Binary Encoding: All digital information, including pixel values, is stored in
binary form as combinations of 0s and 1s.

Machine Learning Process


• Training: By examining these binary sequences, the computer learns to
distinguish between different digits like 3, 6, or 9 based on their unique
numerical patterns.
• Application: This training enables computers to recognize and differentiate
digits from 0 to 9, showcasing a fundamental aspect of machine learning.

Broader Implications for AI


• Information Conversion: Similar to how images are converted into binary
sequences, videos, sounds, and written text are also transformed into data
that computers can process.
• Pattern Recognition: Through machine learning, computers learn to identify
patterns and similarities in this data, which helps them perform various
tasks.

Human Perception vs. AI


• Human Brain Capabilities: Humans can process vast amounts of information
simultaneously through sensory perception.
• AI Aspiration: AI researchers aim to emulate this capability by developing
systems that can interpret complex data from multiple sources like sensors,
social media, and satellite images.
9

Importance of Data Quality


• Data Collection: Data is collected through various means such as web
scraping, APIs, and big data analytics.
• Model Accuracy: The quality of data directly impacts the effectiveness of AI
models. The adage “Garbage in, garbage out” highlights the necessity of
high-quality data for producing reliable AI outputs.

2.3 Labelled and unlabelled data

Labeled Data
• Definition: Labeled data involves tagging each item in a dataset with
specific labels that the AI model will learn to recognize and predict. For
example, photos can be classified as 'dog' or 'not a dog', and comments
can be labeled as positive, negative, or neutral.
• Process: This method requires a meticulous review and classification of
each data item, which can be time-consuming and costly.
• Benefits: Labeled data significantly enhances the accuracy and reliability of
AI models, making them more effective in real-world applications.

Unlabeled Data
• Definition: Unlabeled data does not come pre-tagged with labels. The AI
model is tasked with analyzing the data and identifying patterns or
classifications on its own.
• Application: This approach is often applied to large datasets where manual
labeling is impractical due to resource constraints.
• Trade-offs: While less resource-intensive upfront, models trained on
unlabeled data might not achieve the same level of accuracy as those
trained on well-labeled datasets.
Practical Implications
• Choice of Method: The decision between using labeled or unlabeled data
often depends on the specific requirements of the project, available
resources, and desired model performance.
10

• Future Discussions: Subsequent lessons will delve deeper into how AI


models learn from these types of data and the techniques used to enhance
their learning process.

2.4 Metadata: Data that describes data

Impact of Digitalization on AI
• Data Growth: The rapid expansion of online platforms, mobile technology,
cameras, social media, sensors, and Internet of Things devices has resulted
in a massive increase in data generation.
• Quality and Quantity: Not only has the volume of data grown, but the
quality has also improved significantly, as evidenced by the comparison of
old mobile phone photos to modern smartphone images.
Challenges of Managing Data
• Unstructured Data: A large portion of this newly generated data is
unstructured and too vast to manually label or organize effectively.
• Metadata as a Solution: To manage this overwhelming amount of data,
metadata becomes essential. Metadata is data about data, providing
summaries of key details such as asset type, author, creation date, usage,
file size, and more.

3. Key AI techniques

3.1 Machine learning

Recap and Context


• AI Basics and History: Previously, we explored foundational concepts, the
history of AI, and distinguished between weak and strong AI.
• Importance of Data: We emphasized that high-quality, abundant data is
crucial for building effective AI systems.

Introduction to Machine Learning (ML)


11

• Analogy: Machine learning is likened to a student learning from a teacher,


where the ML model is the student and the data scientist acts as the
teacher. The teacher provides extensive, quality data (learning materials) to
train the student.
• Training Process: Like preparing for an exam, the ML model is trained to
recognize patterns and solve problems using unseen data. The quality and
relevance of training data significantly influence the model's performance.

Practical Application in Business


• Real Estate Example:
o Scenario: A real estate agent wants to develop a mobile app to
estimate home selling prices based on user inputs.
o Data Scientist's Role: The data scientist assesses the feasibility of the
project, emphasizing the need for a comprehensive database of past
transactions.
o Model Development: The model predicts house prices by analyzing
past transactions (input x) to estimate selling prices (output y).
o Outcome: The successful app significantly boosts the real estate
agent's business by predicting prices and collecting potential seller
contacts.

Educational Takeaway
• Functionality of ML: This example illustrates how ML uses historical data to
learn patterns and make predictions about new, unseen situations.
• Future Lessons: The next lesson will cover different types of machine
learning models, further expanding on how these technologies are applied.

3.2 Supervised, Unsupervised, and Reinforcement learning

Supervised Learning
• Definition: Supervised learning uses labeled data to teach models how to
predict outputs based on input data.
• Classification: An example is identifying whether an image contains a dog
or not, using a dataset where each image is labeled as ‘dog’ or ‘not dog.’
12

• Regression: Another use is in prediction, such as estimating house prices


based on a dataset with known home features and prices.
• Key Point: The model is explicitly trained with known outputs, guiding its
learning process.

Unsupervised Learning
• Definition: Unsupervised learning involves analyzing data without pre-
labeled responses.
• Clustering: The model scans data to identify inherent patterns and group
similar items, such as differentiating between images of dogs and cats
without prior labels.
• Applications: Useful when labeling data is impractical or too costly, or when
the relationships within data are unknown. Examples include identifying
customer segments in a supermarket or determining popular property
types in real estate.
• Key Point: The algorithm autonomously discovers relationships and
patterns without direct input on the desired output.

Reinforcement Learning
• Definition: Reinforcement learning teaches models to make decisions by
rewarding desired behaviors and penalizing undesired ones, optimizing for
a specific goal without labeled data.
• Application and Dynamics: Commonly used in robotics and
recommendation systems like Netflix’s. The model learns from direct
interaction with the environment, improving its recommendations based on
user feedback, such as views, skips, and ratings.
• Key Point: Operates on a trial-and-error basis within defined rules,
constantly adjusting actions based on feedback to achieve the best
outcomes.

3.3 Deep learning

Deep learning, a sophisticated subset of machine learning, draws inspiration from


the human brain's structure and function. This advanced AI methodology allows
machines to process information in stages, mirroring how our brains interpret
complex stimuli.
13

Human Brain vs. Artificial Neural Networks (ANN)


• Initial Perception: Just as our brain gets a general impression from a first
glance (like recognizing a sunny beach day), the input layer of an ANN
receives raw data and begins the processing journey.
• Deeper Analysis: Upon closer inspection, our brain identifies more details
(like children around a sandcastle or odd facial features), similar to how
subsequent layers in an ANN detect and interpret more complex data
features.
• Complex Understanding: In both human perception and deep learning,
deeper layers synthesize basic insights into higher-level concepts, enabling
nuanced understanding and recognition of intricate patterns.

Practical Example of Deep Learning


• MNIST Dataset: Utilizing a dataset of handwritten digits, an ANN with
multiple layers learns to recognize numbers by identifying and combining
features such as edges and curves through its layers, ultimately determining
the digit represented in an image.
• Layer Functions:
o Input Layer: Receives raw pixel data, with each pixel's brightness
represented as an activation value.
o Hidden Layers: Sequentially refine and transform data, focusing on
increasingly specific attributes.
o Output Layer: Produces the final decision, identifying the specific
digit.

Significance of Deep Learning


• Pattern Recognition: By emulating aspects of human cognitive processes,
ANNs excel in recognizing patterns and interpreting data from large, high-
dimensional datasets.
• AI Advancements: Deep learning's ability to analyze complex patterns with
high accuracy has driven significant advancements in AI, transforming many
technological and scientific fields.
14

4. Important AI branches

4.1 Robotics

Historical Context
• Ancient Origins: Tales like the myth of Talos and the mechanical inventions
of Al-Jazari show early human fascination with automata and mechanical
beings.
• Renaissance Innovations: Leonardo Da Vinci's designs, such as the
mechanical knight and lion, prefigured modern robotic concepts,
illustrating a longstanding interest in replicating human and animal actions
through machines.

Modern Robotics
• Definition: Robotics involves designing, constructing, and operating
robots—machines capable of performing tasks either autonomously or with
human-like capabilities.
• Interdisciplinary Field: The creation of robots requires a collaborative effort
among mechanical engineers (for physical structure and mobility),
electronics and electrical engineers (for operational control), and AI
specialists (for decision-making and behavioral intelligence).

AI Integration in Robotics
• Role of AI: Advanced AI technologies drive the decision-making and
perception capabilities of robots, equipping them with sensors and
cameras to interact intelligently with their environment.
• Multi-Model Systems: Effective robots often integrate multiple AI models,
including:
o Computer Vision: For object detection and environmental
understanding.
o Simultaneous Localization and Mapping (SLAM): For navigation and
mapping.
o Reinforcement Learning: For adaptive decision-making.
o Natural Language Processing (NLP): For understanding and
generating human language.
15

Practical Applications and Future Potential


• Industrial Automation: Robots like the Tesla Bot are being developed to
handle repetitive and hazardous tasks in industrial settings, enhancing
efficiency and safety.
• Medical Robotics: Robots in healthcare are already performing precise
interventions and complex surgeries, significantly impacting patient care.
• Broader Applications: The use of robots extends to various sectors
including agriculture (harvesting robots), domestic tasks (cleaning robots),
exploration (space robots), emergency response (search and rescue
robots), and security (surveillance robots).

4.2 Computer vision

How Computer Vision Works


• Processing Images and Videos: Computers analyze still images and video
frames, understanding nuances such as movement, shape changes, and
color differences.
• Complexity of Videos: Unlike static images, videos consist of continuous
image sequences (e.g., 30 frames per second), requiring more complex
processing to maintain context and continuity.

Main Families of Computer Vision Models


• Convolutional Neural Networks (CNNs):
o Foundation: Essential for handling high-dimensional image data.
o Functionality: CNNs excel at recognizing spatial hierarchies in
images, organizing elements based on depth and importance, and
gradually learning from basic to complex features across layers.
• Transformers:
o Application: Increasingly used in computer vision, particularly in
generative AI contexts.
• Generative Adversarial Networks (GANs):
o Purpose: Primarily used for creating realistic images.
16

• Specialized Networks:
o Examples: U-net for medical image segmentation and EfficientNet for
optimizing neural network performance and resource use.

Applications of Computer Vision


• Broad Impact: From self-driving cars and medical imaging to security and
surveillance.
• Non-Robotic Uses: Face recognition software exemplifies computer vision
applied in non-robotic contexts.
• Virtual Reality: Advances in VR are revolutionizing education, entertainment,
and communication by enhancing immersive experiences.

4.3 Traditional ML

While AI innovations like ChatGPT and autonomous vehicles often capture public
imagination, a substantial portion of AI's value lies in its application within
traditional business operations. These applications might not make headlines as
frequently, but they are fundamental in transforming various industries by
enhancing efficiency and accuracy.

4.4 Generative AI

Generative AI refers to the branch of artificial intelligence capable of generating


new data or content. It stands out because it creates novel outputs, rather than just
processing existing data.
• Examples: ChatGPT and DALL-E are prominent examples, where ChatGPT
generates textual content and DALL-E produces images based on
descriptions.

Techniques in Generative AI
Large Language Models (LLMs): These are neural networks trained on vast
amounts of text data, predicting word relationships and subsequent words in
sentences. LLMs are foundational for text-based applications like ChatGPT.
17

• Diffusion Models: Used primarily for image and video generation, these
models start with a noise pattern and refine it into a detailed image,
applying learned patterns to enhance realism.
• Generative Adversarial Networks (GANs): Introduced in 2014, GANs use
two algorithms in tandem—one to generate content and the other to judge
its realism, improving both through iterative enhancement.
• Neural Radiance Fields: Specialized for 3D modeling, these are used to
create highly realistic three-dimensional environments.
• Hybrid Models: Combining techniques like LLMs and GANs, hybrid models
leverage the strengths of multiple approaches to enhance content
generation.

Impact and Applications


Industry Revolution: Generative AI is pivotal in industries ranging from
entertainment and media to architecture and healthcare, where it enables the
creation of complex, realistic models and simulations.
Corporate Influence: Big Tech firms are heavily investing in Generative AI, driving
forward innovations that can generate content across text, images, videos, audio,
and more.
Future Potential: As technology evolves, Generative AI is set to profoundly impact
how businesses operate, offering new ways to create and manipulate digital
content.

5. Understanding Generative AI

5.1 Early approaches to Natural Language Processing (NLP)

Early Beginnings
Natural Language Processing, or NLP, is a field within computer science that
focuses on enabling computers to understand, interpret, and generate human
language. Originating in the 1950s, NLP initially relied on rule-based systems.
These systems used explicit language grammar rules to process text. For example,
a rule might dictate that sentences beginning with "Can you," "Will you," or "Is it"
should be treated as questions, helping the system recognize "Can you help me?"
as a question.
18

Shift to Statistical NLP


By the late 1980s and into the early 1990s, the field began to shift towards
Statistical NLP. This new approach moved away from rigid rule-based systems to
probabilistic methods that analyze extensive data to understand language. This
transition marked a significant development, as it utilized statistics to interpret
language usage more dynamically.

Practical Example
In the 90s, statisticians would analyze sentences containing the word "can" to
determine its use as a noun or a verb—important for understanding its meaning in
context. For instance, "can" as a verb might indicate ability, while as a noun, it
could refer to a container. Analyzing how "can" was used with surrounding words
like "you" or "soda" helped predict its grammatical role in sentences.

Towards Modern NLP


This approach of analyzing word usage and context paved the way for the
development of models that predict word meanings based on calculated
probabilities. Such methods resemble early forms of machine learning,
foreshadowing the sophisticated techniques like vector embeddings and deep
learning that dramatically enhance NLP today.

5.2 Recent NLP advancements

Integration of Machine Learning


• The 2000s marked a significant shift in NLP with the integration of machine
learning techniques, greatly enhancing the capability to analyze and
interpret large volumes of text data.

Role of Vector Embeddings


• Complex Representation: Vector embeddings transformed how textual
information is processed by representing words and sentences as
numerical arrays in high-dimensional spaces.
• Semantic Similarity: These embeddings capture complex linguistic
relationships and meanings, enabling models to operate in spaces that are
often several hundred to thousands of dimensions deep.
19

Emergence of Advanced ML Models


• Linguistic Nuances: New machine learning models began to recognize
subtle linguistic elements like sarcasm and irony, which were challenging
for earlier statistical methods.
• Use of Neural Networks: By the 2010s, neural networks with their deep,
multi-layered structures became crucial for advancing NLP tasks such as
translation, speech recognition, and text generation.

Transformative Impact of Transformer Architecture


• Introduced in 2018, the transformer architecture revolutionized NLP by
facilitating the development of Large Language Models (LLMs) such as GPT
and Gemini.
• Capabilities: Transformers enable better handling of sequential data and
improve the learning of dependencies in text, significantly impacting the
field.

5.3 From Language Models to Large Language Models (LLMs)

Game-Based Learning Analogy


In a team-based word association game, players use strategic word choices to
help teammates guess a secret word. This mirrors how language models operate,
using probabilistic predictions to determine the most likely next word in a
sequence.

Language Models Explained


Language models predict words based on the context provided by surrounding
text. They are inherently probabilistic, designed to fill in blanks in sentences or
continue a sequence of text.

Types of Language Models


Masked Language Models: These models can predict a missing word anywhere in
a sentence by considering both the preceding and following context.
20

Autoregressive Language Models: These predict the next word in a sequence


using only the preceding words as context. Models like OpenAI’s GPT are
autoregressive, building each prediction based on previously generated words.

Statistical Learning and Model Training


Language models are trained on vast datasets, learning from diverse linguistic
patterns. This training allows them to develop a probabilistic understanding of
word associations, enabling them to generate coherent and contextually
appropriate text.

Generative AI
The term "generative" highlights these models' ability to produce new content,
making them a subset of Generative AI. They can generate a wide range of
outputs based on the training data they have processed.

Expansion and Scalability


Initially focused on single languages, modern models are increasingly multilingual
and trained on expanding datasets. The size of these models, often referred to as
Large Language Models (LLMs), is growing, with models like GPT-3 and GPT-4
using hundreds of billions to over a trillion parameters.

5.4 The efficiency of LLM training. Supervised vs Semi-supervised learning

Challenges of Supervised Learning


• High Costs: Supervised learning requires labeled data. For example,
labeling 100,000 customer reviews at 30 cents each amounts to $30,000.
For more complex data like medical records, the cost can soar to millions.
• Scalability Issues: The expense and effort to label data increase with the
complexity, making supervised learning less feasible for large datasets such
as the entire internet content, which would be prohibitively expensive and
biased if labeled by few individuals.

Limitations of Unsupervised Learning


21

• Lack of Direction: While unsupervised learning does not require labeled


data, it lacks specific objectives, making it difficult for models to learn
structures that are meaningful for understanding or generating human
language.
• Contextual Weaknesses: This approach struggles with language nuances
since it does not prioritize context prediction from previous texts, which is
crucial in language processing.

Introduction to Self-Supervised Learning


• Balanced Approach: Self-supervised learning offers a solution by enabling
models to generate their own labels from unlabeled data. This method
leverages the inherent structure in data to predict context and learn
effectively.
• Advancements with LLMs: Self-supervised learning has been instrumental in
developing Large Language Models (LLMs) like ChatGPT. These models
autonomously analyze text, generate labels, and use the context from
surrounding words to predict subsequent content, thus enhancing their
understanding and generation of natural language.

5.5 From N-Grams to RNNs to Transformers: The Evolution of NLP

N-grams
• Basics: N-grams predict the probability of a word based on the preceding
n−1n-1n−1 words. Unigrams (n=1) predict words without any contextual
basis, often leading to nonsensical choices, while bigrams (n=2) and
trigrams (n=3) incorporate one or two preceding words respectively.
• Limitations: Although n-grams consider immediate predecessors, they lack
an understanding of broader sentence context and fail to capture deeper
semantic relationships.

Recurrent Neural Networks (RNNs)


• Advancements: RNNs marked a significant improvement by processing
sequences of text and retaining information from previous inputs, which
allows for context-aware predictions.
• Challenges: RNNs struggle with long text inputs due to the vanishing
gradient problem, where the influence of earlier text diminishes in longer
sequences.
22

Long Short-Term Memory Networks (LSTMs)


• Solution: LSTMs address RNN limitations with a gate architecture that helps
retain or discard information selectively, improving the model's ability to
manage long-term dependencies.
• Drawbacks: Despite their effectiveness, LSTMs are computationally
expensive and slow to train, particularly on larger datasets, making them
less scalable.

Transformers
• Innovation: Introduced in the seminal paper "Attention Is All You Need" in
2017, transformers revolutionize language modeling with an attention
mechanism that assesses the relevance of different parts of the input data,
allowing the model to focus on the most important segments.
• Efficiency and Scalability: By calculating attention scores and prioritizing
certain words over others, transformers efficiently handle sequences
without the computational overhead of LSTMs, enhancing scalability and
performance in processing large volumes of text.

Implications for Large Language Models


• The development of transformers has enabled the creation of powerful
LLMs like ChatGPT by improving how machines understand and generate
human-like text. This technology underpins the sophisticated capabilities of
current AI models, allowing them to generate coherent and contextually
aware language outputs on a large scale.

5.6 Phases in building LLMs

Model Design
• Architecture Selection: Developers choose an appropriate neural network
architecture, such as transformers, CNNs, or RNNs, depending on the
intended application.
• Depth and Parameters: Decisions about the model's depth and the number
of parameters it will contain are crucial as they define the model's
capabilities and limitations.
23

Dataset Engineering
• Data Collection: Involves gathering data from publicly available sources or
proprietary datasets. The amount and quality of data can significantly
influence the model's performance.
• Data Preparation: Cleansing and structuring of data are critical to ensure
that the model trains on high-quality and relevant information.
• Ethical Considerations: Developers must address key issues such as data
diversity and potential biases within the training data.

Pretraining
• Initial Training: The model is trained on a large corpus of raw data, which
helps in developing a basic understanding of language patterns and
structures.
• Handling Bias: Special attention is needed to avoid training the model on
data that could lead it to generate biased or offensive outputs.

Preliminary Evaluation
• Performance Assessment: Early evaluation of the model to understand its
strengths and areas that require improvement, particularly in how it handles
context and subtlety in language.

Post-training
• Supervised Finetuning: Enhances the model's performance using high-
quality, targeted data.
• Incorporating Feedback: Refining the model further through human
feedback and annotations to improve its accuracy and ethical behavior.

Finetuning
• Optimization: Adjusting the model's weights to optimize for specific tasks,
improving speed and efficiency while potentially sacrificing some general
capabilities.

Final Testing and Evaluation


24

• Comprehensive Review: Rigorous testing to assess the model's response


quality, accuracy, speed, and ethical behavior, ensuring it meets end-user
expectations and standards.

5.7 Prompt engineering vs Fine-tuning vs RAG: Techniques for AI optimization

Prompt Engineering
Definition: Modifying how we interact with the model through specific instructions
or examples without altering the model's underlying architecture or training data.
• Process: Involves crafting and refining verbal prompts to guide the model
towards generating the desired outputs.
• Utility: Allows quick, iterative adjustments to how the model interprets and
responds to queries, ideal for tuning AI behavior with minimal resources.

Retrieval-Augmented Generation (RAG)


• Definition: Enhancing model responses by integrating an external database
that the model can query to pull in additional context or information.
• Implementation: Attaches a searchable database to the model, effectively
expanding its knowledge base without changing its internal structure.
• Advantages: Provides a richer context for model responses, particularly
useful in scenarios requiring detailed or expansive knowledge.

Fine-Tuning
• Definition: Involves retraining the model on new data or adjusting its neural
network weights to improve or specialize its performance.
• Characteristics: This is more resource-intensive and requires additional
data, often leading to substantial improvements in the model's accuracy
and speed for specific tasks.
• Limitations: Unlike prompt engineering or RAG, fine-tuning is not iterative
and can be computationally expensive, necessitating careful planning and
execution.

5.8 The importance of foundation models


25

Specialized Machine Learning Models


• Narrow Focus: Traditionally, machine learning models were designed for
specific tasks such as image recognition, speech transcription, or time
series prediction. Each model excelled within its narrow domain but lacked
versatility.
• Limited Applications: These models were confined to tasks they were
explicitly trained for, such as identifying objects in images or analyzing
sentiment in texts.

Introduction of Large Language Models (LLMs)


• Broader Capabilities: LLMs, trained on extensive and diverse data sets,
exhibit a remarkable ability to perform general-purpose tasks across
different fields.
• Versatility: Initially text-based, LLMs quickly evolved to handle a variety of
data formats, including code, Excel files, PDFs, and even multimedia
content like images and videos.
• Adaptability: With techniques like fine-tuning and prompt engineering,
LLMs can be customized to excel in various applications beyond their initial
training.

Transition to Foundation Models


• Definition: Foundation models are a progression from LLMs, designed to
serve as a versatile base for building applications across multiple
disciplines.
• Characteristics: These models are characterized by their enormous size and
capability to perform myriad tasks, making them a powerful tool for a broad
range of applications.
• Strategic Importance: According to Sam Altman of OpenAI, the future of
developing such models will likely be dominated by Big Tech and
governmental bodies due to the significant resources required.

5.9 Buy vs Make: foundation models vs private models

Traditional Business Strategy


26

Buy vs. Make: Businesses typically decide between outsourcing non-core activities
for efficiency and retaining strategic, value-adding activities internally to maintain
competitive advantage.

Challenges with AI and Foundation Models


• Resource Intensity: Building proprietary foundation models like LLMs is
prohibitively expensive and resource-intensive, limiting this capability to a
few well-funded organizations worldwide.
• Strategic Importance of AI: Despite AI being a core value-add for many
businesses, the high cost and technical demands of developing custom
models mean that most companies cannot afford to build their own from
scratch.

Adapting to AI Realities
• Model-as-a-Service: Companies often turn to providers like OpenAI, which
offer access to advanced models such as GPT through model-as-a-service
arrangements. This allows businesses to leverage cutting-edge AI without
the overhead of developing it.
• Core vs. Non-Core: The contradiction arises when AI is a core strategic
asset, but access to the technology depends heavily on external sources.
This shifts the strategic focus from building AI internally to effectively
integrating and customizing external AI solutions.

Competitive Differentiation
• Skill in AI Adaptation: The real competitive advantage lies in a company's
ability to tailor these external AI resources to specific business needs
through techniques like prompt engineering, RAG (Retrieval-Augmented
Generation), and fine-tuning.
• Demand for AI Expertise: As AI continues to be a central element of
business strategy, there is a growing need for skilled AI engineers who can
navigate these tools to enhance business applications and outputs.

You might also like