Understanding Intelligence & AI

What is Intelligence?
Intelligence is a complex and multifaceted concept that has been studied and debated by
researchers for centuries. According to researchers, intelligence is the ability to perceive or
infer information, and to retain it as knowledge to be applied towards adaptive behaviors
within an environment or context.

Traits of Intelligence
The following are the abilities that are involved in intelligence:

| Trait | Definition |
| --- | --- |
| Mathematical | Ability to regulate, measure, and understand numerical symbols, abstraction, and logic. |
| Logical Reasoning | Ability to reason and think logically. |
| Linguistic | Language processing skills, both in terms of understanding and implementation, in writing or verbally. |
| Spatial Visual | Ability to perceive the visual world and the relationship of one object to another. |
| Kinaesthetic | Ability related to how a person uses his limbs in a skilled manner. |
| Musical | Ability to recognize and create sounds, rhythms, and sound patterns. |
| Intrapersonal | Describes the level of self-awareness someone has. |
| Existential | Additional category of intelligence relating to religious and spiritual awareness. |
| Naturalist | Additional category of intelligence relating to the ability to process information on the environment around us. |
| Interpersonal | Ability to communicate with others by understanding other people's feelings and their influence on the person. |

"Intelligence is the ability to perceive or infer information, and to retain it as


knowledge to be applied towards adaptive behaviors within an environment or
context."

Definition of Intelligence
We may define intelligence as:

Page 1
Created by Turbolearn AI

"Ability to perceive or infer information, and to retain it as knowledge to be


applied towards adaptive behaviors within an environment or context."

What is Artificial Intelligence?

Artificial Intelligence (AI) is a term that refers to the development of computer systems that can perform tasks that typically require human intelligence, such as learning, problem-solving, and decision-making.

Definition of Artificial Intelligence


"Artificial Intelligence is the development of computer systems that can perform
tasks that typically require human intelligence, such as learning, problem-
solving, and decision-making."

Decision Making
Decision making is a critical aspect of intelligence, and it involves the ability to make choices
based on available information.

How do you make decisions?

Decision making involves several steps, including:

* Identifying the problem or opportunity
* Gathering information
* Evaluating options
* Making a choice
* Implementing the decision

Make Your Choices!

By understanding how you make decisions, you can improve your decision-making skills and become more intelligent.

Applications of Artificial Intelligence around us


Artificial Intelligence is being used in a wide range of applications, including:


* Virtual assistants
* Image recognition
* Natural language processing
* Robotics
* Self-driving cars

What is not AI?


Not all computer systems that can perform tasks are considered AI. For example:

* A calculator is not AI because it simply performs mathematical calculations without any intelligence or decision-making.
* A computer program that can play chess is not AI if it simply follows a set of rules without any intelligence or decision-making.

Conclusion
Intelligence is a complex and multifaceted concept that involves the ability to perceive or
infer information, and to retain it as knowledge to be applied towards adaptive behaviors
within an environment or context. Artificial Intelligence is the development of computer
systems that can perform tasks that typically require human intelligence, such as learning,
problem-solving, and decision-making.## Intelligence and Decision Making

Intelligence is the ability to perceive, understand, and act in the world. It involves reasoning,
planning, and learning.

Decision Making
Decision making is a crucial part of intelligence. It involves making choices based on
available information and experience.

"Decision making depends on the availability of information and how we


experience and understand it."

Factors Affecting Decision Making

* Past experience
* Intuition
* Knowledge
* Self-awareness

Scenarios


| Scenario | Description | Answer |
| --- | --- | --- |
| 1 | Three doors, one safe | Gate 3 (the lion would be dead) |
| 2 | Poisoned strawberry pie | Seema (she lied about being on a diet) |

Artificial Intelligence
Artificial Intelligence (AI) is the ability of a machine to mimic human traits, such as making
decisions, predicting the future, learning, and improving on its own.

"A machine is artificially intelligent when it can accomplish tasks by itself -


collect data, understand it, analyze it, learn from it, and improve it."

What Makes a Machine Intelligent?


Machines become intelligent through training with data, which enables them to make
decisions and predictions on their own.

Examples of Human Intelligence

* Learning alphabets and making words
* Learning to walk and upgrading to running and jumping

Examples of Machine Intelligence

* Google's search engine
* Voice assistants like Alexa and Siri
* Gaming experiences enhanced by AI

Applications of Artificial Intelligence


AI is used in various applications, including:

* Virtual assistants: Google Assistant, Alexa, Siri
* Gaming: Enhanced graphics, new difficulty levels, encouragement
* Recommendation systems: Netflix, Amazon, Spotify, YouTube
* Health monitoring: Chatbots, health apps
* Biometric security: Face locks, fingerprint recognition
* Real-time language translation: Google Translate
* Weather forecasting: AccuWeather, Weather.com

What is Not AI?


Not all technologies are AI. To be considered AI, a machine must be trained with data and
be able to make decisions or predictions on its own.

| Technology | Description | AI or Not? |
| --- | --- | --- |
| Fully automatic washing machine | Requires human intervention | Not AI |
| Air conditioner with remote control | Requires human touch | Not AI |
| Robots that follow a path | Need to be primed each time | Not AI |

Note: IoT (Internet of Things) and automation are not the same as AI.

Introduction to Artificial Intelligence

Artificial Intelligence (AI) is a vast domain that has been defined in various ways by different
organizations. Here are a few definitions:

* NITI Aayog: AI refers to the ability of machines to perform cognitive tasks like thinking, perceiving, learning, problem solving, and decision making.
* World Economic Forum: AI is the software engine that drives the Fourth Industrial Revolution.
* European Artificial Intelligence (AI) leadership: AI is a cover term for techniques associated with data analysis and pattern recognition.
* Encyclopaedia Britannica: AI is the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.

"Artificial Intelligence is a form of Intelligence; a type of technology and a field of


study. AI theory and development of computer systems
bothmachinesandsoftware enables machines to perform tasks that normally
require human intelligence."

AI, Machine Learning, and Deep Learning

| Term | Definition |
| --- | --- |
| Artificial Intelligence (AI) | Refers to any technique that enables computers to mimic human intelligence. |
| Machine Learning (ML) | A subset of AI that enables machines to improve at tasks with experience (data). |
| Deep Learning (DL) | Enables software to train itself to perform tasks with vast amounts of data. |

Note that Machine Learning and Deep Learning are part of Artificial Intelligence, but not
everything that is Machine Learning will be Deep Learning.


AI Domains
AI models can be broadly categorized into three domains based on the type of data fed into
them:

* Data Sciences: Related to data systems and processes, where the system collects numerous data, maintains data sets, and derives meaning/sense out of them.
* Computer Vision: Depicts the capability of a machine to get and analyze visual information and predict decisions about it.
* Natural Language Processing: Not discussed in this lecture, but an important domain of AI.

Data Sciences
Data sciences is a domain of AI that involves collecting, maintaining, and deriving meaning
from data. Examples of data science include:

* Price comparison websites (e.g., PriceGrabber, PriceRunner, Junglee, Shopzilla, DealTime)
* Decision-making systems that use data to make informed decisions

Computer Vision
Computer Vision is a domain of AI that involves analyzing visual information and predicting
decisions about it. Examples of computer vision include:

* Self-driving cars/automatic cars
* Face lock in smartphones

KWLH Chart

| Question | Answer |
| --- | --- |
| What I Know? | ______________________________ |
| What I Want to know? | ______________________________ |
| What have I learned? | ______________________________ |
| How I learnt this? | ______________________________ |
Natural Language Processing

NLP is a branch of artificial intelligence that deals with the interaction between computers and humans using natural language.

Natural language refers to language that is spoken and written by people, and natural language processing (NLP) attempts to extract information from the spoken and written word using algorithms.

The ultimate objective of NLP is to read, decipher, understand, and make sense of human
languages in a manner that is valuable.

Examples of Natural Language Processing


* Email filters: Email filters are one of the most basic and initial applications of NLP online. It started with spam filters, uncovering certain words or phrases that signal a spam message.
* Smart assistants: Smart assistants like Apple's Siri and Amazon's Alexa recognize patterns in speech, then infer meaning and provide a useful response.

AI Ethics
Nowadays, we are moving from the Information era to the Artificial Intelligence era. We no longer use just data or information, but the intelligence collected from the data to build solutions. These solutions can even recommend the next TV show or movie you should watch on Netflix.

Moral Issues: Self-Driving Cars


| Scenario | Description |
| --- | --- |
| 1 | A self-driving car is faced with a sudden decision: hit a small boy who has come in front of the car, or take a sharp right turn to save the boy and smash the car into a metal pole, damaging the car and injuring the person sitting in it. |
| 2 | The car has hit the boy who came in front of it. Who should be held responsible for the accident? |

Possible answers for Scenario 2:

* The person who bought the car
* The manufacturing company
* The developer who developed the car's algorithm
* The boy who came in front of the car and got severely injured

Data Privacy
The world of Artificial Intelligence revolves around data. Every company, whether small or
big, is mining data from as many sources as possible. More than 70% of the data collected
till now has been collected in the last 3 years, which shows how important data has
become in recent times.

| Question | Answer |
| --- | --- |
| Where do we collect data from? | Various sources, including smartphones, online activities, and more. |
| Why do we need to collect data? | To build solutions, provide personalized recommendations, and improve services. |

Smartphones and Data Collection


Smartphones have become an integral part of our lives, providing us with a lot of facilities
and features that have made our lives easier. However, they also collect data about us and
our surroundings.


| Example | Description |
| --- | --- |
| 1 | You discuss buying new shoes with a friend on a mobile network or app, and later receive notifications from online shopping websites recommending shoes. |
| 2 | You search for a trip to Kerala on Google, and later receive messages from apps about packages and deals for the trip. |
| 3 | You discuss a book with someone face-to-face while your phone is nearby, and later receive notifications about similar books or messages about the same book. |

Are We Okay with Sharing Our Data?


We need to understand that the data collected by various applications is ethical as
smartphone users agree to it by clicking on "allow" when asked for permission and by
agreeing to all the terms and conditions. However, if one does not want to share their data
with anyone, they can opt for alternative applications that are of similar usage and keep
their data private.

AI Bias
Everyone has a bias of their own, no matter how much one tries to be unbiased. Biases are
not negative all the time. Sometimes, it is required to have a bias to control a situation and
keep things working.

| Example | Description |
| --- | --- |
| 1 | Most virtual assistants have a female voice. Can you think of some reasons for this? |
| 2 | If you search on Google for salons, the first few results are mostly for female salons. Is this a bias? If yes, is it a negative bias or a positive one? |

Artificial Intelligence (AI) is a budding technology that not everyone has access to. The
people who can afford AI-enabled devices make the most of it, while others who cannot are
left behind. This creates a gap between these two classes of people, which gets widened
with the rapid advancement of technology.

AI Creates Unemployment
AI is making people's lives easier by automating laborious tasks. However, this may lead to
mass unemployment, where people with little or no skills may be left without jobs. On the
other hand, those who keep up with their skills will flourish.


Key Questions:

* Should AI replace laborious jobs?
* Is there an alternative to major unemployment?
* Should AI not replace laborious jobs? Will the lives of people improve if they keep on being unskilled?

AI for Kids
Kids nowadays are smart enough to understand technology from a very early age.
However, should technology be given to children so young?

The Concern:

"While it is good that the boy knows how to use technology effectively, on the
other hand, he uses it to complete his homework without really learning
anything since he is not applying his brain to solve the Math problems."

Is it Ethical?

Is it ethical to let the boy use technology to help in this manner?

AI Project Cycle
The AI Project Cycle provides a framework for developing AI projects. It consists of 5
stages:

| Stage | Description |
| --- | --- |
| Problem Scoping | Identify the problem to be solved |
| Data Acquisition | Collect data from various reliable and authentic sources |
| Data Exploration | Visualize the data to identify patterns |
| Modelling | Decide on the type of model to build and test it |
| Evaluation | Test the model on new data and evaluate its performance |

Problem Scoping
Problem Scoping is about identifying a problem and having a vision to solve it. It involves
understanding the problem and its parameters.

4Ws Problem Canvas

The 4Ws Problem Canvas helps in identifying the key elements related to the problem.


| W | Description |
| --- | --- |
| Who | Analyze the people getting affected directly or indirectly due to the problem |
| What | Determine the nature of the problem |
| Where | Identify the location where the problem exists |
| Why | Understand the reasons behind the problem |

Sustainable Development Goals (SDGs)

The United Nations has announced 17 SDGs, a set of goals to be achieved by 2030. These goals correspond to problems that we might observe around us.

| Goal | Description |
| --- | --- |
| 1 | No Poverty |
| 2 | Zero Hunger |
| 3 | Good Health and Well-being |
| ... | ... |
| 17 | Partnerships for the Goals |

Stakeholders

Stakeholders are the people who face the problem and would benefit from the solution.

Problem Definition

The problem definition stage involves identifying and understanding the problem you want
to solve. This stage is crucial in the AI project cycle as it sets the foundation for the entire
project.

The 4Ws Problem Canvas


The 4Ws problem canvas is a tool used to define and understand the problem. It consists of
four sections:

* Who: Identify the stakeholders associated with the problem.
* What: Define the problem or issue.
* Where: Identify the context or situation in which the problem arises.
* Why: Understand the benefits of solving the problem.

Problem Statement Template

The problem statement template is used to summarize the key points of the problem. It consists of the following format:

Our [stakeholder(s)] (Who) has/have a problem that [issue, problem, need] (What) when/while [context, situation] (Where). An ideal solution would [benefit of solution for them] (Why).

Data Acquisition
Data acquisition is the process of collecting data for the project. Data can be a piece of
information or facts and statistics collected together for reference or analysis.

Types of Data

| Type of Data | Description |
| --- | --- |
| Training Data | Data used to train the AI model. |
| Testing Data | Data used to test the AI model. |

Characteristics of Good Data


Relevant: The data should be relevant to the problem statement.
Authentic: The data should be accurate and trustworthy.

Data Features
Data features refer to the type of data you want to collect. For example, in a project to
predict employee salaries, data features might include:

* Salary amount
* Increment percentage
* Increment period
* Bonus

Data Collection Methods


* Surveys
* Web Scraping
* Sensors
* APIs
* Cameras
* Observations

Reliable Sources of Data


* Open-sourced government websites (e.g., data.gov.in, india.gov.in)

Data Exploration
Data exploration is the process of analyzing and understanding the data. This stage
involves visualizing the data to identify trends, relationships, and patterns.

Types of Visual Representations

* Bar graphs
* Other visual representations (charts and plots)

Goals of Data Exploration

* Quickly get a sense of the trends, relationships, and patterns contained within the data.
* Define a strategy for which model to use at a later stage.
* Communicate the findings to others effectively.

Modelling
Modelling is the process of developing an AI model to solve the problem. There are two
main approaches to modelling:


Rule-Based Approach
A rule-based approach refers to the AI modelling where the rules are defined by
the developer. The machine follows the rules or instructions mentioned by the
developer and performs its task accordingly.

* Example: A dataset that tells us about the conditions on the basis of which we can decide if an elephant may be spotted or not while on safari.
* Drawback: The learning is static, and the machine does not adapt to changes in the data.

Learning-Based Approach
A learning-based approach refers to the AI modelling where the machine learns
by itself. Under the learning-based approach, the AI model gets trained on the
data fed to it and then is able to design a model which is adaptive to the change
in data.

* Example: A dataset that is used to train a model to predict employee salaries.
* Advantage: The model adapts to changes in the data and can handle exceptions.

Types of AI Models

| Type of Model | Description |
| --- | --- |
| Machine Learning | A type of AI model that learns from data. |
| Deep Learning | A type of AI model that uses neural networks to learn from data. |
| Rule-Based | A type of AI model that follows predefined rules. |

The machine learning approach introduces dynamicity in the model by allowing it to adapt to new data. This approach can be divided into three parts:

Supervised Learning
Definition: A supervised learning model is trained on a labelled dataset, where
the data is already known to the person training the model.

In a supervised learning model, the dataset is labelled, and the model is trained to predict
the label of new data. There are two types of supervised learning models:

* Classification: Where the data is classified according to the labels. For example, in a grading system, students are classified on the basis of their grades.
* Regression: Where the model works on continuous data. For example, predicting the next salary based on previous salaries and increments.

Unsupervised Learning
Definition: An unsupervised learning model works on unlabelled data, where
the data is random and unknown to the person training the model.

Unsupervised learning models are used to identify relationships, patterns, and trends in the
data. They can be further divided into two categories:

* Clustering: Refers to the unsupervised learning algorithm that can cluster unknown data according to the patterns or trends identified.
* Dimensionality Reduction: Reduces the dimensions of the data to make it easier to visualize and understand.

| Dimensionality Reduction | Description |
| --- | --- |
| Reduces dimensions | Makes data easier to visualize and understand |
| Loses information | As dimensions are reduced, some information is lost |

Evaluation
Once a model is trained, it needs to be evaluated to calculate its efficiency and performance.
The model is tested with testing data, and its efficiency is calculated based on the following
parameters:

* Accuracy
* Precision
* Recall
* F1 Score

Neural Networks
Neural networks are loosely modelled after how neurons in the human brain behave. They
are able to extract data features automatically without needing the input of the
programmer.

How Neural Networks Work


A neural network is divided into multiple layers, each with several blocks called nodes. Each
node has its own task to accomplish, which is then passed to the next layer.

* Input Layer: Acquires data and feeds it to the neural network. No processing occurs at this layer.
* Hidden Layers: Where the whole processing occurs. Each node has its own machine learning algorithm that it executes on the data received from the input layer.
* Output Layer: Gives the final output to the user. No processing occurs at this layer.

Features of Neural Networks


* Automatic feature extraction
* Fast and efficient
* Can handle large datasets

Recap: Jupyter Notebook

What is Jupyter Notebook?


Definition: Jupyter Notebook is an open-source web application that can be
used to create and share documents that contain live code, equations,
visualizations, and text.

How to Access Jupyter Notebook


* Install Anaconda: The most widely used Python distribution for data science.
* Access through Anaconda Navigator: Scroll around all the applications that come along with Anaconda.

Introduction to Virtual Environments

A virtual environment is a tool that helps to keep dependencies required by different projects separated, by creating isolated Python virtual environments for them.

A virtual environment is a self-contained Python environment that allows you to manage dependencies for a specific project without affecting the global Python environment.

Imagine a scenario where we are working on two Python-based projects and one of them
works on Python 2.7 and the other uses Python 3.7. In such situations, virtual environments
can be really useful to maintain dependencies of both the projects as the virtual
environments will make sure that these dependencies are not conflicting with each other
and no impact reaches the base environment at any point in time.


Creating Virtual Environments with Anaconda


Creating virtual environments is an easy task with Anaconda distribution. Here are the
steps to create one:

1. Open Anaconda Prompt.
2. As we open the Anaconda Prompt, we can see that at the beginning of the prompt message the term base is written. This is the default environment in which Anaconda works.
3. Let us now create a virtual environment named env. To create the environment, write:

conda create -n env python=3.7

This code will create an environment named env and will install Python 3.7 and other basic
packages into it.

4. After some processing, the prompt will ask if we wish to proceed with the installations or not. Type Y and press Enter. Once we press Enter, the packages will start getting installed in the environment.
5. Depending upon the internet speed, downloading the packages might take a varied amount of time.
6. Once all the packages are downloaded and installed, our environment called env has been successfully created. Once an environment has been successfully created, we can access it by writing:

conda activate env

This would activate the virtual environment and we can see the term written in brackets has
changed from base to env. Now our virtual environment is ready to be used.

Installing Jupyter Notebook Dependencies


But, to open and work with Jupyter Notebooks in this environment, we need to install the
packages which help in working with Jupyter Notebook. These packages get installed by
default in the base environment when Anaconda gets installed.


To install Jupyter Notebook dependencies, we need to activate our virtual environment env
and write:

conda install ipykernel nb_conda jupyter

It will again ask if we wish to proceed with the installations, type Y to begin the
installations. Once the installations are complete, we can start working with Jupyter
notebooks in this environment.

Introduction to Python
Python is a programming language which was created by Guido Van Rossum in Centrum
Wiskunde & Informatica. The language was publicly released in 1991 and it got its name
from a BBC comedy series from 1970s Monty Python's Flying Circus.

Why Python?
Python is a popular language for developing applications of Artificial Intelligence. Here are
some reasons why Python gains maximum popularity:

Easy to learn, read and maintain: Python has few keywords, simple structure and a
clearly defined syntax.
A Broad Standard library: Python has a huge bunch of libraries with plenty of built-in
functions to solve a variety of problems.
Interactive Mode: Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.
Portability and Compatibility: Python can run on a wide variety of operating systems
and hardware platforms, and has the same interface on all platforms.
Extendable: We can add low-level modules to the Python interpreter.
Databases and Scalable: Python provides interfaces to all major open source and
commercial databases along with a better structure and support for much larger
programs than shell scripting.

Applications of Python
There exist a wide variety of applications when it comes to Python. Some of the
applications are:


* Web Development: Python can be used to build web applications using popular frameworks like Django and Flask.
* Data Analysis: Python has popular libraries like Pandas and NumPy for data analysis.
* Machine Learning: Python has popular libraries like scikit-learn and TensorFlow for machine learning.
* Automation: Python can be used to automate tasks using scripts.

Python Basics

Printing Statements
We can use Python to display outputs for any code we write. To print any statement, we
use the print function in Python.
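For instance, a couple of minimal print() calls:

```python
# print() displays fixed text as well as computed values
print("Hello, AI!")
print(5 + 10)  # prints 15
```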

Python Statements and Comments


Instructions written in the source code to execute are known as statements. These are the
lines of code which we write for the computer to work upon.

| Statement | Description |
| --- | --- |
| print(5+10) | This is a Python statement: the computer goes through it and does the needful (which in this case is to calculate 5 + 10 and print the result on the output screen). |

On the other hand, there exist some statements which do not get executed by the computer. These lines of code are skipped by the machine. They are known as comments.

| Comment | Description |
| --- | --- |
| # This is a comment and will not be read by the machine. | This is a comment as it starts with #. |

Keywords & Identifiers


In Python, there exist some words which are pre-defined and carry a specific meaning for the machine by default. These words are known as keywords.

| Keyword | Description |
| --- | --- |
| if | Used for conditional statements |
| else | Used for conditional statements |
| for | Used for loops |
| while | Used for loops |

An identifier is a user-defined name given to an entity such as a variable. Identifiers can be declared by the user as per their convenience of use and can vary according to the way the user wants.

| Identifier | Description |
| --- | --- |
| count | A variable to store a count |
| interest | A variable to store interest |
| x | A variable to store a value |
| ai_learning | A variable to store a value related to AI learning |
| Test | A variable to store a value |

Variables & Datatypes


A variable is a named location used to store data in the memory.

| Variable | Datatype | Description |
| --- | --- | --- |
| X = 10 | Integer | A variable to store a numerical value |
| Letters = "XYZ" | String | A variable to store alphabetic data |
| number = 13.95 | Float | A variable to store a decimal value |
| word = "k" | String | A variable to store a character |

The type of data is defined by the term datatype in Python. There can be various types of data which are used in Python programming.

| Datatype | Description |
| --- | --- |
| Integer | A whole number |
| Float | A decimal number |
| String | A sequence of characters |
| Boolean | A true or false value |
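A short sketch showing these datatypes in action (the variable names follow the table above; type() confirms each datatype):

```python
# Variables of different datatypes
X = 10           # Integer
Letters = "XYZ"  # String
number = 13.95   # Float
flag = True      # Boolean

# type() reveals the datatype of each variable
print(type(X), type(Letters), type(number), type(flag))
```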

Python Inputs

In Python, not only can we display the output to the user, but we can also take input from the user.

User Input and Operators

Collecting User Input


To collect data from the user at the time of execution, the input() function is used. While using the input function, the data type of the expected input needs to be mentioned so that the machine does not interpret the received data in an incorrect manner.

The data taken as input from the user is considered to be a string (sequence of characters) by default.

Data Type Conversion


To convert the input string to a specific data type, the following syntax is used:

| Data Type | Syntax |
| --- | --- |
| String | Str = input(<String>) |
| Integer | Number = int(input(<String>)) |
| Decimal | Value = float(input(<String>)) |
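For example, reading and converting user input (the prompt strings are illustrative):

```python
# input() always returns a string; convert it explicitly when needed
name = input("Enter your name: ")             # string, no conversion
age = int(input("Enter your age: "))          # convert to integer
height = float(input("Enter height in m: "))  # convert to float

print(name, age + 1, height)
```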

Operators
Operators are special symbols that represent computation. They are applied on operands,
which can be values or variables.

Arithmetic Operators

Operator Meaning Expression Result

+ Addition 10 + 20 30
- Subtraction 30 - 10 20
* Multiplication 30 * 100 300
/ Division 30 / 10 20.0
// Integer Division 25 // 10 2
% Remainder 25 % 10 5
** Raised to power 3 ** 2 9
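The table can be verified directly in Python:

```python
# Arithmetic operators from the table above
print(10 + 20)   # 30
print(30 - 10)   # 20
print(30 * 100)  # 3000
print(30 / 10)   # 3.0  (/ always returns a float)
print(25 // 10)  # 2    (integer division discards the fraction)
print(25 % 10)   # 5    (remainder)
print(3 ** 2)    # 9    (power)
```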


Conditional Operators

| Operator | Meaning | Expression | Result |
| --- | --- | --- | --- |
| > | Greater Than | 20 > 10 | True |
| < | Less Than | 15 < 25 | True |
| == | Equal To | 5 == 5 | True |
| != | Not Equal To | 5 != 6 | True |
| >= | Greater than or Equal to | 45 >= 45 | True |
| <= | Less than or Equal to | 13 <= 24 | True |

Logical Operators

| Operator | Meaning | Expression | Result |
| --- | --- | --- | --- |
| and | And operator | True and True | True |
| or | Or operator | True or False | True |
| not | Not operator | not False | True |

Assignment Operators

| Operator | Expression | Equivalent to |
| --- | --- | --- |
| = | X = 5 | X = 5 |
| += | X += 5 | X = X + 5 |
| -= | X -= 5 | X = X - 5 |
| *= | X *= 5 | X = X * 5 |
| /= | X /= 5 | X = X / 5 |

Conditional Statements
Conditional statements help the machine in taking a decision according to the condition that
gets fulfilled.

* If statement: used to execute a block of code if a certain condition is true.
* If-else statement: used to execute one block of code if a certain condition is true, and another block of code if the condition is false.
* If-else ladder: used to check several conditions in sequence, executing the block for the first condition that turns out to be true (see the sketch below).
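A minimal sketch of an if-else ladder (the grading thresholds are illustrative):

```python
marks = 72

# The first condition that evaluates to True wins
if marks >= 90:
    print("Grade A")
elif marks >= 60:
    print("Grade B")
else:
    print("Grade C")
```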

Looping


Looping mechanisms are used to iterate statements or a group of statements as many times
as it is asked for.

* While Loop: used to execute a block of code as long as a certain condition is true.
* For Loop: used to execute a block of code a specified number of times.
* Do-While Loop: used to execute a block of code at least once, and then repeat the execution as long as a certain condition is true (Python has no built-in do-while; it can be emulated, as shown below).
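A short sketch of each looping mechanism:

```python
# while loop: repeats as long as the condition holds
count = 0
while count < 3:
    print("count =", count)
    count += 1

# for loop: runs a specified number of times
for i in range(3):
    print("i =", i)

# do-while emulation: the body runs at least once, then the condition is checked
while True:
    reply = "done"  # placeholder for work done at least once
    if reply != "repeat":
        break
```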

Python Packages
A package is a space where we can find codes or functions or modules of similar type.

Installing Packages
To install a package in Python, we need to use the conda install command.

* Open Anaconda Prompt and activate your working environment.
* Type conda install <package_name> to install a package.
* Type conda install <package_name1> <package_name2> <package_name3> to install multiple packages at once.

Importing Packages
To use a package in Python, we need to import it.

* import <package_name>: imports the entire package.
* import <package_name> as <alias>: imports the entire package and assigns an alias to it.
* from <package_name> import <function>: imports a specific function from the package.
* from <package_name> import <function> as <alias>: imports a specific function from the package and assigns an alias to it.
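Each import style, sketched with numpy as the example package:

```python
import numpy                    # entire package
import numpy as np              # entire package with an alias
from numpy import array         # a specific function
from numpy import array as arr  # a specific function with an alias

print(np.array([1, 2, 3]))
print(arr([4, 5, 6]))
```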

Data Sciences
Data Science is a concept to unify statistics, data analysis, machine learning, and their
related methods in order to understand and analyze actual phenomena with data.

Applications of Data Sciences

* Fraud and Risk Detection: used in finance to detect fraudulent transactions and predict risk.
* Genetics and Genomics: used in medicine to understand the impact of DNA on health and develop personalized treatments.

Types of Data Sciences


* Data Sciences: working around numeric and alpha-numeric data.
* Computer Vision: working around image and visual data.
* Natural Language Processing: working around textual and speech-based data.

Data Science Applications

Data science has numerous applications in various fields, including:

* Genomic Data: Techniques allow integration of different kinds of data with genomic data in disease research, providing a deeper understanding of genetic issues in reactions to particular drugs and diseases.
* Internet Search: Search engines like Google, Yahoo, and Bing use data science algorithms to deliver the best results for searched queries in a fraction of a second.
* Targeted Advertising: Digital marketing uses data science algorithms to target ads based on users' past behavior, resulting in higher click-through rates (CTRs) than traditional advertisements.
* Website Recommendations: Companies like Amazon, Twitter, Google Play, Netflix, LinkedIn, and IMDB use data science to improve user experience by recommending similar products based on previous search results.
* Airline Route Planning: Airline companies use data science to identify strategic areas of improvement, predict flight delays, decide which class of airplanes to buy, and effectively drive customer loyalty programs.

Getting Started with Data Science


Data science is a combination of Python and mathematical concepts like statistics, data analysis, and probability. It provides a strong base for data analysis in Python and can be used in developing applications around artificial intelligence (AI).

Revisiting the AI Project Cycle


The AI project cycle framework involves:

* Problem Scoping: Identifying the problem and understanding its factors
* Data Acquisition: Collecting data relevant to the problem
* Data Exploration: Analyzing the collected data to understand its requirements
* Model Development: Creating a model to solve the problem
* Model Deployment: Deploying the model in a real-world scenario

The Scenario: Food Waste in Restaurants


Restaurants often prepare food in bulk, but a significant amount of food is left unconsumed,
resulting in losses. The goal is to predict the quantity of food to be prepared for everyday
consumption in restaurant buffets.

Problem Scoping
| 4Ws Problem Canvas | Description |
| --- | --- |
| Who | Restaurant owners and chefs |
| What | Food waste due to improper estimation of food quantity |
| Where | Restaurants that serve buffet food |
| Why | To reduce food waste and losses |

Problem Statement
"Our restaurant owners have a problem of losses due to food wastage. The food is left
unconsumed due to improper estimation. We want to be able to predict the amount of food
to be prepared for every day's consumption."

Data Acquisition
The following data features affect the problem:

| Data Feature | Description |
| --- | --- |
| Quantity of dish prepared per day | The amount of food prepared for each dish |
| Total number of customers per day | The number of customers visiting the restaurant |
| Unconsumed quantity of dish per day | The amount of food left unconsumed |
| Price of dish | The price of each dish |
| Fixed customers per day | The number of regular customers |

System Map


The system map shows the relationship between each element and the project's goal.
Positive arrows indicate a direct relationship, while negative arrows indicate an inverse
relationship.

| Element | Relationship with Goal |
| --- | --- |
| Quantity of dish prepared per day | Direct |
| Total number of customers per day | Direct |
| Unconsumed quantity of dish per day | Inverse |
| Price of dish | Direct |
| Fixed customers per day | Direct |

Data Exploration
The collected data is analyzed to understand its requirements. The goal is to predict the quantity of food to be prepared for the next day's consumption.

Modelling

In this section, we will discuss the process of modelling our dataset using a regression model.

A regression model is a type of supervised learning model that takes in continuous values
of data over a period of time. Since our dataset consists of 30 days of continuous data, we
can use a regression model to predict the next values.

Training the Model


The dataset is divided into a ratio of 2:1 for training and testing respectively. The model is
first trained on the 20-day data and then evaluated for the rest of the 10 days.
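A minimal sketch of this split-and-train idea using scikit-learn (the demand values below are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

days = np.arange(1, 31).reshape(-1, 1)                  # 30 days of data
quantity = 50 + 2 * days.ravel() + np.random.randn(30)  # hypothetical demand

# 2:1 split -> first 20 days for training, last 10 for testing
model = LinearRegression().fit(days[:20], quantity[:20])
predictions = model.predict(days[20:])

print(predictions[:3])  # predicted demand for days 21-23
```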

Evaluation
Once the model has been trained, it is time to see if the model is working properly or not.
The evaluation process involves the following steps:

Page 26
Created by Turbolearn AI

* Step 1: The trained model is fed data regarding the name of the dish and the quantity produced for the same.
* Step 2: It is then fed data regarding the quantity of food left unconsumed for the same dish on previous occasions.
* Step 3: The model then works upon the entries according to the training it got at the modelling stage.
* Step 4: The model predicts the quantity of food to be prepared for the next day.
* Step 5: The prediction is compared to the testing dataset value.
* Step 6: The model is tested for the 10 testing datasets kept aside while training.
* Step 7: Prediction values of the testing dataset are compared to the actual values.
* Step 8: If the prediction value is the same as or very similar to the actual values, the model is said to be accurate.

Data Collection
Data collection is the process of gathering data from various sources. It has been a part of our society for ages, even when people did not have fair knowledge of calculations.

"Data collection is an exercise which does not require even a tiny bit of
technological knowledge. But when it comes to analysing the data, it becomes a
tedious process for humans as it is all about numbers and alpha-numerical
data."

Sources of Data
There exist various sources of data from where we can collect any type of data required.
The data collection process can be categorised in two ways: Offline and Online.

| Offline Data Collection | Online Data Collection |
| --- | --- |
| Sensors | Open-sourced government portals |
| Surveys | Reliable websites (e.g., Kaggle) |
| Interviews | World organisations' open-sourced statistical websites |
| Observations | |

Types of Data
For Data Science, usually the data is collected in the form of tables. These tabular datasets
can be stored in different formats.

| Format | Description |
| --- | --- |
| CSV | Comma Separated Values. A simple file format used to store tabular data. |
| Spreadsheet | A piece of paper or a computer program used for accounting and recording data using rows and columns. |
| SQL | Structured Query Language. A programming language used for managing data held in different kinds of DBMS (Database Management System). |

Data Access
After collecting the data, to be able to use it for programming purposes, we should know
how to access the same in a Python code.
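For instance, a CSV dataset can be loaded with pandas (the filename below is a placeholder):

```python
import pandas as pd

# Load a tabular dataset from a CSV file (hypothetical filename)
df = pd.read_csv("restaurant_data.csv")

print(df.head())  # first five rows
print(df.shape)   # (number of rows, number of columns)
```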

NumPy
NumPy is the fundamental package for Mathematical and logical operations on arrays in
Python.

"NumPy gives a wide range of arithmetic operations around numbers giving us


an easier approach in working with them."

| NumPy Arrays | Lists |
| --- | --- |
| Homogeneous collection of data. | Heterogeneous collection of data. |
| Can contain only one type of data. | Can contain multiple types of data. |
| Cannot be directly initialized. | Can be directly initialized, as it is a part of Python syntax. |
| Direct numerical operations can be done. | Direct numerical operations are not possible. |

Pandas
Pandas is a software library written for the Python programming language for data
manipulation and analysis.

"Pandas offers data structures and operations for efficiently handling structured
data, including tabular data such as spreadsheets and SQL tables."


```python
import numpy

# A NumPy array must be created through the numpy package
A = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

# A list can be initialized directly, as it is part of Python syntax
L = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
```

## Data Structures and Operations

Pandas is a library that provides data structures and operations for manipulating numerical tables and time series.

### What is Pandas?

> Pandas is a library that provides data structures and operations for manipulating numerical tables and time series.

Pandas is well suited for many different kinds of data, including:

* Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
* Ordered and unordered (not necessarily fixed-frequency) time series data
* Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
* Any other form of observational / statistical data sets

### Primary Data Structures of Pandas

The two primary data structures of Pandas are:

* **Series** (1-dimensional)
* **DataFrame** (2-dimensional)

These data structures handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering.

### Features of Pandas

Pandas provides several features that make it a powerful tool for data manipulation:

* Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
* Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
* Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels
* Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
* Intuitive merging and joining data sets
* Flexible reshaping and pivoting of data sets
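A minimal sketch of the two primary data structures:

```python
import pandas as pd

# Series: 1-dimensional labelled data
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# DataFrame: 2-dimensional labelled, tabular data
df = pd.DataFrame({"dish": ["Idli", "Dosa"], "quantity": [40, 25]})

print(s)
print(df)
```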

## Data Visualization with Matplotlib

Matplotlib is a multi-platform data visualization library built on NumPy arrays.

### Types of Plots

Some types of plots that can be made with Matplotlib include:

* Scatter plots
* Bar charts
* Histograms
* Box plots

### Features of Matplotlib

Matplotlib provides several features that make it a powerful tool for data visualization:

* Customizable plots: plots can be stylized and made more descriptive and communicative
* Variety of plot types: Matplotlib provides a wide range of plot types that can be used to visualize different kinds of data
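For instance, a minimal bar chart (the values are illustrative):

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 6, 3]

plt.bar(x, y)  # plt.scatter, plt.hist, and plt.boxplot work similarly
plt.title("Sample Bar Chart")
plt.xlabel("Category")
plt.ylabel("Value")
plt.show()
```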

## Basic Statistics with Python


Python provides several libraries that can be used for basic statistical analysis.

### Statistical Methods

Some common statistical methods used in data analysis include:

| Method | Description |
| --- | --- |
| **Mean** | The average value of a dataset |
| **Median** | The middle value of a dataset when it is sorted in ascending order |
| **Mode** | The most frequently occurring value in a dataset |
| **Standard Deviation** | A measure of the spread or dispersion of a dataset |
| **Variance** | The square of the standard deviation; also measures spread or dispersion |

### Calculating Statistical Methods

Python provides several libraries that can be used to calculate statistical methods, such as NumPy, Pandas, and the built-in statistics module.
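A minimal sketch using the built-in statistics module (the dataset is illustrative):

```python
import statistics

data = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5]

print(statistics.mean(data))      # average value
print(statistics.median(data))    # middle value of the sorted data
print(statistics.mode(data))      # most frequent value
print(statistics.stdev(data))     # spread around the mean
print(statistics.variance(data))  # square of the standard deviation
```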

## Data Visualization with Python

Python provides several libraries that can be used for data visualization, including Matplotlib.

### Types of Plots

Some types of plots that can be made with Python include:

* Scatter plots
* Bar charts
* Histograms
* Box plots

### Features of Data Visualization Libraries

Python's data visualization libraries provide several features that make them powerful tools:

* Customizable plots: plots can be stylized and made more descriptive and communicative
* Variety of plot types: Python's data visualization libraries provide a wide range of plot types

## Data Issues

Data can have several issues that need to be addressed before it can be analyzed:

* **Erroneous Data**: incorrect values in the dataset


* **Missing Data**: missing values in the dataset
* **Outliers**: values that are significantly different from the rest of the data

### Handling Data Issues

Python provides several libraries that can be used to handle data issues, including Pandas.

* Handling missing data: Pandas provides several features that can be used to handle missing data, such as dropping or filling in missing values
* Handling outliers: Python's data visualization libraries provide features, such as box plots, that help spot outliers

### Box Plots

A box plot is a graphical representation of a dataset that displays the distribution of the data through quartiles.

* **Quartile 1**: From 0th percentile to 25th percentile


* **Quartile 2**: From 25th percentile to 50th percentile (also known as the median)
* **Quartile 3**: From 50th percentile to 75th percentile
* **Quartile 4**: From 75th percentile to 100th percentile


| Quartile | Description |
| --- | --- |
| Q1 | 0th - 25th percentile |
| Q2 | 25th - 50th percentile (median) |
| Q3 | 50th - 75th percentile |
| Q4 | 75th - 100th percentile |

### Interquartile Range (IQR)

> The interquartile range (IQR) is the difference between the 75th percentile (Q3) and the 25th percentile (Q1).

### Outliers

> Outliers are data points that are significantly different from the other data points in the dataset.
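A small NumPy sketch of the IQR and the common 1.5 × IQR rule of thumb for flagging outliers (the dataset is illustrative):

```python
import numpy as np

data = np.array([2, 3, 4, 5, 5, 6, 7, 8, 9, 40])

q1 = np.percentile(data, 25)
q3 = np.percentile(data, 75)
iqr = q3 - q1  # interquartile range

# Points beyond 1.5 * IQR from the quartiles are commonly treated as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(data[(data < lower) | (data > upper)])  # [40]
```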

## Personality Prediction

### K-Nearest Neighbour (KNN) Algorithm

The KNN algorithm is a supervised machine learning algorithm that can be used for both classification and regression.

> The KNN algorithm relies on the surrounding points or neighbours to determine the class or value of a new data point.

### Features of KNN

* The KNN prediction model relies on the surrounding points or neighbours to determine its prediction.
* It utilises the properties of the majority of the nearest points to decide how to classify a new point.
* It is based on the concept that similar data points should be close to each other.

### Example: Predicting the Sweetness of a Fruit

| K Value | Prediction |
| --- | --- |
| 1 | Not sweet (based on the nearest neighbour) |
| 2 | No prediction (due to conflicting neighbours) |
| 3 | Sweet (based on the majority of neighbours) |

### Significance of the Number of Neighbours

* Decreasing the value of K to 1 makes predictions less stable.


* Increasing the value of K makes predictions more stable due to majority voting.
* In cases where a majority vote is taken, K is usually made an odd number to have a tiebreaker.
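A minimal KNN sketch using scikit-learn (the fruit features and labels are made up for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical fruit data: [weight in grams, sugar content]
X = [[150, 12], [170, 14], [130, 6], [160, 13], [120, 5], [140, 7]]
y = ["sweet", "sweet", "not sweet", "sweet", "not sweet", "not sweet"]

# K = 3: the three nearest neighbours vote on the label
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

print(knn.predict([[155, 11]]))  # ['sweet']
```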

## Computer Vision

### Introduction

> Computer vision is a field of study that enables computers to interpret and understand the visual world.

**Computer Vision** is a technique that enables computers to mimic human intelligence when working with visual information.

## Emoji Scavenger Hunt: A Practical Experience

The Emoji Scavenger Hunt game is an interactive way to experience the capabilities of computer vision.

### Strategy to Win the Game

* Players must analyze the items and their surroundings to identify the correct objects quickly.
* The computer uses algorithms and methods to process and analyze the visual data.

### Limitations of Computer Vision


* Lighting conditions can affect the accuracy of object identification.


* The computer may not always be able to identify all items correctly.

## Applications of Computer Vision

| Application | Description |
| --- | --- |
| **Facial Recognition** | Used in security systems, attendance tracking, and smartphone face locks |
| **Face Filters** | Used in social media apps like Instagram and Snapchat to apply effects to faces |
| **Google Search by Image** | Allows users to search for images and get related results |
| **Computer Vision in Retail** | Used to track customer movement and analyze navigation patterns |
| **Self-Driving Cars** | Uses Computer Vision to identify objects and navigate routes |
| **Medical Imaging** | Assists doctors in interpreting medical images |
| **Google Translate App** | Uses Computer Vision to translate text in real time |

## Computer Vision Tasks

| Task | Description |
| --- | --- |
| **Image Classification** | Assigns a label to an input image from a fixed set of categories |
| **Classification + Localisation** | Identifies the object and its location in the image |
| **Object Detection** | Finds instances of real-world objects in images or videos |
| **Instance Segmentation** | Detects instances of objects, assigns a category, and labels each object at the pixel level |

## Basics of Images

### What is a Pixel?

> A pixel is a picture element, the smallest unit of information that makes up a picture.

### Resolution

| Term | Description |
| --- | --- |
| **Pixel Count** | The number of pixels in an image, expressed as width x height |
| **Megapixel** | A unit of measurement for pixel count, equal to one million pixels |

### Pixel Value

> Each pixel has a pixel value that describes its brightness and/or color, typically ranging from 0 to 255.

### Grayscale Images

A **grayscale image** is an image that has a range of shades of gray without apparent color.

> "A grayscale image has each pixel of size 1 byte having a single plane of 2D a

The size of a grayscale image is defined as the Height x Width of that image.

### RGB Images

An **RGB image** is an image that is made up of three primary colors: Red, Green, and Blue.

> "Every RGB image is stored in the form of three different channels called the

Each plane separately has a number of pixels, with each pixel value varying from 0 to 255.

| Channel | Description |
| --- | --- |
| R (Red) | Values range from 0 to 255 |
| G (Green) | Values range from 0 to 255 |
| B (Blue) | Values range from 0 to 255 |
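To make these ideas concrete, a small NumPy sketch of how grayscale and RGB images are stored (the pixel values are illustrative):

```python
import numpy as np

# Grayscale image: a single 2D plane, one byte (0-255) per pixel
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)
print(gray.shape)  # (2, 3) -> Height x Width

# RGB image: three stacked planes (channels), each value 0-255
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]  # top-left pixel set to pure red
print(rgb.shape)  # (2, 3, 3) -> Height x Width x Channels
```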


## Image Features

In computer vision and image processing, a **feature** is a piece of information that is relevant for solving a task.

### Types of Features

* **Points**: Specific points in the image that can be used as features.


* **Edges**: Boundaries between different regions in the image that can be used as features.
* **Objects**: Specific objects in the image that can be used as features.

### Good Features

* **Corners**: Corners are considered to be good features in an image because they are unique and easy to locate.


* **Edges**: Edges are also considered to be good features in an image because they mark the boundaries between different regions.

## Convolution

**Convolution** is a process of changing pixel values in an image. This process is used to apply effects and extract features.

> "The convolution operation is commonly used to create effects such as filters

## Introduction to OpenCV

**OpenCV** (Open Source Computer Vision Library) is a tool that helps a computer process images and videos and extract useful information from them.

### Installing OpenCV

To install OpenCV library, open anaconda prompt and write the following command:
```python
pip install opencv-python
```

Basic Image Processing Operations


OpenCV can be used for basic image processing operations such as:

* Resizing
* Cropping
* And many more (see the example below)
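For instance, resizing and cropping with OpenCV (the filenames are placeholders):

```python
import cv2

img = cv2.imread("sample.jpg")  # hypothetical input image

# Resizing: scale the image to 200 x 200 pixels
small = cv2.resize(img, (200, 200))

# Cropping: slice rows (height) and columns (width) of the pixel array
crop = img[50:150, 100:300]

cv2.imwrite("small.jpg", small)
cv2.imwrite("crop.jpg", crop)
```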

To learn more about OpenCV, head to the Jupyter Notebook for an introduction to OpenCV: http://bit.ly/cv_notebook

Convolution Operator

What is Convolution?
Convolution is a simple mathematical operation that is fundamental to many common
image processing operators. It provides a way of "multiplying together" two arrays of
numbers, generally of different sizes, but of the same dimensionality, to produce a third
array of numbers of the same dimensionality.


How Does Convolution Work?


Convolution is an element-wise multiplication of an image array and a kernel array,
followed by a sum. The kernel is passed over the whole image to get the resulting array
after convolution.

Kernel: A kernel is a matrix that is slid across the image and multiplied with the input such
that the output is enhanced in a certain desirable manner. Each kernel has a different value
for different kinds of effects that we want to apply to an image.

Convolution Operation
The convolution operation can be represented as:

I * K = resulting array after performing the convolution operator, where I is the image array and K is the kernel array.

Edge Extension
To achieve an output image of the same size as the input image, we need to extend the
edge values out by one in the original image while overlapping the centers and performing
the convolution. This will help us keep the input and output image of the same size.
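A small NumPy sketch of 2D convolution with edge extension (the image values match the worked example later in this section; an averaging kernel is used for illustration):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; multiply element-wise and sum."""
    kh, kw = kernel.shape
    # Edge extension: replicate border pixels so output size equals input size
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[150, 0, 255],
                  [100, 179, 25],
                  [155, 146, 13]], dtype=float)
kernel = np.ones((3, 3)) / 9.0  # averaging (blur) kernel
print(convolve2d(image, kernel))
```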

Convolutional Neural Networks (CNN)

What is a Convolutional Neural Network?


A Convolutional Neural Network (CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and be able to differentiate one from the other.

Layers of a Convolutional Neural Network


A Convolutional Neural Network consists of the following layers:


* Convolution Layer: The first layer of a CNN, responsible for extracting high-level features such as edges from the input image.
* Rectified Linear Unit (ReLU) Layer: The next layer in the CNN, responsible for introducing non-linearity in the feature map.
* Pooling Layer: Responsible for reducing the spatial size of the convolved feature while still retaining the important features.
* Fully Connected Layer: The final layer of the CNN, responsible for making predictions based on the features extracted by the previous layers.

| Layer | Description |
| --- | --- |
| Convolution Layer | Extracts high-level features such as edges from the input image |
| ReLU Layer | Introduces non-linearity in the feature map |
| Pooling Layer | Reduces the spatial size of the convolved feature while still retaining the important features |
| Fully Connected Layer | Makes predictions based on the features extracted by the previous layers |

Rectified Linear Unit (ReLU) Function


"The ReLU function is a non-linear function that maps all negative values to 0
and all positive values to the same value."

The ReLU function is used to introduce non-linearity in the feature map, making the color
change more obvious and more abrupt.
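A one-line sketch of the ReLU function:

```python
import numpy as np

def relu(x):
    # Negative values map to 0; positive values pass through unchanged
    return np.maximum(0, x)

feature_map = np.array([[-3, 5], [2, -7]])
print(relu(feature_map))  # [[0 5]
                          #  [2 0]]
```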

Pooling Layer
There are two types of pooling that can be performed on an image:

* Max Pooling: Returns the maximum value from the portion of the image covered by the kernel.
* Average Pooling: Returns the average value from the portion of the image covered by the kernel.
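A small NumPy sketch of both pooling types on a 4×4 feature map with a 2×2 window (the values are illustrative):

```python
import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 5, 6]], dtype=float)

# Split the 4x4 map into 2x2 blocks, then reduce each block to one value
blocks = feature_map.reshape(2, 2, 2, 2).swapaxes(1, 2)

print(blocks.max(axis=(2, 3)))   # max pooling:     [[6. 4.] [7. 9.]]
print(blocks.mean(axis=(2, 3)))  # average pooling: [[3.75 2.25] [4.   5.25]]
```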

Example of Convolution Operation


Image array (I):

150   0 255
100 179  25
155 146  13

Kernel array (K):

1 0 1
0 1 0
1 0 1

Note: The resulting array is obtained by sliding the kernel over the image array and performing the element-wise multiply-and-sum at each position, as described above.

Convolutional Neural Networks (CNNs)

Pooling Layer
The pooling layer is an important layer in the CNN as it performs a series of tasks:

* Makes the image smaller and more manageable
* Makes the image more resistant to small transformations, distortions, and translations in the input image
* A small difference in the input image will create a very similar pooled image

Fully Connected Layer


The final layer in the CNN is the Fully Connected (FC) Layer. The objective of a fully connected layer is to take the results of the convolution/pooling process and use them to classify the image into a label (in a simple classification example).

Fully Connected Layer: A layer in a neural network where every input is connected to every output by a learnable weight.

The output of convolution/pooling is flattened into a single vector of values, each representing a probability that a certain feature belongs to a label. For example, if the image is of a cat, features representing things like whiskers or fur should have high probabilities for the label "cat".

Natural Language Processing (NLP)

Introduction
Natural Language Processing (NLP) is the sub-field of AI that is focused on enabling computers to understand and process human languages.

Natural Language Processing (NLP): A subfield of artificial intelligence that deals with the interaction between computers and human language.


NLP is concerned with the interactions between computers and human natural languages,
in particular how to program computers to process and analyze large amounts of natural
language data.

Applications of NLP

| Application | Description |
| --- | --- |
| Automatic Summarization | Summarizing the meaning of documents and information to provide an overview of a news item or blog post. |
| Sentiment Analysis | Identifying sentiment among several posts, or even in the same post where emotion is not always explicitly expressed. |
| Text Classification | Assigning predefined categories to a document to organize it and help find the information needed. |
| Virtual Assistants | Using speech recognition to detect and make sense of human speech, and perform tasks such as keeping notes, making calls, and sending messages. |

Getting Started with NLP


NLP is all about how machines try to understand and interpret human language and
operate accordingly.

Revisiting the AI Project Cycle


To develop a project in NLP, we need to follow the AI project cycle.

Problem Scoping

| Canvas | Description |
|---|---|
| Who | People who suffer from stress and are at the onset of depression. |
| What | People who need help are reluctant to consult a psychiatrist and hence live miserably. |
| Where | When they are going through a stressful period of time. |
| Why | People get a platform where they can talk and vent out their feelings anonymously. |


Problem Statement
People undergoing stress are not able to share their feelings, even though they need to
vent out their stress.

## Chatbots and Natural Language Processing

Goal of the Project

The goal of the project is to create a chatbot that can interact with people, help them vent
out their feelings, and take them through primitive Cognitive Behavioral Therapy (CBT).

Data Acquisition
To understand the sentiments of people, we need to collect their conversational data. This
data can be collected from various means:

Surveys
Observing therapists' sessions
Databases available on the internet
Interviews

Data Exploration
Once the textual data has been collected, it needs to be processed and cleaned so that an
easier version can be sent to the machine. This process is called Data Normalization.

"Data Normalization is the process of transforming raw data into a more


structured and organized format, making it easier to analyze and process."

Modelling
Once the text has been normalized, it is then fed to an NLP (Natural Language Processing)
based AI model. NLP is a subfield of artificial intelligence that deals with the interaction
between computers and humans in natural language.

Evaluation
The trained model is then evaluated, and its accuracy is calculated on the basis of the
relevance of the answers the machine gives to the users' responses.


| Model Performance | Description |
|---|---|
| Underfitting | The model's output does not match the true function, resulting in lower accuracy. |
| Perfect Fit | The model's performance matches well with the true function, resulting in optimum accuracy. |
| Overfitting | The model tries to cover all the data samples, even if they are out of alignment with the true function, resulting in lower accuracy. |

Chatbots
A chatbot is a computer program that uses NLP to simulate conversation with human users.
There are two types of chatbots:

| Chatbot Type | Description |
|---|---|
| Script-bot | A chatbot that works around a script which is programmed into it. Script-bots are easy to make and mostly free, but have limited functionality. |
| Smart-bot | A chatbot that works on bigger databases and other resources directly. Smart-bots are flexible and powerful, but require coding to take them on board. |

Human Language vs Computer Language


Humans communicate through language, which is processed by the brain. Computers, on
the other hand, understand the language of numbers.

"Human language is complex and has multiple characteristics that might be easy
for a human to understand but extremely difficult for a computer to understand."

Challenges in Processing Natural Language


There are several challenges in processing natural language, including:

Arrangement of words and meaning: Human language has rules and structure,
which can be difficult for computers to understand.
Multiple meanings of a word: Words can have different meanings depending on the
context, which can be difficult for computers to understand.


Part-of-Speech Tagging
Part-of-speech tagging is a technique used to identify the different parts of speech, such
as nouns, verbs, adverbs, and adjectives.

Analogy with Programming Languages

Programming languages have their own syntax and semantics, which makes them a useful
analogy for human language.

| Syntax and Semantics | Description |
|---|---|
| Different syntax, same semantics | 2 + 3 = 3 + 2 (both statements have the same meaning) |
| Different semantics, same syntax | 2/3 in Python 2.7 ≠ 2/3 in Python 3 (both statements have the same syntax but different meanings) |
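
To make the second row concrete, the same expression can be evaluated under both semantics (Python 3 shown; the Python 2.7 behaviour is noted in the comments):

```python
# Python 3: / between integers is true division
print(2 / 3)    # 0.6666666666666666

# Python 2.7 printed 0 for the same expression, because / between two
# integers performed integer (floor) division there.  Python 3 spells
# that behaviour with the floor-division operator:
print(2 // 3)   # 0
```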

Human language, however, is complex and can be challenging for a computer to understand.

Perfect Syntax, No Meaning


A statement can have a perfectly correct syntax but still not make sense. For example:

"Chickens feed extravagantly while the moon drinks tea."

This statement is grammatically correct but semantically meaningless.

Challenges in Human Language


Human language has several challenges that make it difficult for computers to understand:

Homophones: Words that sound the same but have different meanings.
Homographs: Words that are spelled the same but have different meanings.
Idioms: Phrases that have a different meaning than the literal meaning of the
individual words.
Sarcasm: Language that is intended to convey a meaning that is opposite of its literal
meaning.

Text Normalisation


Text normalisation is the process of simplifying text data to make it easier for computers to
understand. The goal of text normalisation is to reduce the complexity of the text data
while preserving its meaning.

Steps in Text Normalisation


The following are the steps involved in text normalisation:

| Step | Description |
|---|---|
| Sentence Segmentation | Divide the text into individual sentences. |
| Tokenisation | Divide each sentence into individual words or tokens. |
| Removing Stopwords, Special Characters, and Numbers | Remove common words like "the", "and", etc. that do not add much value to the meaning of the text. Also remove special characters and numbers if they are not relevant to the text. |
| Converting Text to a Common Case | Convert all text to a common case (e.g. lowercase) to reduce case sensitivity. |
| Stemming or Lemmatization | Reduce words to their base form (e.g. "running" becomes "run"). |
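
A minimal pure-Python sketch of this pipeline; the stopword list and regular expressions are illustrative assumptions, not a standard library's, and the sketch stops before the stemming/lemmatization step, which is covered next:

```python
import re

STOPWORDS = {"and", "a", "an"}   # illustrative subset, chosen to match the example below

def normalise(text):
    # Sentence segmentation: split on sentence-ending punctuation
    sentences = re.split(r"[.!?]+", text)
    tokens = []
    for sentence in sentences:
        # Common case: lowercase everything
        sentence = sentence.lower()
        # Remove special characters and numbers
        sentence = re.sub(r"[^a-z\s]", " ", sentence)
        # Tokenisation: split the sentence into words
        for word in sentence.split():
            # Stopword removal
            if word not in STOPWORDS:
                tokens.append(word)
    return tokens

print(normalise("Aman and Anil are stressed."))
# ['aman', 'anil', 'are', 'stressed']
```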

Stemming vs Lemmatization

| | Stemming | Lemmatization |
|---|---|---|
| Definition | Reduces words to their base form by removing affixes. | Reduces words to their base form by removing affixes while ensuring the resulting word is meaningful. |
| Example | "running" becomes "run" | "running" becomes "run" |
| Speed | Faster | Slower |
| Accuracy | Less accurate | More accurate |
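
A short sketch using NLTK's stemmer and lemmatizer (assuming the WordNet data has been downloaded); the example words are chosen to show where the two can differ:

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer
# One-time setup: import nltk; nltk.download("wordnet")

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["running", "caring", "studies"]:
    print(word,
          stemmer.stem(word),                   # crude affix stripping, e.g. "caring" -> "car"
          lemmatizer.lemmatize(word, pos="v"))  # dictionary-backed, e.g. "caring" -> "care"
```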

Bag of Words
The bag of words model is a simple NLP model that represents text data as a bag (or a set) of
its word occurrences, without considering grammar or word order.

Steps in Bag of Words


The following are the steps involved in the bag of words model:

1. Text Normalisation: Pre-process the text data by removing stopwords, special
characters, and numbers, and converting text to a common case.
2. Create Dictionary: Create a dictionary of unique words in the text data.
3. Create Document Vectors: Create a vector for each document in the text data, where
each element in the vector represents the frequency of a word in the dictionary.

Example of Bag of Words


Suppose we have the following text data:

Document 1: "Aman and Anil are stressed" Document 2: "Aman went to a therapist"
Document 3: "Anil went to download a health chatbot"

After text normalisation, the text data becomes:

Document 1: ["aman", "anil", "are", "stressed"]

Document 2: ["aman", "went", "to", "therapist"]

Document 3: ["anil", "went", "to", "download", "health", "chatbot"]

The dictionary of unique words is:

["aman", "anil", "are", "stressed", "went", "to", "therapist", "download", "health", "chatbot"]

The document vectors are:

Document 1: [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

Document 2: [1, 0, 0, 0, 1, 1, 1, 0, 0, 0]

Document 3: [0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

Note that the document vectors represent the frequency of each word in the dictionary for
each document.

## Bag of Words Algorithm


The Bag of Words algorithm is a method used to represent text data in a numerical format
that can be processed by machines.

Step 1: Create a Dictionary


Create a dictionary of unique words from all the documents in the corpus.

aman, and, anil, are, stressed, went, download, health, chatbot, therapist, to

Step 2: Create Document Vector


Create a table with the vocabulary as the header row and the documents as rows. For each
word in the document, if it matches the vocabulary, put a 1 under it. If the same word
appears again, increment the previous value by 1. And if the word does not occur in that
document, put a 0 under it.


| Vocabulary | Document 1 | Document 2 | Document 3 |
|---|---|---|---|
| aman | 1 | 1 | 0 |
| and | 1 | 0 | 0 |
| anil | 1 | 0 | 1 |
| are | 1 | 0 | 0 |
| stressed | 1 | 0 | 0 |
| went | 0 | 1 | 1 |
| download | 0 | 0 | 1 |
| health | 0 | 0 | 1 |
| chatbot | 0 | 0 | 1 |
| therapist | 0 | 1 | 0 |
| to | 0 | 1 | 1 |
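
A minimal sketch of building these document vectors in plain Python (the vocabulary order may differ from the table above):

```python
docs = [
    ["aman", "and", "anil", "are", "stressed"],
    ["aman", "went", "to", "therapist"],
    ["anil", "went", "to", "download", "health", "chatbot"],
]

# Step 1: dictionary of unique words, in order of first appearance
vocabulary = []
for doc in docs:
    for word in doc:
        if word not in vocabulary:
            vocabulary.append(word)

# Step 2: one frequency vector per document
vectors = [[doc.count(word) for word in vocabulary] for doc in docs]

print(vocabulary)
for vector in vectors:
    print(vector)
```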

TFIDF: Term Frequency & Inverse Document Frequency

TFIDF is a method used to calculate the importance of each word in a document, based on
its frequency in that document and its rarity across the entire corpus.

Term Frequency
Term frequency is the frequency of a word in one document.

Term frequency can be found from the document vector table.

Inverse Document Frequency

Inverse document frequency is the logarithm of the total number of documents
divided by the number of documents in which the word occurs.


| Vocabulary | Document Frequency |
|---|---|
| aman | 2 |
| and | 1 |
| anil | 2 |
| are | 1 |
| stressed | 1 |
| went | 2 |
| download | 1 |
| health | 1 |
| chatbot | 1 |
| therapist | 1 |
| to | 2 |
| Vocabulary | Inverse Document Frequency |
|---|---|
| aman | log(3/2) |
| and | log(3/1) |
| anil | log(3/2) |
| are | log(3/1) |
| stressed | log(3/1) |
| went | log(3/2) |
| download | log(3/1) |
| health | log(3/1) |
| chatbot | log(3/1) |
| therapist | log(3/1) |
| to | log(3/2) |

TFIDF Formula
TFIDF(W) = TF(W) × log(N / DF(W))

where TF(W) is the term frequency of word W, N is the total number of documents, and
DF(W) is the number of documents in which W occurs; the logarithm term is the inverse
document frequency of W.
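
A minimal sketch that computes TFIDF values for the corpus above using this formula (the vocabulary is sorted here, so its order may differ from the tables):

```python
import math

docs = [
    ["aman", "and", "anil", "are", "stressed"],
    ["aman", "went", "to", "therapist"],
    ["anil", "went", "to", "download", "health", "chatbot"],
]
N = len(docs)
vocabulary = sorted({word for doc in docs for word in doc})

for word in vocabulary:
    df = sum(1 for doc in docs if word in doc)    # document frequency DF(W)
    idf = math.log(N / df)                        # inverse document frequency
    for i, doc in enumerate(docs, start=1):
        tf = doc.count(word)                      # term frequency TF(W)
        if tf:
            print(f"TFIDF({word}, document {i}) = {tf} * log({N}/{df}) = {tf * idf:.3f}")
```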

Applications of TFIDF
TFIDF is commonly used in the Natural Language Processing domain. Some of its
applications are:

Document Information: TFIDF helps in extracting important information from a document.
Topic Modelling: TFIDF helps in identifying the topic of a document.
Stop word filtering: TFIDF helps in removing unnecessary words from a document.
Classification: TFIDF helps in classifying documents into different categories.
Retrieval System: TFIDF helps in retrieving relevant documents from a corpus.

DIY: Do It Yourself!
Try completing the following exercise using the corpus provided:

Document 1: We can use health chatbots for treating stress.
Document 2: We can use NLP to create chatbots and we will be making health chatbots now!
Document 3: Health Chatbots cannot replace human counsellors now. Yay >< !! @1nteLA!4Y

Accomplish the following challenges:

Create a dictionary of unique words from the corpus.
Create a document vector table for the corpus.
Calculate the TFIDF values for each word in the corpus.
Identify the important words in each document using TFIDF.

## Evaluation

What is Evaluation?
Evaluation is the process of understanding the reliability of any AI model by
feeding a test dataset into the model and comparing its outputs with the actual
answers.

Model Evaluation Terminologies


The following terms are crucial in the evaluation process:

True Positive (TP): When the model predicts a positive outcome and the reality is also
positive.
True Negative (TN): When the model predicts a negative outcome and the reality is
also negative.
False Positive (FP): When the model predicts a positive outcome but the reality is
negative.
False Negative (FN): When the model predicts a negative outcome but the reality is
positive.


Confusion Matrix
The confusion matrix is a table used to evaluate the performance of a classification model. It
maps the predictions against the actual outcomes.

| Prediction | Reality | Count |
|---|---|---|
| Positive | Positive | TP |
| Positive | Negative | FP |
| Negative | Positive | FN |
| Negative | Negative | TN |

Evaluation Methods

Accuracy
Accuracy is defined as the percentage of correct predictions out of all the
observations.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision
Precision is defined as the percentage of true positive cases versus all the cases
where the prediction is positive.

Precision = TP / (TP + FP)

Recall
Recall is defined as the fraction of positive cases that are correctly identified.

Recall = TP / (TP + FN)
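
A minimal sketch of these three metrics as Python functions:

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# Hypothetical counts: 2 true positives, 98 false positives
print(f"{precision(2, 98):.0%}")   # 2%  (the forest-fire example below)
```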

Comparison of Precision and Recall

| | Precision | Recall |
|---|---|---|
| Numerator | TP | TP |
| Denominator | TP + FP | TP + FN |


Example: Forest Fire Scenario

Assume a model always predicts that there is a forest fire, regardless of the reality.

True Positives = 2 (actual fires detected)
False Positives = 98 (false alarms)
Precision = 2 / (2 + 98) = 2%

In this case, the precision is low, indicating that the model is prone to false alarms.

Key Takeaways
High accuracy does not necessarily mean good performance.
Good precision does not guarantee good model performance.
Recall is an important evaluation metric that considers both true positives and false
negatives.

## Choosing the Right Metric

When evaluating the performance of a model, it's essential to choose the right metric. The
choice between Precision and Recall depends on the specific use case.

Precision vs. Recall


Precision is the ratio of true positives to the sum of true positives and false positives.
Recall is the ratio of true positives to the sum of true positives and false negatives.

"Precision is about being precise, while recall is about being thorough."

High False Negative Cost


In some cases, a false negative can be costly. For example:

Forest Fire: A false negative can lead to a forest fire not being detected, resulting in
significant damage.
Viral Outbreak: A false negative can lead to a viral outbreak not being detected,
resulting in widespread infection.

High False Positive Cost

In other cases, a false positive can be costly. For example:

Mining: A false positive can lead to unnecessary digging, resulting in wasted resources.
Spam Detection: A false positive can lead to important emails being marked as spam,
resulting in missed information.

F1 Score
The F1 Score is a measure of the balance between precision and recall. It is calculated using
the following formula:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

"The F1 Score is a way to combine precision and recall into a single metric."

F1 Score Variations

| Precision | Recall | F1 Score |
|---|---|---|
| Low | Low | Low |
| Low | High | Low |
| High | Low | Low |
| High | High | High |

Practice Time
Let's practice calculating accuracy, precision, recall, and F1 score using the following
scenarios:

Scenario 1: Water Shortage

| | Predicted Water Shortage | Predicted No Water Shortage |
|---|---|---|
| Actual Water Shortage | 10 | 5 |
| Actual No Water Shortage | 2 | 8 |

Scenario 2: Flood Prediction


| | Predicted Flood | Predicted No Flood |
|---|---|---|
| Actual Flood | 15 | 3 |
| Actual No Flood | 1 | 10 |

Scenario 3: Rain Prediction

| | Predicted Rain | Predicted No Rain |
|---|---|---|
| Actual Rain | 12 | 4 |
| Actual No Rain | 3 | 9 |

Scenario 4: Traffic Jam Prediction

| | Predicted Traffic Jam | Predicted No Traffic Jam |
|---|---|---|
| Actual Traffic Jam | 8 | 2 |
| Actual No Traffic Jam | 1 | 12 |

Calculate the accuracy, precision, recall, and F1 score for each scenario.
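
As a sketch, Scenario 1 can be worked through with the formulas above (the other scenarios follow the same pattern with their own counts):

```python
def evaluate(tp, fn, fp, tn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Scenario 1: Water Shortage -> TP = 10, FN = 5, FP = 2, TN = 8
acc, prec, rec, f1 = evaluate(tp=10, fn=5, fp=2, tn=8)
print(f"Accuracy={acc:.2f}  Precision={prec:.2f}  Recall={rec:.2f}  F1={f1:.2f}")
# Accuracy=0.72  Precision=0.83  Recall=0.67  F1=0.74
```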
