
Data Science

It's Data science notes for Computer science students.

Uploaded by

MS Mourya

UNIT – 2 (DATA PREPROCESSING AND VISUALIZATION)

Python:
• Python is a high-level, interpreted, interactive and object-oriented scripting language.
• Python is designed to be highly readable.

Characteristics of Python:
• It supports functional and structured programming methods as well as OOP.
• It can be used as a scripting language or can be compiled to byte-code for building large
applications.
• It provides very high-level dynamic data types and supports dynamic type checking.
• It supports automatic garbage collection.
• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.

Advantages of learning Python:


• Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to
compile your program before executing it.
• Python is Interactive − You can actually sit at a Python prompt and interact with the interpreter
directly to write your programs.
• Python is Object-Oriented − Python supports Object-Oriented programming.
• Portable − Python can run on a wide variety of hardware platforms and has the same interface
on all platforms.
• Databases − Python provides interfaces to all major commercial databases.
• GUI Programming − Python supports GUI applications that can be created and ported to many
system calls, libraries and windows systems, such as Windows MFC, Macintosh, and the X
Window system of Unix.

Importance of Python:
• Python language emphasizes readability and makes coding very easy.
• Its high-level, interpreted, and object-oriented architecture makes it ideal for all types of
software solutions.
• It emphasizes syntax readability, program modularity, and code reusability, which increases
the speed of development while reducing the cost of maintenance.
• It supports modules and packages which is a great way to increase productivity and save time
and effort while working.

Data Types:
Numeric:
Numeric data type represents the data which has numeric value. Numeric value can be integer,
floating number or even complex numbers.
• Integers – This value is represented by int class. It contains positive or negative whole
numbers (without fraction or decimal). In Python there is no limit to how long an integer value
can be.
• Float – This value is represented by float class. It is a real number with floating point
representation. It is specified by a decimal point.

Sequence Type:
A sequence is an ordered collection of similar or different data types. Sequences allow multiple
values to be stored in an organized and efficient fashion. There are several sequence types in Python:
• String:
o A string is a collection of one or more characters enclosed in single or double quotes.
o In Python there is no character data type; a character is a string of length one.
o It is represented by the str class.
o E.g. String1 = 'Good morning'
• List:
o Lists are like arrays: an ordered collection of data.
o A list is very flexible, as its items do not need to be of the same type.
o E.g. List = ["Hello", "I", "Am"]
• Tuple:
o A tuple is also an ordered collection of Python objects.
o The main difference between a tuple and a list is that tuples are immutable, i.e. a tuple cannot
be modified after it is created.
o E.g. Tuple1 = ('You', 'For')

Boolean:
Data type with one of the two values, True or False.

Set:
• Set is an unordered collection of elements that is iterable, mutable and has no duplicate
elements.
• The order of elements in a set is undefined.
• E.g. set1 = set(["This", "is", "set"])

Dictionary:
• Dictionary is a collection of data values stored as key:value pairs. Since Python 3.7, a dictionary
preserves the order in which its keys were inserted.
• Values are looked up by key rather than by position. Each key is separated from its value by a
colon (:), and the key-value pairs are separated from each other by commas.
• E.g. Dict = {'Hello': 'A', 2: 'B', 3: 'C'}, so that Dict['Hello'] gives 'A'
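The data types described above can be inspected with the built-in type() function. The following is a minimal sketch; the names samples and type_names are illustrative and do not come from the notes.

```python
# One sample value for each built-in type discussed above.
samples = [
    42,                        # int
    3.14,                      # float
    2 + 3j,                    # complex
    "Good morning",            # str
    ["Hello", "I", "Am"],      # list
    ("You", "For"),            # tuple
    True,                      # bool
    {"This", "is", "set"},     # set
    {"Hello": "A", 2: "B"},    # dict
]

# type(value).__name__ gives the name of the class that represents the value.
type_names = [type(value).__name__ for value in samples]

for value, name in zip(samples, type_names):
    print(name, "->", value)
```

Note that True is reported as bool even though bool is a subclass of int in Python.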

Python Variables:
• Variable is a name given to a memory location.
• Python is not “statically typed”. We do not need to declare variables before using them or
declare their type. A variable is created the moment we first assign a value to it.

Rules for creating variables in Python:


• A variable name must start with a letter or the underscore character.
• A variable name cannot start with a number.
• A variable name can only contain alpha-numeric characters and underscores (A-z, 0-9, and _ ).
• Variable names are case-sensitive (name, Name and NAME are three different variables).
• Reserved words (keywords) cannot be used as variable names.

Python Arithmetic Operators:


Arithmetic operators are used to perform mathematical operations like addition, subtraction,
multiplication and division.
There are 7 arithmetic operators in Python:

Operator   Description                                                          Syntax

+          Addition: adds two operands                                          x + y
-          Subtraction: subtracts the second operand from the first             x - y
*          Multiplication: multiplies two operands                              x * y
/          Division (float): divides the first operand by the second            x / y
//         Division (floor): divides the first operand by the second            x // y
%          Modulus: returns the remainder when the first operand is
           divided by the second                                                x % y
**         Power: returns the first operand raised to the power of the second   x ** y
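As a quick check of the seven operators, here is a small sketch with two illustrative operands (x = 7 and y = 2 are chosen only for this example):

```python
x, y = 7, 2

# Each entry pairs the expression text with its computed value.
results = {
    "x + y": x + y,    # addition
    "x - y": x - y,    # subtraction
    "x * y": x * y,    # multiplication
    "x / y": x / y,    # float division always returns a float
    "x // y": x // y,  # floor division rounds down to a whole number
    "x % y": x % y,    # remainder of the division
    "x ** y": x ** y,  # x raised to the power y
}

for expression, value in results.items():
    print(expression, "=", value)
```

The difference between / and // is the one most often confused: 7 / 2 gives 3.5, while 7 // 2 gives 3.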

Expressions:
An expression is a combination of operators and operands that is interpreted to produce some other
value. The different types of expressions in Python are:
1. Constant Expressions: These are the expressions that have constant values only. Example: x =
15 + 1.3
2. Arithmetic Expressions: An arithmetic expression is a combination of numeric values,
operators, and sometimes parenthesis. The result of this type of expression is also a numeric
value. The operators used in these expressions are arithmetic operators like addition, subtraction,
etc. Example: ((3-2)*4+89)-6
3. Integral Expressions: These are the kind of expressions that produce only integer results after all
computations and type conversions. Example:
a = 13
b = 12.0
c = a + int(b)
4. Floating Expressions: These are the kind of expressions which produce floating point numbers
as result after all computations and type conversions. Example:
a = 13
b = 5
c = a / b
5. Relational Expressions: In these types of expressions, arithmetic expressions are written on both
sides of a relational operator (> , < , >= , <=). Those arithmetic expressions are evaluated first, and
then compared as per the relational operator to produce a boolean output in the end. These
expressions are also called Boolean expressions. Example:
a = 21
b = 13
c = 40
p = (a + b) >= c
print(p)
Output:
False
Here a + b is 34, which is not greater than or equal to 40, so the result is False.
6. Logical Expressions: These are kinds of expressions that result in either True or False. It
basically specifies one or more conditions.
7. Bitwise Expressions: These are the kind of expressions in which computations are performed at
bit level. Example:
a = 12
x = a >> 2
8. Combinational Expressions: We can also use different types of expressions in a single
expression, and that will be termed as combinational expressions. Example:
a = 16
b = 12
c = a + (b >> 1)

Comments:
• Comments in Python are the lines in the code that are ignored by the interpreter during the
execution of the program.
• Comments are generally used for the following purposes:
• Code Readability
• Explanation of the code or Metadata of the project
• Prevent execution of code
• To include resources

Types of Comments in Python:


There are three main kinds of comments in Python. They are:
1. Single-Line Comments:
• A Python single-line comment starts with the hash symbol (#) and lasts till the end of the line.
• Single-line comments are useful for supplying short explanations for variables, function
declarations, and expressions.
2. Multi-Line Comments:
• Python does not provide a dedicated syntax for multiline comments. However, there are different
ways through which we can write multiline comments:
o Using Multiple Hashtags (#): We can use multiple hashtags (#) to write multiline comments
in Python. Each line is treated as a single-line comment.
o Using String Literals: A string literal that is not assigned to a variable has no effect when
the program runs, so such literals can be used as comments.
3. Python Docstring:
• A Python docstring is a string literal, usually written with triple quotes, that appears as the
first statement in a module, function, class, or method.
• It is used to associate documentation with Python modules, functions, classes, and methods.
• It is added right below the definition line of a function or class, or at the top of a module, to
describe what it does.
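The three kinds of comments can be seen side by side in the sketch below. The function area() is a hypothetical example written for this illustration only.

```python
# Single-line comment: everything after the hash symbol is ignored.

# A multi-line comment can be written
# as several single-line comments in a row.

"""
An unassigned string literal like this one
has no effect when the program runs.
"""


def area(width, height):
    """Return the area of a rectangle."""  # this string is the docstring
    return width * height


# The docstring is stored on the function and can be read back.
print(area.__doc__)
print(area(3, 4))
```

Unlike an ordinary comment, a docstring remains available at runtime through the __doc__ attribute and through help().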

Type Casting:
• Type Casting is the method of converting a value from one data type into another so that a
required operation can be performed.
• There are two types of Type Casting in Python:
• Implicit Type Casting
• Explicit Type Casting

Implicit Type Conversion:
• In this, Python converts one data type into another automatically, for example an int into a
float when the two are added.
• The user does not have to be involved in this process.

Explicit Type Casting:
• In this method, the user converts the value into the required data type by calling a conversion
function.
• Explicit type casting can be done with these functions:
• int(): takes a float or a string as an argument and returns an int object.
• float(): takes an int or a string as an argument and returns a float object.
• str(): takes a float or an int as an argument and returns a string object.
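The difference between implicit and explicit conversion can be sketched as follows; all variable names are illustrative.

```python
# Implicit conversion: the int 3 is converted to a float automatically.
total = 3 + 2.5

# Explicit conversion: the programmer calls a conversion function.
as_int = int(7.9)          # the fractional part is dropped, not rounded
from_text = int("42")      # a string of digits becomes an int
as_float = float("3.5")    # a numeric string becomes a float
as_text = str(10) + " items"  # an int must be converted before joining to a str

print(total, as_int, from_text, as_float, as_text)
```

Note that int() raises a ValueError for a string such as "3.5"; it has to be converted with float() first.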

Slicing:
Slicing obtains a sub-string from a given string by selecting the characters from a start index up
to, but not including, a stop index.

Syntax:
• slice(stop), or
• slice(start, stop, step)

Parameters:
• start: Starting index where the slicing of the object starts.
• stop: Ending index where the slicing of the object stops (the item at this index is not included).
• step: It is an optional argument that determines the increment between each index for slicing.
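A slice object built with slice() behaves the same as the more common colon notation inside square brackets. A minimal sketch, using an illustrative string:

```python
text = "Good morning"

first_word = text[0:4]               # colon notation: characters 0 to 3
same_word = text[slice(0, 4)]        # the equivalent slice() object
every_other = text[slice(0, 12, 2)]  # a step of 2 takes every second character
reversed_text = text[::-1]           # a negative step walks backwards

print(first_word, same_word, every_other, reversed_text, sep=" | ")
```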

split():
The split() method in Python splits a string into a list of strings, breaking the given string at the
specified separator.

Syntax:
str.split(separator, maxsplit)

Parameters:
• separator: This is a delimiter. The string splits at this specified separator. If it is not provided,
then any white space is a separator.
• maxsplit: It is a number that gives the maximum number of splits to perform. If it is not
provided, the default is -1, which means there is no limit.
• Returns: Returns a list of strings after breaking the given string by the specified separator.
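The effect of the two parameters can be seen in this short sketch; the sample strings are illustrative.

```python
line = "alpha,beta,gamma,delta"

all_parts = line.split(",")          # split at every comma
two_splits = line.split(",", 2)      # at most 2 splits, so 3 pieces
words = "  a  b   c ".split()        # no separator: split on runs of whitespace

print(all_parts)
print(two_splits)
print(words)
```

With maxsplit set, the remainder of the string is left unsplit in the final element.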

Loops in python:

While Loop:
• In Python, a while loop is used to execute a block of statements repeatedly as long as a given
condition is true.
• When the condition becomes false, the line immediately after the loop is executed.
• Syntax:
while expression:
    statement(s)

for in Loop:
• For loops are used for sequential traversal. For example: traversing a list or string or array etc.
• In Python, there is no C style for loop. There is “for in” loop which is similar to “for each”
loop in other languages.
• Syntax:
for i in sequence:
    statement(s)

Nested Loops:
• Python allows one loop to be used inside another loop.
• We can put any type of loop inside any other type of loop. For example, a for loop can be
inside a while loop or vice versa.
• Syntax of nested for in loop:
for iterator_var in sequence:
    for iterator_var in sequence:
        statement(s)
    statement(s)

• Syntax of nested while loop:
while expression:
    while expression:
        statement(s)
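The loop forms above can be tried out with the following sketch. The names countdown, remaining, and pairs are illustrative only.

```python
# while loop: runs as long as the condition is true.
countdown = []
remaining = 3
while remaining > 0:
    countdown.append(remaining)
    remaining -= 1

# Nested for loops: the inner loop runs completely for each outer value.
pairs = []
for i in range(1, 3):        # i takes the values 1, 2
    for j in range(1, 4):    # j takes the values 1, 2, 3
        pairs.append((i, j))

print(countdown)
print(pairs)
```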

Loop Control Statements:


• Loop control statements change execution from its normal sequence.
• When execution leaves a scope, all automatic objects that were created in that scope are
destroyed.
• Python supports the following control statements:
• Continue Statement: It skips the rest of the current iteration and returns control to the
beginning of the loop. Example:

# Prints all letters except 'o' and 'd'
for letter in 'todayismonday':
    if letter == 'o' or letter == 'd':
        continue
    print('Current Letter :', letter)

• Break Statement: It brings control out of the loop. Example:

for letter in 'todayismonday':
    # break the loop as soon as it sees 'o' or 'd'
    if letter == 'o' or letter == 'd':
        break
    print('Current Letter :', letter)

• Pass Statement: We use the pass statement to write empty loops. Pass is also used for empty
control statements, functions and classes. Example:

# An empty loop
for letter in 'todayismonday':
    pass
print('Last Letter :', letter)

FLOW CONTROL STATEMENTS:

if Statements:
• An if statement’s clause (that is, the block following the if statement) will execute if the
statement’s condition is True. The clause is skipped if the condition is False.
• For example, let’s say you have some code that checks to see whether someone’s name is
Alice:
if name == 'Alice':
    print('Hi, Alice.')

else Statements:
• An if clause can optionally be followed by an else statement.
• The else clause is executed only when the if statement’s condition is False.
• Returning to the Alice example, let’s look at some code that uses an else statement to offer a
different greeting if the person’s name isn’t Alice:
if name == 'Alice':
    print('Hi, Alice.')
else:
    print('Hello, stranger.')

elif Statements:
• While only one of the if or else clauses will execute, you may have a case where you want one
of many possible clauses to execute.
• The elif statement is an “else if” statement that always follows an if or another elif statement.
• It provides another condition that is checked only if all of the previous conditions were False.
• Example:
name = 'Carol'
age = 3000
if name == 'Alice':
    print('Hi, Alice.')
elif age < 12:
    print('You are not Alice, kiddo.')
elif age > 2000:
    print('Unlike you, Alice is not an undead, immortal vampire.')
elif age > 100:
    print('You are not Alice, grannie.')

range():
• The Python range() function returns a sequence of integers within a given range.
• range() is a built-in function of Python. It is used when a user needs to perform an action a
specific number of times.
• Syntax:
range(stop)
range(start, stop, step)
• range() allows the user to generate a series of numbers within a given range.
• Depending on how many arguments are passed to the function, the user can decide where the
series of numbers begins and ends, as well as how big the difference is between one number
and the next.
• range() takes up to three arguments:
• start: integer from which the sequence starts (0 if omitted).
• stop: integer before which the sequence ends. With the default step, the sequence ends at
stop – 1.
• step: integer value which determines the increment between each integer in the sequence
(1 if omitted).
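Because range() produces its values lazily, wrapping it in list() is a convenient way to see the numbers it generates. A small sketch with illustrative arguments:

```python
only_stop = list(range(5))            # start defaults to 0, step to 1
with_step = list(range(2, 10, 3))     # start at 2, stop before 10, step 3
descending = list(range(10, 0, -2))   # a negative step counts down

print(only_stop)
print(with_step)
print(descending)
```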

Short-circuit (lazy evaluation):


• By short-circuiting, we mean the stoppage of execution of boolean operation if the truth value
of expression has been determined already.
• The evaluation of expression takes place from left to right.

or:
• When the Python interpreter evaluates an or expression, it takes the first operand and checks
whether it is true.
• If the first operand is true, Python returns that operand's value without evaluating the
second operand.
• If the first operand is false, only then does Python evaluate the second operand, and the result
is the value of the second operand.

and:
• For an and expression, if the first operand is false then the whole expression must be false, so
Python returns that value without evaluating the second operand.
• Only if the first operand is true does it evaluate the second operand and return its value.
• An expression containing and and or stops evaluating as soon as its truth value has been
determined. Evaluation takes place from left to right.
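Short-circuiting can be observed by recording which operands are actually evaluated. The helper check() below is hypothetical and exists only for this demonstration.

```python
calls = []  # records the label of every operand that gets evaluated


def check(label, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return value


# 'or': the first operand is true, so the second is never evaluated.
result_or = check("a", True) or check("b", True)

# 'and': the first operand is false, so the second is never evaluated.
result_and = check("c", False) and check("d", True)

# The operators return one of the operands, not necessarily True or False.
fallback = "" or "default"

print(calls, result_or, result_and, fallback)
```

Only "a" and "c" end up in calls, which shows that "b" and "d" were skipped.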

Structures in Python:
• Python has four inbuilt data structures namely Lists, Dictionary, Tuple and Set:

List:
• Lists in Python are one of the most versatile collection object types available.
• Lists can be used for any type of object, from numbers and strings to more lists.
• They are accessed just like strings so they are simple to use and they’re variable length, i.e.
they grow and shrink automatically as they’re used.
• For example:
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5]
list3 = ["a", "b", "c", "d"]

Dictionary:
• In Python, a dictionary is similar to a hash table or map in other languages.
• It consists of key-value pairs.
• Keys are unique and must be immutable (hashable) objects.
• A value is accessed through its unique key.
• Syntax: dictionary = {"key name": value}
• Example:
student = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
print("student['Name']: ", student['Name'])
print("student['Age']: ", student['Age'])

Tuple:
• Python tuples work exactly like Python lists except they are immutable, i.e. they can’t be
changed in place.
• They are normally written inside parentheses to distinguish them from lists (which use square
brackets)
• Since tuples are immutable, their length is fixed.
• To grow or shrink a tuple, a new tuple must be created.
• Example:
tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5)

Set:
• The elements in the set cannot be duplicates.
• The elements in the set are immutable (cannot be modified) but the set as a whole is mutable.
• There is no index attached to any element in a python set. So, they do not support any
indexing or slicing operation.
• Example:
Days=set(["Mon","Tue","Wed","Thu","Fri","Sat"])
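The set properties listed above (no duplicates, mutable as a whole, no indexing) can be checked with a short sketch; the variable names are illustrative.

```python
# Duplicates are dropped when the set is created.
days = set(["Mon", "Tue", "Wed", "Mon"])
size_after_creation = len(days)

# The set as a whole is mutable: elements can be added or removed.
days.add("Thu")
days.discard("Tue")

# Union combines two sets.
weekend = {"Sat", "Sun"}
all_days = days | weekend

# Sets have no index, so days[0] raises a TypeError.
try:
    days[0]
    indexing_works = True
except TypeError:
    indexing_works = False

print(size_after_creation, sorted(all_days), indexing_works)
```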

Python Lists:
• Lists are just like dynamically sized arrays, declared in other languages (vector in C++ and
ArrayList in Java).
• Lists need not be homogeneous, which makes them one of the most flexible tools in Python.
• A single list may contain data types such as integers and strings, as well as other objects.
• Lists are mutable, and hence, they can be altered even after their creation.
• Lists in Python are ordered and have a definite count.
• The elements in a list are indexed according to a definite sequence and the indexing of a list is
done with 0 being the first index.
• Each element has its own definite position in the list, which allows duplicate elements: each
copy occupies its own distinct position.

List Operations:
Creating a List:
• Lists in Python can be created by just placing the sequence inside the square brackets[].
• Unlike Sets, a list doesn’t need a built-in function for the creation of a list.
• Unlike Sets, the list may contain mutable elements.

# Python program to demonstrate
# Creation of List

# Creating a List
List = []
print("Blank List: ")
print(List)

# Creating a List of numbers
List = [10, 20, 14]
print("\nList of numbers: ")
print(List)

# Creating a List of strings and accessing
# using index
List = ["Geeks", "For", "Geeks"]
print("\nList Items: ")
print(List[0])
print(List[2])

# Creating a Multi-Dimensional List
# (By Nesting a list inside a List)
List = [['Geeks', 'For'], ['Geeks']]
print("\nMulti-Dimensional List: ")
print(List)

Output:
Blank List:
[]

List of numbers:
[10, 20, 14]

List Items:
Geeks
Geeks

Multi-Dimensional List:
[['Geeks', 'For'], ['Geeks']]

Knowing the size of List:

# Creating a List
List1 = []
print(len(List1))

# Creating a List of numbers


List2 = [10, 20, 14]
print(len(List2))

Output:
0
3

Adding Elements to a List:


• Using append() method:
Elements can be added to the List by using the built-in append() method. Only one element at a
time can be added to the list by using the append() method; for the addition of multiple elements with
the append() method, loops are used.
# Python program to demonstrate
# Addition of elements in a List

# Creating a List
List = []
print("Initial blank List: ")
print(List)

# Addition of Elements
# in the List
List.append(1)
List.append(2)
List.append(4)
print("\nList after Addition of Three elements: ")
print(List)

# Adding elements to the List
# using Iterator
for i in range(1, 4):
    List.append(i)
print("\nList after Addition of elements from 1-3: ")
print(List)

Output:
Initial blank List:
[]

List after Addition of Three elements:
[1, 2, 4]

List after Addition of elements from 1-3:
[1, 2, 4, 1, 2, 3]

• Using insert() method:


append() method only works for the addition of elements at the end of the List. For the addition of
elements at the desired position, insert() method is used. Unlike append() which takes only one
argument, the insert() method requires two arguments(position, value).

# Python program to demonstrate
# Addition of elements in a List

# Creating a List
List = [1, 2, 3, 4]
print("Initial List: ")
print(List)

# Addition of Element at
# specific Position
# (using Insert Method)
List.insert(3, 12)
List.insert(0, 'Geeks')
print("\nList after performing Insert Operation: ")
print(List)

Output:
Initial List:
[1, 2, 3, 4]

List after performing Insert Operation:


['Geeks', 1, 2, 3, 12, 4]

• Using extend() method:


This method is used to add multiple elements at the same time at the end of the list.

# Python program to demonstrate
# Addition of elements in a List

# Creating a List
List = [1, 2, 3, 4]
print("Initial List: ")
print(List)

# Addition of multiple elements
# to the List at the end
# (using Extend Method)
List.extend([8, 'Geeks', 'Always'])
print("\nList after performing Extend Operation: ")
print(List)

Output:
Initial List:
[1, 2, 3, 4]

List after performing Extend Operation:


[1, 2, 3, 4, 8, 'Geeks', 'Always']

Accessing elements from the List:


To access the items of a list, refer to their index numbers. Use the index operator [ ] to access an
item in a list. The index must be an integer. Nested lists are accessed using nested indexing.

# Python program to demonstrate
# accessing of element from list

# Creating a List with
# the use of multiple values
List = ["Geeks", "For", "Geeks"]

# accessing an element from the
# list using index number
print("Accessing a element from the list")
print(List[0])
print(List[2])

# Creating a Multi-Dimensional List
# (By Nesting a list inside a List)
List = [['Geeks', 'For'], ['Geeks']]

# accessing an element from the
# Multi-Dimensional List using
# index number
print("Accessing a element from a Multi-Dimensional list")
print(List[0][1])
print(List[1][0])

Output:
Accessing a element from the list
Geeks
Geeks
Accessing a element from a Multi-Dimensional list
For
Geeks

Negative indexing:
In Python, negative sequence indexes represent positions from the end of the list. Instead of having
to compute the offset as in List[len(List)-3], it is enough to just write List[-3]. Negative indexing
means beginning from the end: -1 refers to the last item, -2 refers to the second-last item, etc.

List = [1, 2, 'Geeks', 4, 'For', 6, 'Geeks']

# accessing an element using


# negative indexing
print("Accessing element using negative indexing")

# print the last element of list


print(List[-1])

# print the third last element of list


print(List[-3])

Output:
Accessing element using negative indexing
Geeks
For

Removing Elements from the List:


• Using remove() method:
Elements can be removed from the List by using the built-in remove() method, but a ValueError is
raised if the element doesn’t exist in the list. The remove() method removes only one element at a
time; to remove a range of elements, a loop is used. The remove() method removes the specified item.
Note – The remove() method removes only the first occurrence of the searched element.
# Python program to demonstrate
# Removal of elements in a List

# Creating a List
List = [1, 2, 3, 4, 5, 6,
        7, 8, 9, 10, 11, 12]
print("Initial List: ")
print(List)

# Removing elements from List
# using Remove() method
List.remove(5)
List.remove(6)
print("\nList after Removal of two elements: ")
print(List)

# Removing elements from List
# using iterator method
for i in range(1, 5):
    List.remove(i)
print("\nList after Removing a range of elements: ")
print(List)

Output:
Initial List:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

List after Removal of two elements:
[1, 2, 3, 4, 7, 8, 9, 10, 11, 12]

List after Removing a range of elements:
[7, 8, 9, 10, 11, 12]

• Using pop() method:


The pop() method can also be used to remove and return an element from the list. By default it
removes only the last element of the list; to remove an element from a specific position of the List,
the index of the element is passed as an argument to the pop() method.

List = [1, 2, 3, 4, 5]

# Removing element from the
# List using the pop() method
List.pop()
print("\nList after popping an element: ")
print(List)

# Removing element at a
# specific location from the
# List using the pop() method
List.pop(2)
print("\nList after popping a specific element: ")
print(List)

Output:
List after popping an element:
[1, 2, 3, 4]

List after popping a specific element:
[1, 2, 4]

Slicing of a List
To print a specific range of elements from the list, we use the slice operation, which is written
with a colon (:). To print elements from the beginning up to an index, use [:Index]; to print
everything except the last Index elements, use [:-Index]; to print elements from a specific index till
the end, use [Index:]; to print elements within a range, use [Start Index:End Index] (the element at
End Index is not included); and to print the whole List, use [:]. Further, to print the whole List in
reverse order, use [::-1].

# Python program to demonstrate
# Slicing of a List

# Creating a List
List = ['G', 'E', 'E', 'K', 'S', 'F',
        'O', 'R', 'G', 'E', 'E', 'K', 'S']
print("Initial List: ")
print(List)

# Print elements of a range
# using Slice operation
Sliced_List = List[3:8]
print("\nSlicing elements in a range 3-8: ")
print(Sliced_List)

# Print elements from a
# pre-defined point to end
Sliced_List = List[5:]
print("\nElements sliced from 5th "
      "element till the end: ")
print(Sliced_List)

# Printing elements from
# beginning till end
Sliced_List = List[:]
print("\nPrinting all elements using slice operation: ")
print(Sliced_List)

Output:
Initial List:
['G', 'E', 'E', 'K', 'S', 'F', 'O', 'R', 'G', 'E', 'E', 'K', 'S']

Slicing elements in a range 3-8:


['K', 'S', 'F', 'O', 'R']

Elements sliced from 5th element till the end:


['F', 'O', 'R', 'G', 'E', 'E', 'K', 'S']

Printing all elements using slice operation:


['G', 'E', 'E', 'K', 'S', 'F', 'O', 'R', 'G', 'E', 'E', 'K', 'S']

Summary of List Methods:

Method      Description

append()    Adds an element to the end of the list
extend()    Adds all elements of another list (or any iterable) to the end of the list
insert()    Inserts an item at the given index
remove()    Removes the first occurrence of an item from the list
pop()       Removes and returns the element at the given index (the last element by default)
clear()     Removes all items from the list
index()     Returns the index of the first matched item
count()     Returns the number of times the item passed as an argument occurs in the list
sort()      Sorts the items of the list in ascending order
reverse()   Reverses the order of items in the list
copy()      Returns a shallow copy of the list
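The methods from the summary that were not demonstrated earlier (index(), count(), copy(), sort(), reverse(), clear()) can be tried with the sketch below. The list contents are illustrative.

```python
numbers = [3, 1, 2, 3]

position = numbers.index(3)      # index of the first matching item
occurrences = numbers.count(3)   # how many times 3 appears

backup = numbers.copy()          # an independent (shallow) copy

numbers.sort()                   # sorts in place, ascending
sorted_numbers = numbers.copy()

numbers.reverse()                # reverses in place
reversed_numbers = numbers.copy()

numbers.clear()                  # removes every item; backup is unaffected

print(position, occurrences, sorted_numbers, reversed_numbers, numbers, backup)
```

Note that sort(), reverse(), and clear() change the list in place and return None, which is why the results are captured with copy() here.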

Using Lists as Stack and Queue:


Using Lists as Stack:
• The list methods make it very easy to use a list as a stack, where the last element added is the
first element retrieved (“last-in, first-out”).
• To add an item to the top of the stack, use append().
• To retrieve an item from the top of the stack, use pop() without an explicit index.
• For example:
>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]
>>> stack.pop()
6
>>> stack.pop()
5
>>> stack
[3, 4]
Using Lists as Queue:
• It is also possible to use a list as a queue, where the first element added is the first element
retrieved (“first-in, first-out”); however, lists are not efficient for this purpose.
• While appends and pops from the end of list are fast, doing inserts or pops from the beginning
of a list is slow because all of the other elements have to be shifted by one.
• To implement a queue, use collections.deque which was designed to have fast appends and
pops from both ends.
• For example:
>>> from collections import deque
>>> queue = deque(["Eric", "John", "Michael"])
>>> queue.append("Terry")           # Terry arrives
>>> queue.append("Graham")          # Graham arrives
>>> queue.popleft()                 # The first to arrive now leaves
'Eric'
>>> queue.popleft()                 # The second to arrive now leaves
'John'
>>> queue                           # Remaining queue in order of arrival
deque(['Michael', 'Terry', 'Graham'])

How efficient lists are when used as stack or queue:


• Stack works on the principle of “Last-in, first-out”. Queue works on the principle of “First-in,
first-out”.
• The inbuilt functions in Python make the code short and simple.
• To add an item to the top of the stack, i.e. to push an item, we use the append() method, and
to pop out an element we use the pop() method.
• These methods work quite efficiently because they operate at the end of the list.
• In case of a stack, the list implementation works fine and provides both append() and pop() in
O(1) time (amortized, in the case of append()).
• A deque implementation gives the same time complexity.
• But when it comes to a queue, the list implementation is not efficient.
• In a queue, elements are removed from the beginning of the list, e.g. with pop(0), which is slow.
• This is due to the way a list is stored: it is fast for operations at the end but slow for
operations at the beginning, as all other elements have to be shifted by one.
• So, for a queue we prefer collections.deque over a list, since it was specially designed to have
fast appends and pops from both the front and the back.

List Comprehensions:
• List comprehension is an elegant way to define and create a list in python.
• We can create lists just like mathematical statements and in one line only.
• A list comprehension consists of brackets containing an expression followed by a for clause,
then zero or more for or if clauses.
• The result will be a new list resulting from evaluating the expression in the context of the for
and if clauses which follow it.
• A list comprehension generally consists of these parts :
1. Output expression,
2. Input sequence,
3. A variable representing a member of the input sequence and
4. An optional predicate part.
• For example: lst = [x ** 2 for x in range (1, 11) if x % 2 == 1]

here, x ** 2 is output expression, range (1, 11) is input sequence, x is variable and if x % 2 ==
1 is predicate part.
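To make the four parts concrete, the sketch below builds the same list twice: once with an ordinary loop and once with the comprehension from the example. The names squares_loop and squares_comp are illustrative.

```python
# Ordinary loop: create an empty list and append matching values.
squares_loop = []
for x in range(1, 11):         # input sequence and variable
    if x % 2 == 1:             # predicate: keep odd numbers only
        squares_loop.append(x ** 2)   # output expression

# Equivalent list comprehension in a single line.
squares_comp = [x ** 2 for x in range(1, 11) if x % 2 == 1]

print(squares_loop)
print(squares_comp)
```

Both versions produce the squares of the odd numbers from 1 to 9.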
Advantages of List Comprehension:
• Often faster than an equivalent for loop that builds the list with repeated append() calls.
• Require fewer lines of code.
• Transform an iterative statement into a single formula-like expression.

Nested List Comprehensions:


• Nested list comprehensions are nothing but a list comprehension within another list
comprehension, which is quite similar to nested for loops.
• Below is the program which implements nested loop:
matrix = []
for i in range(3):
    # Append an empty sublist inside the list
    matrix.append([])
    for j in range(5):
        matrix[i].append(j)
print(matrix)
• Now by using nested list comprehensions same output can be generated in fewer lines of code:
# Nested list comprehension
matrix = [[j for j in range(5)] for i in range(3)]
print(matrix)

Tuples in Python:
• A Tuple is a collection of Python objects separated by commas.
• In some ways a tuple is similar to a list in terms of indexing, nested objects and repetition, but a
tuple is immutable, unlike lists, which are mutable.

Tuple operations:
Creation of Tuple:
# An empty tuple
empty_tuple = ()
print(empty_tuple)
Output:
()

# Creating non-empty tuples

# One way of creation
tup = 'python', 'geeks'
print(tup)
# Another for doing the same
tup = ('python', 'geeks')
print(tup)
Output
('python', 'geeks')
('python', 'geeks')

Concatenation of Tuples:
# Code for concatenating 2 tuples

tuple1 = (0, 1, 2, 3)
tuple2 = ('python', 'geek')

# Concatenating above two
print(tuple1 + tuple2)
Output:
(0, 1, 2, 3, 'python', 'geek')

Nesting of Tuples:
# Code for creating nested tuples

tuple1 = (0, 1, 2, 3)
tuple2 = ('python', 'geek')
tuple3 = (tuple1, tuple2)
print(tuple3)
Output :
((0, 1, 2, 3), ('python', 'geek'))

Immutable Tuples:
• Tuples are immutable: an attempt to change tuple data results in an error, as the following
code shows.

# Code to test that tuples are immutable

tuple1 = (0, 1, 2, 3)
tuple1[0] = 4
print(tuple1)
Output
Traceback (most recent call last):
File "e0eaddff843a8695575daec34506f126.py", line 3, in <module>
tuple1[0]=4
TypeError: 'tuple' object does not support item assignment

Slicing in Tuples:
# code to test slicing

tuple1 = (0 ,1, 2, 3)
print(tuple1[1:])
print(tuple1[::-1])
print(tuple1[2:4])

Output
(1, 2, 3)
(3, 2, 1, 0)
(2, 3)

Deleting a Tuple:

# Code for deleting a tuple

tuple3 = ( 0, 1)
del tuple3
print(tuple3)

Output (an error, because the name tuple3 no longer exists after del):
Traceback (most recent call last):
File "d92694727db1dc9118a5250bf04dafbd.py", line 6, in <module>
print(tuple3)
NameError: name 'tuple3' is not defined

Finding Length of a Tuple:

# Code for printing the length of a tuple

tuple2 = ('python', 'geek')


print(len(tuple2))

Output
2

Converting list to a Tuple:


• The built-in tuple() function takes a single parameter, which may be a list, string, set or even a
dictionary (only the keys are taken as elements), and converts it to a tuple.

# Code for converting a list and a string into a tuple

list1 = [0, 1, 2]
print(tuple(list1))
print(tuple('python')) # string 'python'

Output
(0, 1, 2)
('p', 'y', 't', 'h', 'o', 'n')

Difference Between List and Tuple:

SR.NO.  LIST                                        TUPLE
1       Lists are mutable.                          Tuples are immutable.
2       Iterating over a list is comparatively      Iterating over a tuple is comparatively
        time-consuming.                             faster.
3       The list is better for performing           The tuple data type is appropriate for
        operations such as insertion and deletion.  accessing the elements.
4       Lists consume more memory.                  Tuples consume less memory than lists.
5       Lists have several built-in methods.        Tuples do not have many built-in methods.
6       Unexpected changes and errors are more      In a tuple, unexpected changes are hard
        likely to occur.                            to make.

Sets in Python:
• A Set is an unordered collection data type that is iterable, mutable and has no duplicate
elements.
• Python’s set class represents the mathematical notion of a set.
• The major advantage of using a set, as opposed to a list, is that it has a highly optimized
method for checking whether a specific element is contained in the set.
• This is based on a data structure known as a hash table. Since sets are unordered, we cannot
access items using indexes like we do in lists.

# Python program to demonstrate sets

myset = set(["a", "b", "c"])
print(myset)
myset.add("d")
print(myset)

Set operations:
Adding elements:
Insertion in a set is done through the set.add() function, where an appropriate record value is created to
store in the hash table.

# A Python program to demonstrate adding elements in a set

# Creating a Set
people = {"Jay", "Idrish", "Archi"}

print("People:", end = " ")
print(people)

# This will add Daxit in the set
people.add("Daxit")

# Adding elements to the set using an iterator
for i in range(1, 6):
    people.add(i)

print("\nSet after adding element:", end = " ")
print(people)

Output:
People: {'Idrish', 'Archi', 'Jay'}

Set after adding element: {1, 2, 3, 4, 5, 'Idrish', 'Archi', 'Jay', 'Daxit'}

Union:
Two sets can be merged using the union() function or the | operator. The hash tables of both sets are
traversed and their elements are combined into a new set; duplicates are removed in the
process.

# Python program to demonstrate union of two sets

people = {"Jay", "Idrish", "Archil"}
vampires = {"Karan", "Arjun"}
dracula = {"Deepanshu", "Raju"}

# Union using union() function
population = people.union(vampires)

print("Union using union() function")
print(population)

# Union using "|" operator
population = people|dracula

print("\nUnion using '|' operator")
print(population)
Output:
Union using union() function
{'Karan', 'Idrish', 'Jay', 'Arjun', 'Archil'}

Union using '|' operator


{'Deepanshu', 'Idrish', 'Jay', 'Raju', 'Archil'}

Intersection:
This can be done through the intersection() function or the & operator. Only the elements common to
both sets are selected: conceptually, one hash table is iterated and an element is kept only if it
is also found in the other.

# Python program to demonstrate intersection of two sets

set1 = set()
set2 = set()

for i in range(5):
    set1.add(i)

for i in range(3,9):
    set2.add(i)

# Intersection using intersection() function
set3 = set1.intersection(set2)

print("Intersection using intersection() function")
print(set3)

Output:
Intersection using intersection() function
{3, 4}

Difference:
The difference of two sets contains the elements of the first set that are not in the second. This is
done through the difference() function or the - operator.

# Python program to demonstrate difference of two sets

set1 = set()
set2 = set()

for i in range(5):
    set1.add(i)

for i in range(3,9):
    set2.add(i)

# Difference of two sets using difference() function
set3 = set1.difference(set2)
print("Difference of two sets using difference() function")
print(set3)

# Difference of two sets using '-' operator
set3 = set1 - set2

print("\nDifference of two sets using '-' operator")
print(set3)

Output:
Difference of two sets using difference() function
{0, 1, 2}

Difference of two sets using '-' operator
{0, 1, 2}
Clearing sets:
The clear() method empties the whole set.

# Python program to demonstrate clearing of set

set1 = {1,2,3,4,5,6}

print("Initial set")
print(set1)

# This method will remove all the elements of the set
set1.clear()

print("\nSet after using clear() function")
print(set1)

Output:
Initial set
{1, 2, 3, 4, 5, 6}
Set after using clear() function
set()

Set drawbacks:
There are two major drawbacks in Python sets:
1. The set doesn’t maintain elements in any particular order.
2. Only instances of hashable (in practice, immutable) types can be added to a Python set.
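
A minimal sketch of both drawbacks follows. The variable names (points, error_message, ordered) are chosen for this illustration only.

```python
points = set()
points.add((1, 2))               # a tuple is hashable, so this works
points.add(frozenset({3, 4}))    # a frozenset is hashable as well

# A list is mutable and unhashable, so adding it raises TypeError
try:
    points.add([5, 6])
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Sets keep no particular order; use sorted() when a stable order is needed
names = {"Jay", "Idrish", "Archi"}
ordered = sorted(names)
print(ordered)
```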

Operators for Sets:


Sets support the following operators:

Operators        Notes
key in s         containment check
key not in s     non-containment check
s1 == s2         s1 is equivalent to s2
s1 != s2         s1 is not equivalent to s2
s1 <= s2         s1 is subset of s2
s1 < s2          s1 is proper subset of s2
s1 >= s2         s1 is superset of s2
s1 > s2          s1 is proper superset of s2
s1 | s2          the union of s1 and s2
s1 & s2          the intersection of s1 and s2
s1 - s2          the set of elements in s1 but not s2
s1 ^ s2          the set of elements in precisely one of s1 or s2
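
The operators in the table can be tried out with two small sets. This is only an illustrative sketch; the result names (union, common, only_s1, exactly_one) are assumptions made for the example.

```python
s1 = {0, 1, 2, 3, 4}
s2 = {3, 4, 5, 6, 7, 8}

union = s1 | s2          # elements in either set
common = s1 & s2         # elements in both sets
only_s1 = s1 - s2        # elements in s1 but not s2
exactly_one = s1 ^ s2    # elements in precisely one of the sets

print(3 in s1, 9 not in s1)          # containment checks
print({3, 4} <= s1, {3, 4} < s1)     # subset and proper subset
print(s1 < s1, s1 <= s1)             # a set is a subset of itself, but not a proper one
print(sorted(union), sorted(common), sorted(only_s1), sorted(exactly_one))
```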

Dictionary:
• Dictionary in Python is a collection of data values, used to store data values like a map.
Since Python 3.7, a dictionary preserves the insertion order of its keys.
• Dictionary holds key:value pairs.
• Values are looked up by their key, which makes retrieval fast.

Operations on Dictionary:
Creating a Dictionary:
In Python, a Dictionary can be created by placing a sequence of elements within curly braces {},
separated by commas. A Dictionary holds pairs of values, one being the Key and the other its
corresponding Value, written as Key:value. Values in a dictionary can be of any data type and can be
duplicated, whereas keys can't be repeated and must be immutable.

# Creating a Dictionary
# with Integer Keys
Dict = {1: 'Geeks', 2: 'For', 3: 'Geeks'}
print("\nDictionary with the use of Integer Keys: ")
print(Dict)

# Creating a Dictionary
# with Mixed keys
Dict = {'Name': 'Geeks', 1: [1, 2, 3, 4]}
print("\nDictionary with the use of Mixed Keys: ")
print(Dict)
Output:

Dictionary with the use of Integer Keys:
{1: 'Geeks', 2: 'For', 3: 'Geeks'}

Dictionary with the use of Mixed Keys:
{'Name': 'Geeks', 1: [1, 2, 3, 4]}

Creation using dict():


A Dictionary can also be created by the built-in function dict(). An empty dictionary can be created by
just placing two curly braces {}.

# Creating an empty Dictionary
Dict = {}
print("Empty Dictionary: ")
print(Dict)

# Creating a Dictionary
# with dict() method
Dict = dict({1: 'Geeks', 2: 'For', 3:'Geeks'})
print("\nDictionary with the use of dict(): ")
print(Dict)

# Creating a Dictionary
# with each item as a Pair
Dict = dict([(1, 'Geeks'), (2, 'For')])
print("\nDictionary with each item as a pair: ")
print(Dict)

Output:
Empty Dictionary:
{}

Dictionary with the use of dict():
{1: 'Geeks', 2: 'For', 3: 'Geeks'}

Dictionary with each item as a pair:
{1: 'Geeks', 2: 'For'}

Adding elements to a Dictionary:


In a Python Dictionary, elements can be added in multiple ways. One value at a time can be added to a
Dictionary by defining the value along with its key, e.g. Dict[Key] = 'Value'. An existing value in a
Dictionary can be updated by assigning to its key again or by using the built-in update() method.
Nested key values can also be added to an existing Dictionary. While adding a value, if the key
already exists, its value gets updated; otherwise a new Key with the value is added to the Dictionary.

# Creating an empty Dictionary


Dict = {}
print("Empty Dictionary: ")
print(Dict)

# Adding elements one at a time


Dict[0] = 'Geeks'
Dict[2] = 'For'
Dict[3] = 1
print("\nDictionary after adding 3 elements: ")
print(Dict)

# Adding set of values


# to a single Key
Dict['Value_set'] = 2, 3, 4
print("\nDictionary after adding 3 elements: ")
print(Dict)

# Updating existing Key's Value


Dict[2] = 'Welcome'
print("\nUpdated key value: ")
print(Dict)

# Adding Nested Key value to Dictionary


Dict[5] = {'Nested' :{'1' : 'Life', '2' : 'Geeks'}}
print("\nAdding a Nested Key: ")
print(Dict)

Output:
Empty Dictionary:
{}

Dictionary after adding 3 elements:
{0: 'Geeks', 2: 'For', 3: 1}

Dictionary after adding 3 elements:
{0: 'Geeks', 2: 'For', 3: 1, 'Value_set': (2, 3, 4)}

Updated key value:
{0: 'Geeks', 2: 'Welcome', 3: 1, 'Value_set': (2, 3, 4)}

Adding a Nested Key:
{0: 'Geeks', 2: 'Welcome', 3: 1, 'Value_set': (2, 3, 4), 5: {'Nested': {'1': 'Life', '2': 'Geeks'}}}

Accessing elements from a Dictionary:
In order to access the items of a dictionary refer to its key name. Key can be used inside square
brackets.

# Python program to demonstrate


# accessing a element from a Dictionary

# Creating a Dictionary
Dict = {1: 'Geeks', 'name': 'For', 3: 'Geeks'}

# accessing a element using key


print("Accessing a element using key:")
print(Dict['name'])

# accessing a element using key


print("Accessing a element using key:")
print(Dict[1])

Output:
Accessing a element using key:
For
Accessing a element using key:
Geeks

Accessing element using get():


There is also a method called get() that will also help in accessing the element from a dictionary.

# Creating a Dictionary
Dict = {1: 'Geeks', 'name': 'For', 3: 'Geeks'}

# accessing a element using get()


# method
print("Accessing a element using get:")
print(Dict.get(3))

Output:
Accessing a element using get:
Geeks
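
The practical difference between square brackets and get() shows up when the key is missing. The sketch below assumes a hypothetical missing key 'age' and illustrative variable names.

```python
data = {1: 'Geeks', 'name': 'For', 3: 'Geeks'}

# get() returns None (or a supplied default) when the key is absent
missing = data.get('age')
fallback = data.get('age', 'unknown')

# Square brackets raise KeyError for an absent key
try:
    data['age']
    raised = False
except KeyError:
    raised = True

print(missing, fallback, raised)
```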

Removing Elements from Dictionary:


• Using del keyword:
In a Python Dictionary, deletion of keys can be done by using the del keyword. Using the del
keyword, specific keys from a dictionary as well as the whole dictionary can be deleted.

# Initial Dictionary
Dict = { 5 : 'Welcome', 6 : 'To', 7 : 'Geeks',
'A' : {1 : 'Geeks', 2 : 'For', 3 : 'Geeks'},
'B' : {1 : 'Geeks', 2 : 'Life'}}
print("Initial Dictionary: ")
print(Dict)

# Deleting a Key value
del Dict[6]
print("\nDeleting a specific key: ")
print(Dict)

Output:
Initial Dictionary:
{5: 'Welcome', 6: 'To', 7: 'Geeks', 'A': {1: 'Geeks', 2: 'For', 3: 'Geeks'}, 'B': {1: 'Geeks', 2: 'Life'}}

Deleting a specific key:
{5: 'Welcome', 7: 'Geeks', 'A': {1: 'Geeks', 2: 'For', 3: 'Geeks'}, 'B': {1: 'Geeks', 2: 'Life'}}
• Using pop() method:
The pop() method removes the specified key and returns its value.

# Creating a Dictionary
Dict = {1: 'Geeks', 'name': 'For', 3: 'Geeks'}

# Deleting a key using pop() method
pop_ele = Dict.pop(1)
print('\nDictionary after deletion: ' + str(Dict))
print('Value associated to popped key is: ' + str(pop_ele))

Output:
Dictionary after deletion: {'name': 'For', 3: 'Geeks'}
Value associated to popped key is: Geeks

• Using popitem() method:
The popitem() method removes and returns a (key, value) pair from the dictionary. Since Python 3.7 it
removes the most recently inserted pair; in earlier versions the pair was arbitrary.

# Creating Dictionary
Dict = {1: 'Geeks', 'name': 'For', 3: 'Geeks'}

# Deleting a key using popitem() function
pop_ele = Dict.popitem()
print("\nDictionary after deletion: " + str(Dict))
print("The pair returned is: " + str(pop_ele))
Output:
Dictionary after deletion: {1: 'Geeks', 'name': 'For'}
The pair returned is: (3, 'Geeks')

• Using clear() method:


All the items from a dictionary can be deleted at once by using clear() method.

# Creating a Dictionary
Dict = {1: 'Geeks', 'name': 'For', 3: 'Geeks'}

# Deleting entire Dictionary


Dict.clear()
print("\nDeleting Entire Dictionary: ")
print(Dict)

Output:
Deleting Entire Dictionary:
{}

Dictionary Methods:
Methods          Description
copy()           Returns a shallow copy of the dictionary.
clear()          Removes all items from the dictionary.
pop()            Removes the given key from the dictionary and returns its value.
popitem()        Removes a key-value pair (the last inserted one since Python 3.7) and
                 returns it as a tuple.
get()            Returns the value for a key, or a default (None unless specified) if the key
                 is missing.
values()         Returns a view of all the values available in the dictionary.
str()            Produces a printable string representation of a dictionary.
update()         Adds dictionary dict2's key-value pairs to dict.
setdefault()     Sets dict[key] = default if key is not already in dict.
keys()           Returns a view of dictionary dict's keys.
items()          Returns a view of dict's (key, value) tuple pairs.
has_key()        Python 2 only (removed in Python 3); use key in dict instead.
fromkeys()       Creates a new dictionary with keys from seq and values set to value.
type()           Returns the type of the passed variable.
cmp()            Python 2 only (removed in Python 3); compare dictionaries with == and !=.
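
A short sketch of several of these methods in Python 3 follows. The dictionary contents and names (stock, template, shallow) are made up for the illustration.

```python
stock = {'apple': 3}

stock.update({'pear': 5, 'apple': 4})   # adds 'pear', overwrites 'apple'
stock.setdefault('kiwi', 0)             # key missing: inserted with the default
stock.setdefault('apple', 99)           # key present: value left unchanged

keys = list(stock.keys())               # keys() returns a view; list() copies it
items = list(stock.items())
has_apple = 'apple' in stock            # Python 3 replacement for has_key()

template = dict.fromkeys(['a', 'b'], 0)
shallow = stock.copy()

print(stock, keys, items, has_apple, template)
```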

Modules:
• A Python module is a file containing Python definitions and statements.
• It can define functions, classes, and variables.
• It can also include runnable code.
• Grouping related code into a module makes the code easier to understand and use.
• It also makes the code logically organized.

Importing a Module:
We can import the functions, classes defined in a module to another module using the import
statement in some other Python source file. Syntax is as follows: import module
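
The import statement comes in a few common forms. The sketch below uses the standard math module purely as an example; the alias m is an arbitrary choice.

```python
import math               # import the whole module; use names as math.<name>
from math import sqrt     # import one name directly into the current namespace
import math as m          # import the module under a shorter alias

root_a = math.sqrt(16)
root_b = sqrt(16)
root_c = m.sqrt(16)
print(root_a, root_b, root_c)
```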

OS Module:
• OS module provides functions for interacting with the operating system.
• This module provides a portable way of using operating system-dependent functionality
• There are different methods available in the OS module like os.getcwd(), os.mkdir(),
os.makedirs() etc.

Example:
# importing os module
import os

# Get the current working directory
cwd = os.getcwd()

# Print the current working directory
print("Current working directory:", cwd)

Functions of OS Module:
• os.name: This variable gives the name of the operating system dependent module that was imported,
for example 'posix' or 'nt'.
• os.error: An alias for the built-in OSError exception. All functions in this module raise OSError in
the case of invalid or inaccessible file names and paths, or other arguments that have the correct
type but are not accepted by the operating system.
• os.popen(): This method opens a pipe to or from a command. The return value can be read or
written depending on whether the mode is 'r' or 'w'.
• os.close(): Closes a file descriptor.
• os.rename(): A file old.txt can be renamed to new.txt using the function os.rename(). The
name of the file changes only if the file exists and the user has sufficient permission
to change the file.
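
The behaviour of os.rename() described above can be sketched safely inside a temporary directory. The use of the tempfile module and all variable names are choices made for this example, not part of the notes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    old_path = os.path.join(folder, "old.txt")
    new_path = os.path.join(folder, "new.txt")

    # Create old.txt so that there is something to rename
    with open(old_path, "w") as handle:
        handle.write("hello")

    os.rename(old_path, new_path)
    old_exists = os.path.exists(old_path)
    new_exists = os.path.exists(new_path)

    # Renaming a file that no longer exists raises an OSError subclass
    try:
        os.rename(old_path, new_path)
    except OSError as exc:
        error_type = type(exc).__name__

print(os.name, old_exists, new_exists, error_type)
```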

sys Module:
• sys module provides various functions and variables that are used to manipulate different parts
of the Python runtime environment.
• It allows operating on the interpreter as it provides access to the variables and functions that
interact strongly with the interpreter.

Example:
# importing sys module
import sys

# Print system version
print(sys.version)

Variables of sys Module:


• The sys module provides variables for better control over input and output.
• Using them, we can even redirect the input and output to other devices.
• The different variables are:
• stdin: It is used for standard input, for example to read input typed at the command line.
• stdout: It is used to display output directly on the screen console.
• stderr: Error messages are written here; for example, the traceback of an uncaught exception is
written to sys.stderr.
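
As a rough sketch of redirection, the standard library's contextlib.redirect_stdout can temporarily send everything written to sys.stdout into an in-memory buffer. The buffer and variable names are illustrative.

```python
import io
import sys
from contextlib import redirect_stdout

# Temporarily redirect standard output into an in-memory text buffer
buffer = io.StringIO()
with redirect_stdout(buffer):
    print("captured line")
captured = buffer.getvalue()

# Outside the with block, output goes to the console again
sys.stdout.write("written directly to stdout\n")

# print() can target standard error explicitly
print("error-style message", file=sys.stderr)
```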

Python Functions:
• A Python function is a block of related statements designed to perform a specific task.
• The idea is to put some commonly or repeatedly done tasks together and make a function, so
that instead of writing the same code again and again for different inputs, we can call the
function to reuse the code contained in it.
• Functions can be either built-in or user-defined. They help the program to be concise,
non-repetitive, and organized.

Syntax:
def function_name(parameters):
    statement(s)
    return expression

Defining a Function:
We can create a Python function using the def keyword. Example:

# A simple Python function

def fun():
    print("Welcome to GFG")

Calling a Function:
After creating a function we can call it by using the name of the function followed by parentheses.
Example:

# A simple Python function

def fun():
    print("Welcome to GFG")

# Driver code to call a function
fun()

Arguments of a Function:
• Arguments are the values passed inside the parentheses of the function call.
• A function can have any number of arguments, separated by commas.

Example:

# A simple Python function to check whether x is even or odd

def evenOdd(x):
    if (x % 2 == 0):
        print("even")
    else:
        print("odd")

# Driver code to call the function
evenOdd(2)
evenOdd(3)

Output
even
odd

Python Lambda Functions:


• Python Lambda Functions are anonymous functions, which means that the function has no name.
• The lambda keyword is used to define an anonymous function in Python.
• Such a function can have any number of arguments but only one expression, which is evaluated
and returned.
• One is free to use lambda functions wherever function objects are required.
• They are syntactically restricted to a single expression.

Syntax:
lambda arguments: expression

Example:
x = lambda a : a + 10
print(x(5))
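
A typical place where a function object is required is the key argument of sorted(). The sketch below uses made-up student data to show a lambda in that role, along with a two-argument lambda.

```python
# Hypothetical (name, score) pairs used only for illustration
students = [("Asha", 72), ("Ravi", 91), ("Meena", 85)]

# The lambda picks the score out of each pair to use as the sort key
by_score = sorted(students, key=lambda pair: pair[1], reverse=True)

# A lambda may take several arguments but still has a single expression
add = lambda a, b: a + b
total = add(2, 3)

print(by_score)
print(total)
```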

map() function: The map() function returns a map object (which is an iterator) of the results after
applying the given function to each item of a given iterable (list, tuple etc.)
Syntax:
map(fun, iter)

Parameters:
fun : It is a function to which map passes each element of the given iterable.
iter : It is an iterable which is to be mapped.

Example:

# Python program to demonstrate working of map.

# Return double of n
def addition(n):
    return n + n

# We double all numbers using map()
numbers = (1, 2, 3, 4)
result = map(addition, numbers)
print(list(result))

Output:
[2, 4, 6, 8]
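
map() is often combined with a lambda, can take more than one iterable, and produces its results lazily. The following sketch illustrates these three points; the variable names are illustrative.

```python
numbers = (1, 2, 3, 4)

# Same doubling as above, with a lambda instead of a named function
doubled = list(map(lambda n: n + n, numbers))

# With two iterables, the function receives one element from each
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))

# A map object is an iterator: values are produced on demand and only once
lazy = map(str, numbers)
first = next(lazy)
rest = list(lazy)

print(doubled, sums, first, rest)
```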

Python Packages
• A Python module may contain several classes, functions, variables, etc.
• A Python package can contain several modules.
• In simpler terms, a package is a folder that contains various modules as files.
• There are different packages in Python like Beautiful Soup, NumPy, Tkinter, IPython etc.

Beautiful soup
• Beautiful Soup is a Python library that makes it easy to scrape information from web pages.
• It sits atop an HTML or XML parser and provides Pythonic idioms for iterating, searching, and
modifying the parse tree.
• It can extract all of the text from HTML tags, and alter the HTML in the document with which
we’re working.
• Some key features that make beautiful soup unique are:
o Beautiful Soup provides a few simple methods and Pythonic idioms for navigating,
searching, and modifying a parse tree.
o Beautiful Soup automatically converts incoming documents to Unicode and outgoing
documents to UTF-8.
o Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, which allows
us to try out different parsing strategies or trade speed for flexibility.
NumPy
• NumPy stands for Numerical Python.
• In Python we have lists that serve the purpose of arrays, but they are slow to process.
• NumPy is a Python library used for working with arrays.
• It aims to provide an array object that is up to 50x faster than traditional Python lists.
• It also has functions for working in domain of linear algebra, fourier transform, and matrices.
• NumPy arrays are stored at one continuous place in memory unlike lists, so processes can
access and manipulate them very efficiently.
IPython:
• IPython stands for Interactive Python.
• It was originally developed as an enhanced Python interpreter.
• It offers a powerful interactive Python shell.
• It possesses object introspection ability. Introspection is the ability to check properties of an
object during runtime.
• Some features are as follows:
o Syntax highlighting.
o Stores the history of interactions.
o Tab completion of keywords, variables and function names.
o Ability to be embedded in other Python programs.
o Provides access to the Python debugger.

Tkinter:
• Tkinter is the standard GUI library for Python.
• Python when combined with Tkinter provides a fast and easy way to create GUI applications.
• Tkinter provides a powerful object-oriented interface to the Tk GUI toolkit.
• Creating a GUI application using Tkinter is an easy task. All you need to do is perform the
following steps:
o Import the Tkinter module.
o Create the GUI application main window.
o Add one or more widgets (such as labels and buttons) to the GUI application.
o Enter the main event loop to take action against each event triggered by the user.

Classes and Objects:


• A class is a user-defined blueprint or prototype from which objects are created.
• Class creates a user-defined data structure, which holds its own data members and member
functions, which can be accessed and used by creating an instance of that class.
• Classes provide a means of bundling data and functionality together.
• Classes are created by keyword class.
• Attributes are the variables that belong to a class.
• Attributes are always public and can be accessed using the dot (.) operator. Eg.:
Myclass.Myattribute

Defining a class:

Syntax:
class ClassName:
    statements
Example:
# Python3 program to demonstrate defining a class

class MyClass:
    x = 5

In the above example, the class keyword indicates that you are creating a class followed by the name
of the class.

Class Objects:
• An Object is an instance of a Class.
• A class is like a blueprint while an instance is a copy of the class with actual values.
• An object consists of:
• State: It is represented by the attributes of an object. It also reflects the properties of an object.
• Behavior: It is represented by the methods of an object. It also reflects the response of an
object to other objects.
• Identity: It gives a unique name to an object and enables one object to interact with other
objects.

Create Object:
We can use the class named MyClass to create objects:

Example: Create an object named p1, and print the value of x:

p1 = MyClass()
print(p1.x)

Class and Instance Variables:


• Instance variables are for data, unique to each instance and class variables are for attributes
and methods shared by all instances of the class.
• Instance variables are variables whose value is assigned inside a constructor or method with
self.
• Class variables are variables whose value is assigned in the class.

Defining instance variable using a constructor:

# Python3 program to show that the variables with a value assigned in the class declaration are
# class variables, and variables inside methods and constructors are instance variables.

# Class for Dog
class Dog:

    # Class Variable
    animal = 'dog'

    # The init method or constructor
    def __init__(self, breed, color):

        # Instance Variable
        self.breed = breed
        self.color = color

# Objects of Dog class
Rodger = Dog("Pug", "brown")
Buzo = Dog("Bulldog", "black")

print('Rodger details:')
print('Rodger is a', Rodger.animal)
print('Breed:', Rodger.breed)
print('Color:', Rodger.color)

print('\nBuzo details:')
print('Buzo is a', Buzo.animal)
print('Breed:', Buzo.breed)
print('Color:', Buzo.color)

# Class variables can be accessed using class name also
print("\nAccessing class variable using class name")
print(Dog.animal)

Output:
Rodger details:
Rodger is a dog
Breed: Pug
Color: brown

Buzo details:
Buzo is a dog
Breed: Bulldog
Color: black

Accessing class variable using class name
dog
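
One consequence of the distinction, sketched below with a trimmed-down Dog class (an assumption for brevity), is that assigning through an instance creates a new instance variable that hides the class variable for that object only, while assigning through the class changes the value seen by all other instances.

```python
class Dog:
    animal = 'dog'              # class variable, shared by all instances

    def __init__(self, breed):
        self.breed = breed      # instance variable, unique to each object

rodger = Dog("Pug")
buzo = Dog("Bulldog")

# Assigning via the instance creates an instance variable on rodger only
rodger.animal = 'puppy'

# Assigning via the class changes the shared class variable
Dog.animal = 'canine'

print(rodger.animal)   # the instance variable hides the class variable
print(buzo.animal)     # still reads the class variable
print(Dog.animal)
```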

Python Inheritance:
• Inheritance enables us to define a class that takes all the functionality from a parent class and
allows us to add more.
• Inheritance is a powerful feature in object-oriented programming.
• It refers to defining a new class with little or no modification to an existing class.
• The new class is called derived (or child) class and the one from which it inherits is called
the base (or parent) class.
• Derived class inherits features from the base class where new features can be added to it. This
results in reusability of code.

Syntax:

class BaseClass:
    Body of base class

class DerivedClass(BaseClass):
    Body of derived class

Example :

# A Python program to demonstrate inheritance

class Person(object):

    # Constructor
    def __init__(self, name):
        self.name = name

    # To get name
    def getName(self):
        return self.name

    # To check if this person is an employee
    def isEmployee(self):
        return False

# Inherited or Subclass (Note Person in bracket)
class Employee(Person):

    # Here we return true
    def isEmployee(self):
        return True

# Driver code
emp = Person("Geek1")  # An Object of Person
print(emp.getName(), emp.isEmployee())

emp = Employee("Geek2")  # An Object of Employee
print(emp.getName(), emp.isEmployee())
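
When a derived class adds its own data, it usually calls the base class constructor through super(). The sketch below is a variation on the example above; the employee_id attribute and the describe() method are hypothetical additions made for illustration.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} (person)"

class Employee(Person):
    def __init__(self, name, employee_id):
        super().__init__(name)          # reuse the base class constructor
        self.employee_id = employee_id  # new feature added by the derived class

    def describe(self):
        # Extend, rather than replace, the inherited behaviour
        return f"{super().describe()}, employee #{self.employee_id}"

emp = Employee("Geek2", 42)
description = emp.describe()
print(description)
print(isinstance(emp, Person))
```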
Multiple Inheritance:
When a class is derived from more than one base class it is called multiple inheritance. The derived
class inherits all the features of the base classes.

Syntax:

class Base1:
    Body of the class

class Base2:
    Body of the class

class Derived(Base1, Base2):
    Body of the class

# Python program to depict multiple inheritance when method is overridden in both classes

class Class1:
    def m(self):
        print("In Class1")

class Class2(Class1):
    def m(self):
        print("In Class2")

class Class3(Class1):
    def m(self):
        print("In Class3")

class Class4(Class2, Class3):
    pass

obj = Class4()
obj.m()

Output:
In Class2
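
The reason Class2's method is chosen is Python's method resolution order (MRO), which can be inspected through the __mro__ attribute of a class. The sketch below repeats the class hierarchy, with methods returning strings instead of printing so the result is easy to check.

```python
class Class1:
    def m(self):
        return "In Class1"

class Class2(Class1):
    def m(self):
        return "In Class2"

class Class3(Class1):
    def m(self):
        return "In Class3"

class Class4(Class2, Class3):
    pass

# Classes are searched for a method in this order, left to right
order = [cls.__name__ for cls in Class4.__mro__]
result = Class4().m()

print(order)
print(result)
```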

Polymorphism:

• We can use the concept of polymorphism while creating class methods, as Python allows
different classes to have methods with the same name.
• Polymorphism means the same function name (but different signatures) being used for
different types.

Example:
class Cat:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def info(self):
        print(f"I am a cat. My name is {self.name}. I am {self.age} years old.")

    def make_sound(self):
        print("Meow")

class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def info(self):
        print(f"I am a dog. My name is {self.name}. I am {self.age} years old.")

    def make_sound(self):
        print("Bark")

cat1 = Cat("Kitty", 2.5)
dog1 = Dog("Fluffy", 4)

for animal in (cat1, dog1):
    animal.make_sound()
    animal.info()
    animal.make_sound()

Output:

Meow
I am a cat. My name is Kitty. I am 2.5 years old.
Meow
Bark
I am a dog. My name is Fluffy. I am 4 years old.
Bark

• Here, we have created two classes, Cat and Dog.
• They share a similar structure and have the same method names, info() and make_sound().
• However, notice that we have not created a common superclass or linked the classes together
in any way.
• Even then, we can pack these two different objects into a tuple and iterate through it
using a common animal variable. It is possible due to polymorphism.
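
The same idea appears in built-in functions: len() works on several unrelated types. The sketch below shows this along with two unrelated classes that share a method name; the Duck and Robot classes are hypothetical examples.

```python
# A built-in function used polymorphically on a string, a list and a dictionary
lengths = [len("python"), len([1, 2, 3]), len({"a": 1})]

# Two unrelated classes that happen to share a method name
class Duck:
    def make_sound(self):
        return "Quack"

class Robot:
    def make_sound(self):
        return "Beep"

# The loop only relies on make_sound() existing, not on a common superclass
sounds = [thing.make_sound() for thing in (Duck(), Robot())]

print(lengths)
print(sounds)
```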

Method Overloading:
• Python does not support method overloading by default.
• The problem with method overloading in Python is that we may overload the methods but can
only use the latest defined method.

# First product method.
# Takes two arguments and prints their product
def product(a, b):
    p = a * b
    print(p)

# Second product method
# Takes three arguments and prints their product
def product(a, b, c):
    p = a * b * c
    print(p)

# Uncommenting the below line shows an error
# product(4, 5)

# This line will call the second product method
product(4, 5, 5)
Output: 100

• In the above code, we have defined two product methods, but we can only use the second
product method, as Python does not support method overloading.
• We may define many methods of the same name and different arguments, but we can only use
the latest defined method, because each new definition replaces the previous one.
• Calling the method with the arguments of an earlier definition, such as product(4, 5), will
produce a TypeError.
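
A common way to get overloading-like behaviour in Python is a default argument value or a variable-length argument list. The sketch below shows both under that assumption; product_all is a name invented for the example, and the functions return their results rather than printing them.

```python
# One definition that accepts two or three arguments via a default value
def product(a, b, c=1):
    return a * b * c

# One definition that accepts any number of arguments via *args
def product_all(*factors):
    result = 1
    for factor in factors:
        result *= factor
    return result

print(product(4, 5))         # two arguments, c defaults to 1
print(product(4, 5, 5))      # three arguments
print(product_all(2, 3, 4))  # any number of arguments
```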
Method Overriding:
• Method overriding is an ability of any object-oriented programming language that allows a
subclass or child class to provide a specific implementation of a method that is already
provided by one of its super-classes or parent classes.
• When a method in a subclass has the same name, same parameters or signature and same
return type(or subtype) as a method in its super-class, then the method in the subclass is said
to override the method in the super-class.
• The version of a method that is executed will be determined by the object that is used to
invoke it.
• If an object of a parent class is used to invoke the method, then the version in the parent class
will be executed, but if an object of the subclass is used to invoke the method, then the version
in the child class will be executed.

# Python program to demonstrate method overriding

# Defining parent class
class Parent():

    # Constructor
    def __init__(self):
        self.value = "Inside Parent"

    # Parent's show method
    def show(self):
        print(self.value)

# Defining child class
class Child(Parent):

    # Constructor
    def __init__(self):
        self.value = "Inside Child"

    # Child's show method
    def show(self):
        print(self.value)

# Driver's code
obj1 = Parent()
obj2 = Child()

obj1.show()
obj2.show()
Output:
Inside Parent
Inside Child

Data Hiding:
• Data hiding is a concept which underlines the hiding of data or information from the user.
• In class, if we declare the data members as private so that no other class can access the data
members, then it is a process of hiding data.
• Thus, data hiding imparts security, along with discarding dependency.
• Data hiding in Python is performed using the __ double underscore. This makes the class
members nonpublic and isolated from the other classes.

Example:
class Solution:
    __privateCounter = 0

    def sum(self):
        self.__privateCounter += 1
        print(self.__privateCounter)

count = Solution()
count.sum()
count.sum()

# This line raises an AttributeError because the private member cannot be accessed directly
print(count.__privateCounter)
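
It is worth knowing that the double underscore works through name mangling: inside the class, Python rewrites the attribute name to _ClassName__attribute, so the hiding is a convention rather than strict access control. The sketch below shows this for the Solution class, with sum() returning the counter so the result can be checked.

```python
class Solution:
    __privateCounter = 0

    def sum(self):
        self.__privateCounter += 1
        return self.__privateCounter

count = Solution()
count.sum()
count.sum()

# Direct access from outside the class fails with AttributeError
try:
    count.__privateCounter
    blocked = False
except AttributeError:
    blocked = True

# The mangled name is still reachable, so this is not a security boundary
mangled_value = count._Solution__privateCounter

print(blocked, mangled_value)
```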

Advantages of Data Hiding:


1. It helps to prevent damage or misuse of volatile data by hiding it from the public.
2. The class objects are disconnected from the irrelevant data.
3. It isolates objects as the basic concept of OOP.
4. It increases the security against hackers that are unable to access important data.

Disadvantages of Data Hiding:


1. It requires programmers to write lengthy code to hide important data from common clients.
2. The linkage between the visible and invisible data makes the objects work faster, but data
hiding prevents this linkage.
UNIT – 3 (MACHINE LEARNING BASICS)

Machine Learning Basics:


• Humans learn from their past experiences.
• Machines follow instructions given by humans.
• What if humans could train machines to learn from past data, so that machines can
act faster?
• This approach, which gives computers the capability to learn without being explicitly
programmed, is known as machine learning.
• Machine learning enables a machine to automatically learn from data, improve performance
from experience, and predict things without being explicitly programmed.
Features of Machine Learning:
• It is a data-driven technology.
• Machine learning uses data to detect various patterns in a given dataset.
• It can learn from past data and improve automatically.
• Machine learning is closely related to data mining, as it also deals with huge amounts of
data.

Need for Machine Learning:


• The need for machine learning is increasing day by day.
• The reason behind the need for machine learning is that it is capable of doing tasks that are too
complex for a person to implement directly.
• As humans, we have limitations: we cannot process huge amounts of data manually, so we
need computer systems, and this is where machine learning makes things easier for us.
• We can train machine learning algorithms by providing them with huge amounts of data and letting
them explore the data, construct models, and predict the required output automatically.
• With the help of machine learning, we can save both time and money.
• Example:
Suppose a person, Paul, listens to a song and then likes or dislikes it based on certain features
such as intensity and tempo.
We can group together the songs that Paul likes and the songs that he dislikes.
Let us say Paul likes songs that have high intensity and high tempo, and dislikes the others.
Now suppose a new song A comes in. We have to identify whether Paul will like or dislike
this song.

On the basis of the song's position in the graph, we can say that Paul likes the song.

Now, if there are many songs, it becomes very difficult for a human to identify the group
to which each song belongs. In such a case, a machine learning algorithm comes into the picture.
An ML algorithm can easily classify each new song with the help of historical data.
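The song example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature values (intensity and tempo on a 0 to 10 scale) are made up, and the nearest-centre rule used here is one simple choice of classifier, not a method named in these notes.

```python
# Hypothetical historical data: (intensity, tempo) pairs with Paul's verdict.
history = [
    ((8.0, 9.0), "like"),
    ((9.0, 7.5), "like"),
    ((7.5, 8.5), "like"),
    ((2.0, 3.0), "dislike"),
    ((3.5, 1.5), "dislike"),
    ((1.0, 2.5), "dislike"),
]

def centroid(points):
    """Return the mean point of a list of (intensity, tempo) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def predict(song, history):
    """Assign the song to the group whose centre is closest to it."""
    groups = {}
    for features, label in history:
        groups.setdefault(label, []).append(features)
    centres = {label: centroid(points) for label, points in groups.items()}

    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    return min(centres, key=lambda label: sq_dist(song, centres[label]))

song_a = (8.5, 8.0)  # new song A: high intensity, high tempo
print(predict(song_a, history))
```

With more songs and more features the same idea still applies, which is why the task is handed to an algorithm rather than done by eye.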

Related areas of Machine Learning:


• Artificial Intelligence (AI):
o Artificial Intelligence refers to the simulation of human intelligence in machines.
o The functions of the human brain are studied and replicated on a machine or a system so
that it can mimic human behaviour.
o Classical AI is often rule-based and static, relying on logic and if-else rules.
o It is applied to solve complex problems and automate routine work.
• Simulation environments:
o Generating training data for AI systems is often challenging.
o Developing digital environments that simulate the behaviour of the real world provides
us with test beds to measure and train an AI’s general intelligence.
o These environments present raw data to an AI, which then takes actions in order to achieve
the goals it has been set.
o Training in these simulation environments can help us understand how AI systems learn
and how to improve them.
• Networks with memory:
o In order for AI systems to generalise in diverse real-world environments, they must be
able to continually learn new tasks and remember how to perform all of them in the
future.
o Traditional neural networks are typically incapable of learning new tasks without forgetting earlier ones.
o Neural networks have been modified and upgraded with varying degrees of memory
so that they can learn new tasks.

• Generative models:
o Generative models learn a probability distribution over training examples.
o By sampling from this learned distribution, generative models output new examples that are
similar to the training data.

Applications of Machine learning:


• Image Recognition:
o Image recognition is used to identify objects, persons, places, etc. in digital images.
o A popular use case of image recognition and face detection is automatic friend-tagging
suggestion.
• Speech Recognition:
o Speech recognition is the process of recognising spoken words and converting them into text or commands.
o Google Assistant, Siri, Cortana, and Alexa use speech recognition technology.
• Traffic prediction:
o If we want to visit a new place, we take the help of Google Maps, which shows us the
shortest route and predicts the traffic conditions.
o It predicts whether traffic is clear, slow-moving, or heavily congested.
• Product recommendations:
o Machine learning is used by various e-commerce and entertainment companies to
recommend products to the user.
o Whenever we search for a product, we start getting advertisements for the
same product while surfing the internet on the same browser; this is because of machine
learning.
• Self-driving cars:
o Machine learning plays a significant role in self-driving cars.
o Tesla, for example, trains its car models with machine learning to detect people and
objects while driving.

Machine Learning Software Tools:


1. Scikit-learn: Scikit-learn is used for machine learning development in Python.
Features:
• It helps in data mining and data analysis.
• It provides models and algorithms for Classification, Regression, Clustering,
Dimensionality reduction, Model selection, and Pre-processing.

Pros:
• Easily understandable documentation is provided.
• Parameters for any specific algorithm can be changed while calling objects.

2. PyTorch: PyTorch is a Python machine learning library based on Torch, which is a computing
framework and machine learning library.
Features:
• It helps in building neural networks.
• It can be used on cloud platforms.
• It provides distributed training, various tools, and libraries.
Pros:
• It helps in creating computational graphs.
• Easy to use.

3. TensorFlow: TensorFlow is an open-source machine learning library; it also provides a JavaScript
library, TensorFlow.js, that helps in machine learning.
Features:
• Helps in training and building your models.
• Existing models can be run with the help of the TensorFlow.js model converter.
• It helps in building neural networks.

Pros:
• You can use it in two ways, i.e., by script tags or by installing through NPM.
Cons:
• It is difficult to learn.

4. Weka: Weka is a collection of machine learning algorithms that help in data mining.
Features:
• Data preparation
• Classification
• Regression
• Clustering
• Visualization
• Association rules mining.
Pros:
• Provides online courses for training.
• Easy to understand algorithms.
Cons:
• Not much documentation and online support are available.
5. KNIME: KNIME is a platform for data analytics, reporting and integration. Using the
data pipelining concept, it combines different components for machine learning and data
mining.
Features:
• It can integrate the code of programming languages like C, C++, R, Python, Java, JavaScript
etc.
• It can be used for business intelligence, financial data analysis, and CRM.
Pros:
• It is easy to deploy and install.
• Easy to learn.
Cons:
• Difficult to build complicated models.
• Limited visualization and exporting capabilities.

Types of Machine Learning:

Supervised ML:
• Supervised learning is a type of machine learning in which machines are trained using well
"labelled" training data, and on the basis of that data, machines predict the output.
• The labelled data means some input data is already tagged with the correct output.
• In the real-world, supervised learning can be used for Risk Assessment, Image classification,
Fraud Detection, spam filtering.
• Some popular supervised learning algorithms are linear regression, logistic regression etc.

How Supervised Learning Works?


• In supervised learning, models are trained using a labelled dataset, where the model learns about
each type of data.
• Once the training process is completed, the model is tested on test data (a held-out portion of
the labelled dataset that was not used for training), and then it predicts the output.
• The working of supervised learning can be easily understood by the following example:
• Suppose we have a dataset of different types of shapes, which includes squares, triangles and
hexagons.
• The first step is to train the model for each shape:
o If the given shape has four sides, and all the sides are equal, then it will be labelled as a square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.
• After training, we test our model using the test set, and the task of the model is to
identify the shape.
• The machine is already trained on all types of shapes, and when it finds a new shape, it
classifies the shape on the basis of its number of sides and predicts the output.

Advantages of Supervised learning:


• With the help of supervised learning, the model can predict the output on the basis of prior
experiences.
• In supervised learning, we can have an exact idea about the classes of objects.
• It helps us to solve various real-world problems such as spam filtering etc.

Disadvantages of supervised learning:


• It is not well suited to very complex tasks.
• It cannot reliably predict the correct output if the test data differs substantially from the training dataset.
• Training requires a lot of computation time.

Unsupervised ML:
• Unsupervised learning is a type of machine learning in which models are trained using an
unlabelled dataset and are allowed to act on that data without any supervision.
• The model itself finds the hidden patterns and insights in the given data.
• The goal of unsupervised learning is to find the patterns in a dataset, group the data according to
similarities, and represent the dataset in a suitable format.
• Some popular unsupervised learning algorithms are K-means clustering, hierarchical clustering, etc.
Note that KNN (k-nearest neighbours) is a supervised algorithm, despite the similar name.

Why use Unsupervised Learning?


• Unsupervised learning is helpful for finding useful insights from the data.
• Unsupervised learning is similar to how a human learns to think from their own experiences,
which makes it closer to real AI.
• Unsupervised learning works on unlabelled and uncategorized data, which makes it
more widely applicable.
• In the real world, we do not always have input data with the corresponding output, so to solve such
cases we need unsupervised learning.

Working of Unsupervised Learning


Working of unsupervised learning can be understood by the below example:

• Here, we take unlabelled input data. This data includes the names of different cricketers,
their scores and wickets.
• This unlabelled input data is fed to the machine learning model.
• The model interprets the raw data to find the hidden patterns in it. For this, a graph can be
plotted by the machine learning algorithm to divide the data into groups of Batsman and
Bowler according to the similarities.
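The cricketer example can be sketched with K-means clustering, one of the algorithms listed above. The player figures below are hypothetical, and starting from the first k points as centres is a simplifying assumption made so that the run is reproducible; practical implementations usually pick random starting centres and scale the features first.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k=2, iterations=10):
    """Tiny K-means: returns one cluster index per point."""
    centres = [points[i] for i in range(k)]  # assumption: first k points as start
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centre.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: sq_dist(p, centres[c]))
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centres[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

# Hypothetical unlabelled data: (name, runs scored, wickets taken).
players = [
    ("A", 5200, 4), ("B", 600, 210), ("C", 4800, 10),
    ("D", 450, 180), ("E", 6100, 2), ("F", 900, 250),
]
labels = kmeans([(runs, wickets) for _, runs, wickets in players], k=2)
for (name, _, _), label in zip(players, labels):
    print(name, "-> group", label)
```

The algorithm only returns group numbers. Reading one group as "Batsman" and the other as "Bowler" is an interpretation added afterwards by a person.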

Advantages of Unsupervised Learning:


• Unsupervised learning can be applied to more complex tasks than supervised learning because it
does not require labelled input data.
• Unsupervised learning is preferable when labels are scarce, as it is easier to get unlabelled data
than labelled data.

Disadvantages of Unsupervised Learning:


• Unsupervised learning is intrinsically more difficult than supervised learning as it does not
have corresponding output.
• Result might be less accurate as input data is not labelled, and algorithms do not know the
exact output in advance.

Difference between Supervised and Unsupervised Learning:

Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labelled data. | Unsupervised learning algorithms are trained using unlabelled data.
Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data.
In supervised learning, input data is provided to the model along with the output during training. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning model produces an accurate result. | Unsupervised learning model may give less accurate result as compared to supervised learning.

Reinforcement Learning:
• Reinforcement Learning is a feedback-based Machine learning technique.
• In it, an agent learns to behave in an environment by performing the actions and seeing the
results of actions.
• For each good action, the agent gets positive feedback, and for each bad action, the agent gets
negative feedback.
• There is no labelled data, so the agent is bound to learn by its experience only.
• Example:

An image is fed into the system, which is asked what it shows. The ML algorithm answers that the
image is of a cat, which is not true. The user therefore gives feedback that the image is of a dog,
not a cat, and the machine learns from this. The next time a dog’s image is shown, it
will say that it is a dog’s image.
Features of Reinforcement Learning:
• In RL, the agent is not instructed about the environment and what actions need to be taken.
• The agent takes an action and changes states according to the feedback of the action.
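The feedback loop described above can be illustrated with a very small program. The sketch below uses tabular Q-learning, a technique not named in these notes, on a made-up environment: an agent on a line of five positions that receives positive feedback on reaching the last position and a small negative feedback for every other move. The environment, reward values and parameter values are all assumptions chosen for illustration.

```python
import random

N_STATES = 5        # positions 0..4 on a line; position 4 is the goal
ACTIONS = (-1, 1)   # -1 = move left, +1 = move right

def step(state, action):
    """Environment: return (next_state, reward, done) for one action."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True    # positive feedback at the goal
    return next_state, -0.01, False     # small negative feedback otherwise

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values purely from experience (no labelled data)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # explore by acting at random
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate towards reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q_table = train()
policy = ["right" if q_table[(s, 1)] > q_table[(s, -1)] else "left"
          for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told that moving right is correct; it discovers this from the feedback alone.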

Applications:
• Robotics:
RL is used in Robot navigation, Robo-soccer, walking etc.
• Control:
RL can be used for adaptive control such as Factory processes, admission control in
telecommunication, etc.
• Game Playing:
RL can be used in Game playing such as tic-tac-toe, chess, etc.
• Chemistry:
RL can be used for optimizing the chemical reactions.
• Business:
RL is used for business strategy planning.

Types of supervised Machine learning Algorithms:

Regression:
• Regression means predicting a value after analysing sample inputs.
• Example: Predicting the cost of a dish in a restaurant’s menu after analysing the menus of
different restaurants.
• Regression algorithms are used if there is a relationship between the input variable and the
output variable.
• It is used for the prediction of continuous variables, such as Weather forecasting, Market
Trends.
• Some popular Regression algorithms are: Linear Regression, Regression Trees, Non-Linear
Regression etc.

Classification:
• Classification means allocating a new item to one of the classes.
• Classification algorithms are used when the output variable is categorical, which means the
output falls into classes such as Yes-No, Male-Female, True-False, etc.
• Example: Assigning a new song to one of Paul’s groups, Like or Dislike, based on the
features of the song.
• Some classification algorithms are: Random Forest, Decision Trees, etc.

Difference between Regression and Classification:

Regression | Classification
Output variable must be of continuous nature or real value. | Output variable must be a discrete value.
Used with continuous data. | Used with discrete data.
In Regression, we try to find the best fit line, which can predict the output more accurately. | In Classification, we try to find the decision boundary, which can divide the dataset into different classes.
Used to solve the regression problems such as Weather Prediction, House price prediction, etc. | Used to solve classification problems such as Identification of spam emails etc.
Divided into Linear and Non-linear Regression. | Divided into Binary Classifier and Multi-class Classifier.

Parametric vs Non Parametric regression:
• Machine learning models can be parametric or non-parametric.
• Parametric models assume a fixed functional form with a finite set of parameters that are
estimated from the data.
• Non-parametric models do not assume a fixed functional form, so they can fit a wider range of
relationships, but they typically need more data.
• In the case of parametric models, an assumption about the functional form is made; often a linear
model is considered.
• In the case of non-parametric models, no assumption about the functional form is made.
• Parametric models are much easier to fit than non-parametric models because they
only require the estimation of a set of parameters, as the model form (for example, linear) is
fixed in advance.
• In the case of a non-parametric model, one needs to estimate an arbitrary function, which is a
much more difficult task.
• One can go for parametric models when the goal is inference, that is, understanding how the inputs affect the result.
• One can go for non-parametric models when the goal is to make predictions.

Types of Regression:
• Linear Regression:
o It models a linear relationship between variables.
o It uses a regression line, also known as a best-fit line.
o It models the relationship between an input independent variable and an output
dependent variable using a regression line.
o It is of two types: Simple: one dependent and one independent variable. Multiple: one
dependent and multiple independent variables.
o It is mostly used when the relationship to be modelled is not extremely complex and when you
do not have a lot of data.
o Example: Let the relationship be defined as Y = c + m*X, where ‘c’ denotes the intercept and
‘m’ denotes the slope of the line.

• Polynomial Regression:
o It is used for non-linear data and can model complex relationships.
o In this regression technique, the best fit line is a curve that fits the data points.
o Polynomial regression analysis represents a non-linear relationship between dependent and
independent variables.
o It requires careful design: some knowledge of the data is needed in order to select the best
exponents.

• Logistic Regression:
o It is used when the dependent variable is discrete, i.e., takes one of two values, for example
0 or 1.
o A sigmoid curve denotes the relation between the dependent variable and the
independent variables.
o It works best with large data sets that have an almost equal occurrence of values in the
target variable.
o The dataset should not contain a high correlation between independent variables, as this
will create a problem when ranking the variables.
• Locally weighted Regression:
o It is a supervised, non-parametric learning algorithm.
o The model does not learn a fixed set of parameters. Rather, parameters are computed
individually for each query point.
o While computing parameters, a higher preference is given to the points in the training set
that lie close to the query point.
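As an illustration of simple linear regression, the intercept c and slope m of the line Y = c + m*X can be computed directly with the standard least-squares formulas. The data points below are made up and lie exactly on Y = 1 + 2*X, so the fit should recover those values; the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = c + m*X; returns (c, m)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x  # the best-fit line passes through the means
    return c, m

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # generated from Y = 1 + 2*X
c, m = fit_line(xs, ys)
print("intercept c =", c, "slope m =", m)
print("prediction for X = 6:", c + m * 6)
```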

Numerical optimization:
• For many problems it is hard to figure out the best solution directly. In such cases, Numerical
Optimization techniques are used.
• Using these optimization approaches, it is relatively easy to set up a function that measures
how good a solution is and then adjust the parameters to minimize that function and so find the solution.
• One such technique for Numerical optimization is Gradient Descent.

Gradient Descent:
• Gradient Descent is a numerical optimization algorithm.
• The main objective of gradient descent is to minimize the cost function iteratively.
• It is also used for updating the parameters of the learning model.
• It is an iterative optimization algorithm which is used to train various machine learning
and deep learning models.
• It helps in finding a local minimum of a function.
• The way to reach a local minimum or local maximum of a function using the gradient
is as follows:
o If we move in the direction of the negative gradient, i.e. away from the gradient of
the function at the current point, we approach a local minimum of that function. This is gradient descent.
o If we move in the direction of the positive gradient, i.e. towards the gradient of the function at
the current point, we approach a local maximum of that function. This is known as gradient ascent.

Cost-function:
The cost function is defined as a measure of the difference or error between the actual values and
the predicted values at the current position, expressed as a single real number.
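A minimal sketch of gradient descent on a cost function is given below. It assumes the mean squared error as the cost and the linear model Y = c + m*X; the data, learning rate and number of steps are illustrative choices, not values from these notes.

```python
def cost(c, m, xs, ys):
    """Mean squared error between predictions c + m*x and actual values y."""
    return sum((c + m * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly step against the gradient of the cost; returns (c, m)."""
    c, m = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [c + m * x - y for x, y in zip(xs, ys)]
        grad_c = 2 * sum(errors) / n                              # d(cost)/dc
        grad_m = 2 * sum(e * x for e, x in zip(errors, xs)) / n   # d(cost)/dm
        c -= learning_rate * grad_c   # move in the negative gradient direction
        m -= learning_rate * grad_m
    return c, m

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # generated from Y = 1 + 2*X
c_hat, m_hat = gradient_descent(xs, ys)
print("cost at start:", cost(0.0, 0.0, xs, ys))
print("cost at end:  ", cost(c_hat, m_hat, xs, ys))
print("c =", round(c_hat, 4), "m =", round(m_hat, 4))
```

If the learning rate is chosen too large the updates overshoot and the cost grows instead of shrinking, so the value used here is specific to this small data set.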

Types of Gradient Descent:


There are three types of gradient descent:
1. Batch Gradient Descent:
a. It computes the error for each point in the training set and updates the model only after
evaluating all training examples.
b. It produces less noise than the other types of gradient descent and gives stable
convergence.
c. Each update is computationally efficient because the whole training set is processed together,
but every update needs a full pass over the data.
2. Stochastic gradient descent:
a. It processes one training example at a time and updates the model parameters after each
example.
b. It requires only one training example at a time, hence it is easier to fit in allocated
memory.
c. It loses some computational efficiency compared with batch gradient descent
because of its very frequent updates.
d. Due to frequent updates, it is also treated as a noisy gradient.
e. It is more efficient for large datasets.
3. MiniBatch Gradient Descent:
a. It is a combination of batch gradient descent and stochastic gradient descent.
b. It divides the training dataset into small batches and then performs an update on each
batch separately.
c. Splitting the training dataset into smaller batches strikes a balance between the
computational efficiency of batch gradient descent and the speed of stochastic gradient
descent.
d. It is easier to fit in allocated memory.
e. It produces more stable convergence than stochastic gradient descent.
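The three variants differ mainly in how many training examples feed each parameter update. The sketch below makes that visible with a single training function whose batch size is a parameter; the model (Y = c + m*X with squared error), the data and all parameter values are assumptions for illustration.

```python
import random

def batches(examples, batch_size, rng):
    """Shuffle the examples and yield them in chunks of batch_size."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]

def train(examples, batch_size, epochs=200, learning_rate=0.02, seed=0):
    """Fit Y = c + m*X; returns (c, m, number_of_updates)."""
    rng = random.Random(seed)
    c, m, updates = 0.0, 0.0, 0
    for _ in range(epochs):
        for batch in batches(examples, batch_size, rng):
            errors = [(c + m * x - y, x) for x, y in batch]
            grad_c = 2 * sum(e for e, _ in errors) / len(batch)
            grad_m = 2 * sum(e * x for e, x in errors) / len(batch)
            c -= learning_rate * grad_c
            m -= learning_rate * grad_m
            updates += 1
    return c, m, updates

examples = [(x, 1 + 2 * x) for x in range(1, 7)]  # six points on Y = 1 + 2*X

batch_result = train(examples, batch_size=len(examples))  # batch gradient descent
sgd_result = train(examples, batch_size=1)                # stochastic gradient descent
mini_result = train(examples, batch_size=2)               # mini-batch gradient descent
for name, result in [("batch", batch_result), ("stochastic", sgd_result),
                     ("mini-batch", mini_result)]:
    print(name, "updates:", result[2], "c, m:", result[0], result[1])
```

For the same number of passes over the data, the stochastic variant performs the most updates and the batch variant the fewest, with mini-batch in between.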

Challenges with the Gradient Descent:


1. Local Minima and Saddle Point:
For convex problems, gradient descent can find the global minimum easily, while for non-convex
problems it is sometimes difficult, because the gradient is also close to zero at local minima and
saddle points, so the algorithm can stop there.
2. Vanishing and Exploding Gradient:
A vanishing gradient occurs when the gradient becomes too small, so the parameters are updated very
slowly. An exploding gradient occurs when the gradient becomes too large, which makes the updates unstable.

Kernel methods:
• Kernel methods are a class of algorithms that are used for pattern
analysis.
• They are used to solve a non-linear problem by using a linear classifier.
• Kernel methods are used in SVMs (Support Vector Machines), which are used in
classification and regression problems.

Need for Kernel Method:


• Support Vector Machines are supervised machine learning algorithms.
• When the points are not linearly separable, it is very difficult to solve a classification problem
using a linear classifier, because no straight line can separate points which are randomly
distributed.
• Here comes the use of the kernel function, which takes the points to higher dimensions,
solves the problem there and returns the output.

Types of Kernel methods:


1. Linear Kernel:
A linear kernel is defined by the dot product of two vectors: K(x1, x2) = x1 . x2
2. Polynomial Kernel:
A polynomial kernel is defined by the following equation: K(x1, x2) = (x1 . x2 + 1)^d where
d is the degree of the polynomial and x1 and x2 are vectors.
3. Laplacian Kernel:
This type of kernel is less sensitive to changes in its parameter.
4. Hyperbolic or the Sigmoid Kernel:
This kernel is used in neural network areas of machine learning.
5. Anova radial basis kernel:
This kernel is very useful in multidimensional regression problems.
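The linear and polynomial kernels defined above are easy to compute directly. The sketch below also checks, for the two-dimensional case with d = 2, that the polynomial kernel gives the same number as an ordinary dot product taken in a higher-dimensional space. The function names and the explicit feature map are illustrative assumptions.

```python
import math

def linear_kernel(x1, x2):
    """K(x1, x2) = x1 . x2"""
    return sum(a * b for a, b in zip(x1, x2))

def polynomial_kernel(x1, x2, d=2):
    """K(x1, x2) = (x1 . x2 + 1)^d"""
    return (linear_kernel(x1, x2) + 1) ** d

def feature_map(x):
    """Explicit 6-dimensional map matching the degree-2 kernel for 2-D input."""
    a, b = x
    r = math.sqrt(2)
    return [1.0, r * a, r * b, a * a, b * b, r * a * b]

x1 = [1.0, 2.0]
x2 = [3.0, 4.0]
print(linear_kernel(x1, x2))                             # 1*3 + 2*4 = 11
print(polynomial_kernel(x1, x2, d=2))                    # (11 + 1)^2 = 144
print(linear_kernel(feature_map(x1), feature_map(x2)))   # same value up to rounding
```

The kernel reaches the higher-dimensional result without ever building the longer vectors, which is what makes kernel methods practical.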

Decision Tree:
• The decision tree is one of the most powerful and popular tools for classification and prediction.
• A Decision tree is a flowchart like tree structure, where each internal node denotes a test on an
attribute, each branch represents an outcome of the test, and each leaf node (terminal node)
holds a class label.
• Example:

Advantages of the Decision Tree:


o It is simple to understand.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes for a problem.
o There is less requirement of data cleaning compared to other algorithms.

Disadvantages of the Decision Tree:


o The decision tree contains lots of layers, which makes it complex.
o For more class labels, the computational complexity of the decision tree may increase.

Decision Tree Algorithm:


• The decision tree Algorithm is a supervised algorithm.
• The goal of this algorithm is to create a model that predicts the value of a target variable, for
which the decision tree uses the tree representation to solve the problem in which the leaf node
corresponds to a class label and attributes are represented on the internal node of the tree.
• The steps of the algorithm are as follows:
Step-1: Begin the tree with the root node which contains the complete dataset.
Step-2: Find the best attribute in the dataset.
Step-3: Divide the dataset at the root node into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where you cannot further classify the nodes;
such a final node is called a leaf node.
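Step-2 needs a rule for deciding which attribute is "best". The notes do not name one; a common choice, assumed here, is information gain based on entropy (as in the ID3 algorithm). The small patient table below is hypothetical.

```python
import math
from collections import Counter

def entropy(rows, target):
    """Entropy of the target labels in rows (0 means perfectly pure)."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, attribute, target):
    """Reduction in entropy obtained by splitting rows on attribute."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

def best_attribute(rows, attributes, target):
    """Step-2: pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical training data.
rows = [
    {"age": "young",  "cholesterol": "high",   "drug": "A"},
    {"age": "young",  "cholesterol": "normal", "drug": "A"},
    {"age": "middle", "cholesterol": "high",   "drug": "B"},
    {"age": "middle", "cholesterol": "normal", "drug": "B"},
    {"age": "senior", "cholesterol": "high",   "drug": "A"},
    {"age": "senior", "cholesterol": "normal", "drug": "B"},
]
for attribute in ("age", "cholesterol"):
    print(attribute, round(information_gain(rows, attribute, "drug"), 3))
print("best attribute:", best_attribute(rows, ["age", "cholesterol"], "drug"))
```

Steps 3 to 5 then repeat the same selection on each subset until the subsets are pure or no attributes remain.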

Example: Decision tree for patients:

Let’s take a sample data set:

Suppose we have a sample data set of 14 patients and we have to predict which drug, A or B, to
suggest to a patient. Let’s say we pick cholesterol as the first attribute to split the data. Then the tree is:
It will split our data into two branches, High and Normal, based on cholesterol. Suppose our new
patient has high cholesterol: from this split we cannot say whether Drug A or Drug B
will be suitable for the patient. Likewise, if the patient’s cholesterol is normal, we still do not have
enough information to determine whether Drug A or Drug B is suitable. Let us take
another attribute, Age. Age has three categories, Young, Middle-aged and Senior,
so let’s try to split on it.

From the above figure, now we can say that we can easily predict which Drug to give to a patient
based on his or her reports.

Issues in decision tree:


• Decision trees are less appropriate for estimation tasks where the goal is to predict the value of
a continuous attribute.
• Decision trees are prone to errors in classification problems with many classes and relatively
small number of training examples.
• Decision trees can be computationally expensive to train. At each node, each
candidate splitting field must be sorted before its best split can be found. In some algorithms,
combinations of fields are used and a search must be made for optimal combining weights.

Inductive Bias:
• Every machine learning model requires some type of architecture design and possibly some
initial assumptions about the data we want to analyze.
• Generally, every building block and every belief that we make about the data is a form of
inductive bias.
• Inductive bias is the set of assumptions made by the learning algorithm to form a hypothesis or
a generalization beyond the set of training instances in order to classify unobserved data.
• Occam’s razor is one of the simplest examples of inductive bias.

Evaluating hypotheses:
• In supervised machine learning, a hypothesis is a candidate function that maps input to output;
the aim is to find the hypothesis that best maps input to output.
• Whenever you form a hypothesis for a given training data set, you have to test or evaluate how
accurate the considered hypothesis is. This is known as hypotheses evaluation.

Estimating Hypotheses Accuracy:


• The skill or prediction error of a model must be estimated, and as an estimate, it will contain
error.
• This is made clear by distinguishing between the true error of a model and the estimated or
sample error.
o Sample Error: Estimate of the true error calculated on a data sample.
o True Error: Probability that a model will misclassify a randomly selected example from the
domain.
• We want to know the true error, but we must work with the estimate, approximated from a
data sample.
• This raises the question: how good is a given estimate of error?
• One approach is to calculate a confidence interval around the sample error that is large enough
to cover the true error with a very high likelihood, such as 95%.

Basics of Sampling Theory:


• The equation to calculate the confidence interval makes many assumptions.
• Proportional values like classification accuracy and classification error fit a Binomial
distribution.
• The Binomial distribution characterizes the probability of a binary event, such as a coin flip or
a correct/incorrect classification prediction.
• The mean is the expected value of the distribution, the variance is the average squared distance
of samples from the mean, and the standard deviation is the square root of the variance.
• Ideally, we seek an unbiased estimate of our desired parameter that has the smallest variance.
• Confidence intervals provide a way to quantify the uncertainty in a population parameter, such
as a mean.
• The Binomial distribution can be approximated with the simpler Gaussian distribution for
large sample sizes.
• The interval can be centered on the mean, in which case it is called two-sided, but it can also be
one-sided, such as a radius to the left or right of the mean.

Comparing Learning Algorithms:


• Comparing algorithms involves training machine learning algorithms and evaluating them
potentially on multiple different samples of data from the domain.
• The comparison of two algorithms is motivated by estimating the expected or mean difference
between the two methods.
• A procedure is presented that uses k-fold cross-validation where each algorithm is trained and
evaluated on the same splits of the data.
• A final mean difference in error is calculated, from which a confidence interval can be
estimated.
• The calculation of the confidence interval is updated to account for the reduced number of
degrees of freedom as each algorithm is evaluated on the same test set.

Bayes’ Theorem
• Bayes’ Theorem describes the probability of an event, based on prior knowledge of conditions
that might be related to that event.
• It is used where the probability of occurrence of a particular event is calculated based on other
conditions; this is called conditional probability.
• Example: There are 3 bags, each containing some white marbles and some black marbles.
Suppose a white marble is drawn at random and we want to find the probability that this white marble
came from the first bag. In cases such as this, we use Bayes’ Theorem.

Bayes’ Theorem Statement:


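If E1, E2, ..., En are n non-empty events which constitute a partition of the sample space S, i.e. E1, E2,
..., En are pairwise disjoint and E1 ∪ E2 ∪ ... ∪ En = S, and A is any event of nonzero probability, then

P(Ei | A) = P(Ei) P(A | Ei) / [P(E1) P(A | E1) + P(E2) P(A | E2) + ... + P(En) P(A | En)], for i = 1, 2, ..., n.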

Problem 1: Bag I contains 3 red and 4 black balls while another Bag II contains 5 red and 6
black balls. One ball is drawn at random from one of the bags and it is found to be red. Find
the probability that it was drawn from Bag II.
Problem 2: In a factory which manufactures bolts, machines A, B and C manufacture
respectively 25%, 35% and 40% of the bolts. Of their outputs, 5, 4 and 2 percent are
respectively defective bolts. A bolt is drawn at random from the product and is found to be
defective. What is the probability that it is manufactured by the machine B?
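Both problems can be worked with a direct application of Bayes' Theorem. The sketch below uses exact fractions; for Problem 1 it assumes that each bag is equally likely to be chosen (probability 1/2 each), which the problem statement implies but does not state. The function name is hypothetical.

```python
from fractions import Fraction

def posterior(priors, likelihoods, index):
    """P(E_index | A) from the priors P(Ei) and the likelihoods P(A | Ei)."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # P(A)
    return priors[index] * likelihoods[index] / evidence

# Problem 1: P(Bag II | red). Bag I has 3 red of 7, Bag II has 5 red of 11.
p1 = posterior(
    priors=[Fraction(1, 2), Fraction(1, 2)],
    likelihoods=[Fraction(3, 7), Fraction(5, 11)],
    index=1,
)

# Problem 2: P(machine B | defective).
p2 = posterior(
    priors=[Fraction(25, 100), Fraction(35, 100), Fraction(40, 100)],
    likelihoods=[Fraction(5, 100), Fraction(4, 100), Fraction(2, 100)],
    index=1,
)

print("Problem 1:", p1)  # 35/68
print("Problem 2:", p2)  # 28/69
```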

Concept Learning:
• Concept learning is the task of acquiring the hypothesis (solution) that best fits the given
training examples.
• Consider the example task of learning the target concept “days on which my friend Prabhas
enjoys his favorite water sport.”
• The table below describes a set of example days, each represented by a set of attributes.
• The attribute EnjoySport indicates whether or not Prabhas enjoys his favorite water sport on
this day.
• The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the
values of its other attributes.
Bayes Optimal Classifier:
• Bayes Optimal Classifier is a probabilistic model that makes the most probable prediction for
a new example.
• It is described using the Bayes Theorem.
• It is closely related to Maximum a Posteriori (MAP), a probabilistic framework
that finds the most probable hypothesis for a training dataset.
• In practice, the Bayes Optimal Classifier is computationally expensive, if not intractable to
calculate, and instead, simplifications such as the Gibbs algorithm and Naive Bayes can be
used to approximate the outcome.
Naive Bayes Classifiers:
• Naïve Bayes algorithm is a supervised learning algorithm.
• It is based on Bayes’ theorem and is used for solving classification problems.
• It is mainly used in text classification, which involves a high-dimensional training dataset.
• The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it
helps in building fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object belonging to a class.
• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis,
and classifying articles.

Working of Naïve Bayes' Classifier:


Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using
this dataset, we need to decide whether or not we should play on a particular day according to the
weather conditions. To solve this problem, we need to follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.
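
The three steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the eight rows below are an invented toy dataset (not the one from these notes), the second feature "wind" is added only for illustration, and no smoothing is applied to zero counts.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (outlook, wind, play). Invented for illustration only.
rows = [
    ("Sunny", "Calm", "Yes"),
    ("Sunny", "Windy", "No"),
    ("Overcast", "Calm", "Yes"),
    ("Overcast", "Windy", "Yes"),
    ("Rainy", "Calm", "Yes"),
    ("Rainy", "Windy", "No"),
    ("Sunny", "Calm", "Yes"),
    ("Rainy", "Calm", "No"),
]

# Step 1: frequency tables, one per (feature index, class label).
class_counts = Counter(label for *_, label in rows)
feature_counts = defaultdict(Counter)
for *features, label in rows:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1


# Step 2: likelihood table entry P(feature value | class).
def likelihood(i, value, label):
    return feature_counts[(i, label)][value] / class_counts[label]


# Step 3: Bayes' theorem with the naive independence assumption.
def posterior(features):
    scores = {}
    for label, count in class_counts.items():
        score = count / len(rows)  # prior P(class)
        for i, value in enumerate(features):
            score *= likelihood(i, value, label)
        scores[label] = score
    evidence = sum(scores.values())
    if evidence == 0:
        raise ValueError("feature combination has zero probability under every class")
    return {label: score / evidence for label, score in scores.items()}


result = posterior(("Sunny", "Windy"))
print(result)  # "No" gets about 0.625, "Yes" about 0.375
```

For a sunny, windy day the sketch favours "No": the prior for "Yes" is higher (5 of 8 rows), but windy days are rare among the "Yes" rows, which pulls its score down.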

Types of Naïve Bayes Model:


There are three types of Naïve Bayes model, which are given below:
o Gaussian: The Gaussian model assumes that features follow a normal distribution. If
predictors take continuous values instead of discrete ones, the model assumes that these
values are sampled from a Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the data follows a
multinomial distribution. It is primarily used for document classification problems, that is,
deciding which category (such as sports, politics, or education) a particular document belongs to.
The classifier uses the frequency of words as the predictors.
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the
predictor variables are independent Boolean variables, such as whether or not a particular
word is present in a document. This model is also widely used for document classification tasks.
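
The three variants differ mainly in how they compute the likelihood of the features given a class. The sketch below shows one way to write each likelihood in plain Python; the function names, the word probabilities, and the example document are all hypothetical values chosen for illustration.

```python
import math


def gaussian_likelihood(x, mean, variance):
    """Gaussian: density of a continuous feature value under a normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def multinomial_log_likelihood(word_counts, word_probs):
    """Multinomial: log P(document | class) from word frequencies, up to a constant."""
    return sum(count * math.log(word_probs[word]) for word, count in word_counts.items())


def bernoulli_likelihood(present_words, presence_probs):
    """Bernoulli: P(document | class) from presence/absence of every vocabulary word."""
    result = 1.0
    for word, p in presence_probs.items():
        result *= p if word in present_words else 1.0 - p
    return result


# Hypothetical per-class word probabilities (multinomial: they sum to 1 per class).
sports_probs = {"goal": 0.5, "vote": 0.1, "match": 0.4}
politics_probs = {"goal": 0.1, "vote": 0.7, "match": 0.2}

# Hypothetical per-class probabilities that each word appears at all (Bernoulli).
sports_presence = {"goal": 0.8, "vote": 0.1, "match": 0.7}

doc_counts = {"goal": 2, "match": 1}  # word frequencies in one document

density = gaussian_likelihood(0.0, 0.0, 1.0)
sports_score = multinomial_log_likelihood(doc_counts, sports_probs)
politics_score = multinomial_log_likelihood(doc_counts, politics_probs)
presence_score = bernoulli_likelihood(set(doc_counts), sports_presence)
print(density, sports_score, politics_score, presence_score)
```

Note that the Bernoulli version also multiplies in the probability of each absent word, whereas the multinomial version only looks at the words that occur. In practice a library would normally be used instead of hand-written likelihoods; scikit-learn, for example, provides GaussianNB, MultinomialNB, and BernoulliNB in sklearn.naive_bayes.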

Advantages of Naïve Bayes Classifier:


o Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
o It can be used for binary as well as multi-class classification.
o It often performs well in multi-class prediction compared with other algorithms.

Disadvantages of Naïve Bayes Classifier:


Naïve Bayes assumes that all features are independent of one another given the class, so it cannot
learn the relationship between features.

Applications of Naïve Bayes Classifier:


o It is used for credit scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because the Naïve Bayes classifier is an eager learner.
o It is used in text classification, such as spam filtering and sentiment analysis.
Bayesian Belief Networks:
• A Bayesian network is a probabilistic graphical model that represents a set of variables and
their conditional dependencies using a directed acyclic graph.
• Bayesian networks are probabilistic because they are built from a probability
distribution and use probability theory for prediction and anomaly detection.
• A Bayesian network is based on the joint probability distribution and conditional probability.
• A Bayesian network can be used for building models from data and expert opinion, and it
consists of two parts:
o Directed acyclic graph
o Table of conditional probabilities
• A Bayesian network graph is made up of nodes and arcs (directed links), where:
o Each node corresponds to a random variable, which can be continuous or discrete.
o Arcs (directed arrows) represent the causal relationships or conditional probabilities between
random variables. Each directed link connects a pair of nodes in the graph and indicates that
one node directly influences the other. If there is no directed link between two nodes,
neither of them directly influences the other.
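
The two parts of a Bayesian network (the graph and the conditional probability tables) can be sketched as two dictionaries. This is a minimal sketch, assuming a hypothetical three-node network Rain -> WetGrass <- Sprinkler with invented probabilities; inference is done by brute-force enumeration, which only works for very small networks.

```python
from itertools import product

# Part 1: the directed acyclic graph, stored as node -> tuple of parents.
# Hypothetical network: Rain -> WetGrass <- Sprinkler.
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}

# Part 2: conditional probability tables, P(node is True | parent values).
# All numbers are invented for illustration.
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.3},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.05,
    },
}


def joint_probability(assignment):
    """Joint probability of a full assignment: product of P(node | parents)."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        p_true = cpt[node][parent_values]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


def probability(query):
    """Probability of a partial assignment, summing the joint over the rest."""
    nodes = list(parents)
    total = 0.0
    for values in product([True, False], repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        if all(assignment[name] == value for name, value in query.items()):
            total += joint_probability(assignment)
    return total


p_wet = probability({"WetGrass": True})
p_rain_given_wet = probability({"Rain": True, "WetGrass": True}) / p_wet
print(p_wet, p_rain_given_wet)
```

Here observing wet grass raises the probability of rain from the prior 0.2 to roughly 0.46, which is the kind of prediction the network is built for.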

Expectation-Maximization Algorithm:
• In real-world applications of machine learning, it is very common that many relevant features
are available for learning but only a small subset of them is observable.
• For variables that are sometimes observable and sometimes not, we can use the instances in
which the variable is observed for learning, and then predict its value in the instances in
which it is not observed.
• The Expectation-Maximization algorithm can also be used for latent variables (variables that are
not directly observable and are instead inferred from the values of the other observed
variables) in order to predict their values, provided that the general form of the
probability distribution governing those latent variables is known.
• This algorithm underlies many unsupervised clustering algorithms in the field of machine
learning.

Algorithm:
1. Given a set of incomplete data, choose a set of starting parameters.
2. Expectation step (E – step): Using the observed data in the dataset, estimate
(guess) the values of the missing data.
3. Maximization step (M – step): The completed data produced by the expectation (E) step is
used to update the parameters.
4. Repeat steps 2 and 3 until convergence.
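
The four steps above can be sketched for one common case: clustering one-dimensional data with a mixture of two Gaussians, where the "missing data" is which component each point came from. This is a minimal sketch, not a production implementation; the data points, the starting means, and the function names are hypothetical, and the variance floor of 1e-6 is an added safeguard.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_gmm(data, start_means, n_iter=50, tol=1e-8):
    """EM for a 1-D Gaussian mixture with len(start_means) components."""
    k = len(start_means)
    # Step 1: starting parameters (equal weights, unit variances).
    weights, means, variances = [1.0 / k] * k, list(start_means), [1.0] * k
    history = [log_likelihood(data, weights, means, variances)]
    for _ in range(n_iter):
        # Step 2 (E-step): estimate the missing component memberships
        # as responsibilities P(component | x).
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # Step 3 (M-step): update the parameters from the weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
        # Step 4: stop once the log-likelihood no longer improves.
        history.append(log_likelihood(data, weights, means, variances))
        if history[-1] - history[-2] < tol:
            break
    return weights, means, variances, history


# Hypothetical data with two visible groups, around 1 and around 5.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, history = em_gmm(data, start_means=[0.0, 6.0])
print(means)
```

On this data the two estimated means settle close to 1 and 5, and the recorded log-likelihood values never decrease from one iteration to the next. A different choice of starting means can lead to a different, possibly worse, solution.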
Usage of EM algorithm:
• It can be used to fill in missing data in a sample.
• It can be used as the basis of unsupervised learning of clusters.
• It can be used for estimating the parameters of a Hidden Markov Model
(HMM).
• It can be used for discovering the values of latent variables.

Advantages of EM algorithm:
• The likelihood is guaranteed not to decrease from one iteration to the next.
• The E-step and M-step are often easy to implement for many problems.
• Solutions to the M-step often exist in closed form.

Disadvantages of EM algorithm:
• It converges slowly.
• It is guaranteed to converge only to a local optimum, not necessarily the global one.
• It requires both the probabilities, forward and backward (numerical optimization requires
only forward probability).
