Data Science
Python:
• Python is a high-level, interpreted, interactive and object-oriented scripting language.
• Python is designed to be highly readable.
Characteristics of Python:
• It supports functional and structured programming methods as well as OOP.
• It can be used as a scripting language or can be compiled to byte-code for building large
applications.
• It provides very high-level dynamic data types and supports dynamic type checking.
• It supports automatic garbage collection.
• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
Importance of Python:
• The Python language emphasizes readability, which makes code easy to write and maintain.
• Its high-level, interpreted, and object-oriented design makes it suitable for a wide range of
software solutions.
• It emphasizes syntax readability, program modularity, and code reusability, which increases
the speed of development while reducing the cost of maintenance.
• It supports modules and packages, which increase productivity and save time
and effort while working.
Data Types:
Numeric:
Numeric data types represent data that has a numeric value. A numeric value can be an integer,
a floating-point number, or a complex number.
• Integers – This value is represented by int class. It contains positive or negative whole
numbers (without fraction or decimal). In Python there is no limit to how long an integer value
can be.
• Float – This value is represented by float class. It is a real number with floating point
representation. It is specified by a decimal point.
Sequence Type:
A sequence is an ordered collection of similar or different data types. Sequences allow us to store
multiple values in an organized and efficient fashion. There are several sequence types in Python:
• String:
o A string is a collection of one or more characters enclosed in single or double quotes.
o Python has no character data type; a character is a string of length one.
o It is represented by the str class.
o E.g. String1 = 'Good morning'
• List:
o A list is an ordered collection of data, much like an array in other languages.
o It is very flexible, as the items in a list do not need to be of the same type.
o E.g. List = ["Hello", "I", "Am"]
• Tuple:
o A tuple is also an ordered collection of Python objects.
o The main difference between a tuple and a list is that tuples are immutable, i.e. a tuple cannot
be modified after it is created.
o E.g. Tuple1 = ('You', 'For')
Boolean:
Data type with one of the two values, True or False.
Set:
• Set is an unordered collection of elements that is iterable, mutable and has no duplicate
elements.
• The order of elements in a set is undefined.
• E.g. set1 = set(["This", "is", "set"])
Dictionary:
• A dictionary is a collection of data values held as key:value pairs. Since Python 3.7, a
dictionary preserves the order in which its keys were inserted.
• Each value is looked up by its key, which makes access efficient. Within each pair the key and
value are separated by a colon (:), and the pairs are separated from one another by commas.
• E.g. Dict = {'Hello': 'A', 2: 'B', 3: 'C'}, so that Dict['Hello'] is 'A'.
Python Variables:
• Variable is a name given to a memory location.
• Python is not “statically typed”. We do not need to declare variables before using them or
declare their type. A variable is created the moment we first assign a value to it.
Expressions:
An expression is a combination of operators and operands that is interpreted to produce some other
value. The different types of expressions in Python are:
1. Constant Expressions: These are the expressions that have constant values only. Example: x =
15 + 1.3
2. Arithmetic Expressions: An arithmetic expression is a combination of numeric values,
operators, and sometimes parenthesis. The result of this type of expression is also a numeric
value. The operators used in these expressions are arithmetic operators like addition, subtraction,
etc. Example: ((3-2)*4+89)-6
3. Integral Expressions: These are the kind of expressions that produce only integer results after all
computations and type conversions. Example:
a = 13
b = 12.0
c = a + int(b)
4. Floating Expressions: These are the kind of expressions which produce floating point numbers
as result after all computations and type conversions. Example:
a = 13
b = 5
c = a / b
5. Relational Expressions: In these types of expressions, arithmetic expressions are written on both
sides of relational operator (> , < , >= , <=). Those arithmetic expressions are evaluated first, and
then compared as per relational operator and produce a boolean output in the end. These
expressions are also called Boolean expressions. Example:
a = 21
b = 13
c = 40
p = (a + b) >= c
print(p)
Output:
False
6. Logical Expressions: These are kinds of expressions that result in either True or False. It
basically specifies one or more conditions.
7. Bitwise Expressions: These are the kind of expressions in which computations are performed at
bit level. Example:
a = 12
x = a >> 2
8. Combinational Expressions: We can also use different types of expressions in a single
expression, and that will be termed as combinational expressions. Example:
a = 16
b = 12
c = a + (b >> 1)
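The expression categories above can be checked side by side. The following is a minimal sketch that reuses the values from the examples; the variable names are chosen here for illustration only.

```python
a, b, c = 21, 13, 40

integral = 13 + int(12.0)              # integer result after conversion
floating = 13 / 5                      # true division always yields a float
relational = (a + b) >= c              # 34 >= 40
logical = (a > b) and not (b > c)      # combines two conditions
bitwise = 12 >> 2                      # shifts the bits of 12 right by two places
combinational = 16 + (12 >> 1)         # arithmetic mixed with a bitwise shift

print(integral, floating, relational, logical, bitwise, combinational)
```

Each name holds the value of one expression type, so printing them shows the result type (int, float or bool) that each category produces.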
Comments:
• Comments in Python are lines in the code that the interpreter ignores when the program is
executed.
• Comments are generally used for the following purposes:
• Code Readability
• Explanation of the code or Metadata of the project
• Prevent execution of code
• To include resources
Type Casting:
• Type casting is the conversion of a value from one data type to another so that a required
operation can be performed on it.
• There are two types of type casting in Python:
o Implicit Type Casting
o Explicit Type Casting
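The two kinds of casting can be sketched as follows. This is an illustrative example with made-up variable names: implicit casting is done by Python itself, while explicit casting calls a constructor such as int(), float() or str().

```python
# Implicit casting: Python widens the int to a float automatically.
total = 7 + 2.5

# Explicit casting: the programmer requests the conversion.
count = int("42")        # string to integer
ratio = float(3)         # integer to float
label = str(3.14)        # float to string
truncated = int(9.99)    # float to integer, the fractional part is dropped

print(type(total).__name__, total)
print(count, ratio, label, truncated)
```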
Slicing:
Slicing obtains a sub-string from a given string by selecting the characters between a start index
and a stop index. The built-in slice() function creates a slice object that can be used for this.
Syntax:
• slice(stop), or
• slice(start, stop, step)
Parameters:
• start: Starting index where the slicing of the object starts.
• stop: Ending index where the slicing of the object stops; the element at this index is not included.
• step: An optional argument that determines the increment between each index for slicing.
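A short sketch of both forms of slice(), together with the equivalent bracket notation. The string and variable names are assumptions made for illustration.

```python
text = "Good morning"

first_four = slice(4)           # same as text[:4]
every_other = slice(0, 8, 2)    # start=0, stop=8, step=2

print(text[first_four])     # Good
print(text[every_other])    # Go o
print(text[5:])             # morning
```

A slice object is useful when the same range has to be applied to several sequences; for one-off use the bracket notation text[start:stop:step] is more common.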
split():
The split() method splits a string into a list of strings, breaking the string at the
specified separator.
Syntax:
str.split(separator, maxsplit)
Parameters:
• separator: The delimiter at which the string is split. If it is not provided, any whitespace
is a separator.
• maxsplit: The maximum number of splits to perform. If it is not provided, the default is -1,
which means there is no limit.
• Returns: A list of strings obtained by breaking the given string at the specified separator.
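The effect of the two parameters can be seen in a small example. The sample strings below are assumptions chosen for illustration.

```python
sentence = "Python is designed to be highly readable"

words = sentence.split()            # no separator: split on any whitespace
limited = sentence.split(" ", 2)    # at most two splits, so three pieces
fields = "a,b,,c".split(",")        # an explicit separator keeps empty fields

print(words)
print(limited)
print(fields)
```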
Loops in python:
While Loop:
• In Python, a while loop is used to execute a block of statements repeatedly as long as a given
condition is true.
• When the condition becomes false, the line immediately after the loop in the program is executed.
• Syntax:
while expression:
    statement(s)
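A minimal sketch of a while loop that counts down; the names count and visited are illustrative only.

```python
count = 3
visited = []

# The body repeats while the condition holds.
while count > 0:
    visited.append(count)
    count -= 1

# Execution continues here once count > 0 becomes false.
print(visited)   # [3, 2, 1]
print(count)     # 0
```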
for in Loop:
• For loops are used for sequential traversal, for example traversing a list, string, or array.
• Python has no C-style for loop. Instead there is the “for in” loop, which is similar to the
“for each” loop in other languages.
• Syntax:
for i in sequence:
    statement(s)
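As a sketch, the same loop form traverses a list and a string alike. The sample data is an assumption for illustration.

```python
fruits = ["apple", "banana", "cherry"]

lengths = []
for fruit in fruits:            # traverse a list
    lengths.append(len(fruit))
print(lengths)                  # [5, 6, 6]

letters = []
for ch in "abc":                # traverse a string character by character
    letters.append(ch.upper())
print(letters)                  # ['A', 'B', 'C']
```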
Nested Loops:
• Python allows one loop to be used inside another loop.
• We can put any type of loop inside any other type of loop. For example, a for loop can be
inside a while loop or vice versa.
• Syntax of a nested for in loop:
for iterator_var in sequence:
    for iterator_var in sequence:
        statement(s)
    statement(s)
• Pass Statement: We use the pass statement to write empty loops. pass is also used for empty
control statements, functions and classes.
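The nested-loop syntax and the pass statement can be sketched together. All names below are hypothetical and serve only as illustration.

```python
# Nested for loops: the inner loop runs fully for every outer iteration.
pairs = []
for row in range(2):
    for col in range(3):
        pairs.append((row, col))
print(pairs)

# pass as a placeholder body: the loop runs but does nothing.
for letter in "abc":
    pass
print(letter)   # the loop variable keeps its last value, 'c'

# pass also allows an empty function or class to be defined.
def not_implemented_yet():
    pass

class EmptyClass:
    pass
```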
range():
• The Python range() function returns a sequence of integers within a given range.
• range() is a built-in function of Python. It is commonly used when an action must be performed a
specific number of times.
• Syntax:
range(stop)
range(start, stop, step)
• Depending on how many arguments are passed to the function, the user decides where
the series of numbers begins and ends, as well as how big the difference is between
one number and the next.
• range() takes up to three arguments:
o start: integer from which the sequence starts (default 0)
o stop: integer before which the sequence ends; with a step of 1 the last integer produced is
stop - 1
o step: integer that determines the increment between successive integers in the sequence (default 1)
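A brief sketch of the three argument forms. Wrapping range() in list() is used here only to make the generated values visible.

```python
print(list(range(5)))           # [0, 1, 2, 3, 4]
print(list(range(2, 10, 3)))    # [2, 5, 8]
print(list(range(5, 0, -1)))    # [5, 4, 3, 2, 1]

# Repeating an action a fixed number of times.
total = 0
for _ in range(3):
    total += 10
print(total)                    # 30
```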
or:
• When the Python interpreter evaluates an or expression, it takes the first operand and checks
whether it is true.
• If the first operand is true, Python returns that object's value without evaluating the
second operand.
• Only if the first operand is false does Python evaluate the second operand, and the result is
then the value of the second operand.
and:
• For an and expression, Python uses the same short-circuit technique: if the first operand is
false, the whole expression must be false, so Python returns that value.
• Only if the first operand is true does Python evaluate the second operand and return its value.
• An expression containing and and or stops evaluating as soon as its truth value is
known. Evaluation takes place from left to right.
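Short-circuit evaluation can be observed by recording which operands are actually evaluated. The helper check() and the list calls are hypothetical names introduced for this sketch.

```python
calls = []

def check(label, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return value

result_or = check("a", 0) or check("b", "fallback")   # first is false, so "b" runs
result_and = check("c", 0) and check("d", "never")    # first is false, so "d" is skipped

print(result_or)    # fallback
print(result_and)   # 0
print(calls)        # ['a', 'b', 'c']
```

Note that both operators return one of their operands rather than a strict True or False.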
Structures in Python:
• Python has four inbuilt data structures namely Lists, Dictionary, Tuple and Set:
List:
• Lists in Python are one of the most versatile collection object types available.
• Lists can be used for any type of object, from numbers and strings to more lists.
• They are accessed just like strings so they are simple to use and they’re variable length, i.e.
they grow and shrink automatically as they’re used.
• For example:
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5]
list3 = ["a", "b", "c", "d"]
Dictionary:
• In python, dictionary is similar to hash or maps in other languages.
• It consists of key value pairs.
• Keys are unique & immutable objects.
• The value can be accessed by unique key in the dictionary.
• Syntax: dictionary = {"key name": value}
• Example:
dict1 = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
print("dict1['Name']: ", dict1['Name'])
print("dict1['Age']: ", dict1['Age'])
Tuple:
• Python tuples work exactly like Python lists except they are immutable, i.e. they can’t be
changed in place.
• They are normally written inside parentheses to distinguish them from lists (which use square
brackets)
• Since tuples are immutable, their length is fixed.
• To grow or shrink a tuple, a new tuple must be created.
• Example:
tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5)
Set:
• The elements in the set cannot be duplicates.
• The elements in the set are immutable (cannot be modified) but the set as a whole is mutable.
• There is no index attached to any element in a python set. So, they do not support any
indexing or slicing operation.
• Example:
Days=set(["Mon","Tue","Wed","Thu","Fri","Sat"])
Python Lists:
• Lists are just like dynamically sized arrays, declared in other languages (vector in C++ and
ArrayList in Java).
• Lists need not be homogeneous, which makes them one of the most flexible tools in Python.
• A single list may contain data types such as integers, strings, and other objects.
• Lists are mutable, and hence they can be altered even after their creation.
• Lists in Python are ordered and have a definite count.
• The elements in a list are indexed according to a definite sequence, with 0 being the first
index.
• Each element has its own definite place in the list, so a list can hold duplicate elements,
each at its own distinct position.
List Operations:
Creating a List:
• Lists in Python can be created by just placing the sequence inside the square brackets[].
• Unlike Sets, a list doesn’t need a built-in function for the creation of a list.
• Unlike Sets, the list may contain mutable elements.
# Creating a List
List = []
print("Blank List: ")
print(List)
# Creating a List of numbers
List = [10, 20, 14]
print("\nList of numbers: ")
print(List)
Output:
Blank List:
[]
List of numbers:
[10, 20, 14]
List Items
Geeks
Geeks
Multi-Dimensional List:
[['Geeks', 'For'], ['Geeks']]
# Creating a List
List1 = []
print(len(List1))
Output:
0
3
# Creating a List
List = []
print("Initial blank List: ")
print(List)
# Addition of Elements
# in the List
List.append(1)
List.append(2)
List.append(4)
print("\nList after Addition of Three elements: ")
print(List)
Output:
Initial blank List:
[]
List after Addition of Three elements:
[1, 2, 4]
# Creating a List
List = [1,2,3,4]
print("Initial List: ")
print(List)
# Addition of Element at
# specific Position
# (using Insert Method)
List.insert(3, 12)
List.insert(0, 'Geeks')
print("\nList after performing Insert Operation: ")
print(List)
Output:
Initial List:
[1, 2, 3, 4]
List after performing Insert Operation:
['Geeks', 1, 2, 3, 12, 4]
Output:
Accessing a element from the list
Geeks
Geeks
Accessing a element from a Multi-Dimensional list
For
Geeks
Negative indexing:
In Python, negative sequence indexes represent positions from the end of the list. Instead of having
to compute the offset as in List[len(List)-3], it is enough to just write List[-3]. Negative indexing
means counting from the end: -1 refers to the last item, -2 refers to the second-last item, etc.
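Negative indexing can be sketched as follows. The list contents are an assumption chosen for illustration.

```python
# The list contents are an assumption chosen for illustration.
List = [1, 2, 'Geeks', 4, 'For', 6, 'Geeks']

print("Accessing element using negative indexing")
print(List[-1])    # last element
print(List[-3])    # third element from the end

# The negative index is shorthand for an offset from len(List).
same_element = List[-3] == List[len(List) - 3]
```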
Output:
Accessing element using negative indexing
Geeks
For
# Creating a List
List = [1, 2, 3, 4, 5,
        6, 7, 8, 9, 10,
        11, 12]
print("Initial List: ")
print(List)
Output:
Initial List:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
List = [1,2,3,4,5]
# Removing the element at a
# specific location from the
# List using the pop() method
List.pop(2)
print("\nList after popping a specific element: ")
print(List)
Output:
List after popping a specific element:
[1, 2, 4, 5]
Slicing of a List
To print a specific range of elements from the list, we use the slice operation, which is
performed on lists with a colon (:). To print elements from the beginning up to an index, use
[:Index]; to print everything except the last Index elements, use [:-Index]; to print elements from
a specific index to the end, use [Index:]; to print elements within a range, use
[Start Index:End Index]; and to print the whole list, use [:]. Further, to print the whole list in
reverse order, use [::-1].
# Creating a List
List = ['G', 'E', 'E', 'K', 'S',
'F', 'O', 'R', 'G', 'E', 'E',
'K', 'S']
print("Initial List: ")
print(List)
Output:
Initial List:
['G', 'E', 'E', 'K', 'S', 'F', 'O', 'R', 'G', 'E', 'E', 'K', 'S']
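Each slice form described above can be tried on a short list. This is a minimal sketch; the shortened list letters is an assumption made to keep the results easy to read.

```python
letters = ['G', 'E', 'E', 'K', 'S', 'F', 'O', 'R']

print(letters[:3])      # from the beginning up to index 3: ['G', 'E', 'E']
print(letters[:-5])     # everything except the last five: ['G', 'E', 'E']
print(letters[5:])      # from index 5 to the end: ['F', 'O', 'R']
print(letters[3:5])     # within a range: ['K', 'S']
print(letters[:])       # a copy of the whole list
print(letters[::-1])    # the whole list in reverse order
```

Slicing always returns a new list, so the original list is left unchanged.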
List Comprehensions:
• List comprehension is an elegant way to define and create a list in python.
• We can create lists just like mathematical statements and in one line only.
• A list comprehension consists of brackets containing an expression followed by a for clause,
then zero or more for or if clauses.
• The result will be a new list resulting from evaluating the expression in the context of the for
and if clauses which follow it.
• A list comprehension generally consists of these parts :
1. Output expression,
2. Input sequence,
3. A variable representing a member of the input sequence and
4. An optional predicate part.
• For example: lst = [x ** 2 for x in range (1, 11) if x % 2 == 1]
here, x ** 2 is output expression, range (1, 11) is input sequence, x is variable and if x % 2 ==
1 is predicate part.
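The comprehension from the example can be compared with the loop it replaces. This sketch uses the same expression; the name squares is introduced here for the loop version.

```python
# List comprehension: squares of the odd numbers from 1 to 10.
lst = [x ** 2 for x in range(1, 11) if x % 2 == 1]

# Equivalent explicit loop.
squares = []
for x in range(1, 11):
    if x % 2 == 1:
        squares.append(x ** 2)

print(lst)              # [1, 9, 25, 49, 81]
print(lst == squares)   # True
```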
Advantages of List Comprehension:
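• Often more time-efficient and space-efficient than the equivalent loops.
• Require fewer lines of code.
• Transform an iterative statement into a single formula-like expression.
• For example, a matrix can be built with nested loops:
matrix = []
for i in range(3):
    matrix.append([])
    for j in range(5):
        matrix[i].append(j)
print(matrix)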
• Now by using nested list comprehensions same output can be generated in fewer lines of code:
# Nested list comprehension
matrix = [[j for j in range(5)] for i in range(3)]
print(matrix)
Tuples in Python:
• A Tuple is a collection of Python objects separated by commas.
• In some ways a tuple is similar to a list in terms of indexing, nested objects and repetition, but a
tuple is immutable, unlike lists, which are mutable.
Tuple operations:
Creation of Tuple:
# An empty tuple
empty_tuple = ()
print(empty_tuple)
Output:
()
Concatenation of Tuples:
# Code for concatenating 2 tuples
tuple1 = (0, 1, 2, 3)
tuple2 = ('python', 'geek')
# Concatenating above two
print(tuple1 + tuple2)
Output:
(0, 1, 2, 3, 'python', 'geek')
Nesting of Tuples:
# Code for creating nested tuples
tuple1 = (0, 1, 2, 3)
tuple2 = ('python', 'geek')
tuple3 = (tuple1, tuple2)
print(tuple3)
Output :
((0, 1, 2, 3), ('python', 'geek'))
Immutable Tuples:
• Tuples are immutable, so an attempt to change tuple data results in an error:
# Code to test that tuples are immutable
tuple1 = (0, 1, 2, 3)
tuple1[0] = 4
print(tuple1)
Output
Traceback (most recent call last):
File "e0eaddff843a8695575daec34506f126.py", line 3, in <module>
tuple1[0]=4
TypeError: 'tuple' object does not support item assignment
Slicing in Tuples:
# code to test slicing
tuple1 = (0 ,1, 2, 3)
print(tuple1[1:])
print(tuple1[::-1])
print(tuple1[2:4])
Output
(1, 2, 3)
(3, 2, 1, 0)
(2, 3)
Deleting a Tuple:
tuple3 = ( 0, 1)
del tuple3
print(tuple3)
Error:
Traceback (most recent call last):
File "d92694727db1dc9118a5250bf04dafbd.py", line 6, in <module>
print(tuple3)
NameError: name 'tuple3' is not defined
Output:
(0, 1)
Output
2
list1 = [0, 1, 2]
print(tuple(list1))
print(tuple('python'))  # string 'python'
Output
(0, 1, 2)
('p', 'y', 't', 'h', 'o', 'n')
Lists versus tuples:
• Lists are better for performing operations such as insertion and deletion, whereas the tuple data
type is appropriate for accessing the elements.
• Lists have several built-in methods, whereas tuples do not have many built-in methods.
Sets in Python:
• A Set is an unordered collection data type that is iterable, mutable and has no duplicate
elements.
• Python’s set class represents the mathematical notion of a set.
• The major advantage of using a set, as opposed to a list, is that it has a highly optimized
method for checking whether a specific element is contained in the set.
• This is based on a data structure known as a hash table. Since sets are unordered, we cannot
access items using indexes like we do in lists.
Set operations:
Adding elements:
Insertion into a set is done with the set.add() method, which creates an appropriate record value to
store in the hash table.
# Creating a Set
people = {"Jay", "Idrish", "Archi"}
print("People:", people)
Output:
People: {'Idrish', 'Archi', 'Jay'}
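A minimal sketch of set.add(); the added name "Daxit" is a hypothetical value used only for illustration.

```python
people = {"Jay", "Idrish", "Archi"}

people.add("Daxit")   # hypothetical new element
people.add("Jay")     # already present, so the set does not change

print(len(people))          # 4
print("Daxit" in people)    # True
```

Because a set holds no duplicates, adding an element that is already present has no effect.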
Union:
Two sets can be merged using the union() method or the | operator. The elements of both sets are
traversed and combined, and duplicates are removed in the process.
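Both forms of union can be sketched with two overlapping sets; the set contents are chosen here for illustration.

```python
set1 = set(range(5))       # {0, 1, 2, 3, 4}
set2 = set(range(3, 9))    # {3, 4, 5, 6, 7, 8}

merged = set1.union(set2)  # method form
merged_op = set1 | set2    # operator form

print(merged)              # 3 and 4 appear only once
print(merged == merged_op) # True
```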
Intersection:
This can be done through the intersection() method or the & operator, which select the elements
that are common to both sets.
set1 = set()
set2 = set()
for i in range(5):
    set1.add(i)
for i in range(3, 9):
    set2.add(i)
# Intersection using the intersection() function
set3 = set1.intersection(set2)
print("Intersection using intersection() function")
print(set3)
Output:
Intersection using intersection() function
{3, 4}
Difference:
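The difference of two sets contains the elements of the first set that are not in the second. It is
found with the difference() method or the - operator.
set1 = set()
set2 = set()
for i in range(5):
    set1.add(i)
for i in range(3, 9):
    set2.add(i)
print("Difference of two sets using difference() function")
print(set1.difference(set2))
Output:
Difference of two sets using difference() function
{0, 1, 2}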
set1 = {1, 2, 3, 4, 5, 6}
print("Initial set")
print(set1)
# This method will remove
# all the elements of the set
set1.clear()
print("\nSet after using clear() function")
print(set1)
Output:
Initial set
{1, 2, 3, 4, 5, 6}
Set after using clear() function
set()
Set drawbacks:
There are two major drawbacks in Python sets:
1. The set doesn’t maintain elements in any particular order.
2. Only instances of immutable (hashable) types can be added to a Python set.
Operators    Notes
s1 == s2     s1 is equivalent to s2
s1 != s2     s1 is not equivalent to s2
s1 <= s2     s1 is subset of s2
s1 >= s2     s1 is superset of s2
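The comparison operators in the table can be tried on two small sets. This is an illustrative sketch; the set values are assumptions.

```python
s1 = {1, 2}
s2 = {1, 2, 3}

equal = s1 == {2, 1}      # order does not matter for equality
not_equal = s1 != s2
is_subset = s1 <= s2      # every element of s1 is in s2
is_superset = s2 >= s1    # s2 contains every element of s1

print(equal, not_equal, is_subset, is_superset)
```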
Dictionary:
• A dictionary in Python is a collection of data values, used to store data like a map. Since
Python 3.7 it preserves the order in which keys were inserted.
• A dictionary holds key:value pairs.
• Each value is looked up by its key, which makes access efficient.
Operations on Dictionary:
Creating a Dictionary:
In Python, a dictionary can be created by placing a sequence of elements within curly braces {},
separated by commas. A dictionary holds pairs of values, one being the key and the other its
corresponding value, written as key:value. Values in a dictionary can be of any data type and can be
duplicated, whereas keys cannot be repeated and must be immutable.
# Creating a Dictionary
# with Integer Keys
Dict = {1: 'Geeks', 2: 'For', 3: 'Geeks'}
print("\nDictionary with the use of Integer Keys: ")
print(Dict)
# Creating a Dictionary
# with Mixed keys
Dict = {'Name': 'Geeks', 1: [1, 2, 3, 4]}
print("\nDictionary with the use of Mixed Keys: ")
print(Dict)
Output:
Dictionary with the use of Integer Keys:
{1: 'Geeks', 2: 'For', 3: 'Geeks'}
Dictionary with the use of Mixed Keys:
{'Name': 'Geeks', 1: [1, 2, 3, 4]}
# Creating an empty Dictionary
Dict = {}
print("Empty Dictionary: ")
print(Dict)
# Creating a Dictionary
# with dict() method
Dict = dict({1: 'Geeks', 2: 'For', 3:'Geeks'})
print("\nDictionary with the use of dict(): ")
print(Dict)
# Creating a Dictionary
# with each item as a Pair
Dict = dict([(1, 'Geeks'), (2, 'For')])
print("\nDictionary with each item as a pair: ")
print(Dict)
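Output:
Empty Dictionary:
{}
Dictionary with the use of dict():
{1: 'Geeks', 2: 'For', 3: 'Geeks'}
Dictionary with each item as a pair:
{1: 'Geeks', 2: 'For'}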
# Creating a Dictionary
Dict = {1: 'Geeks', 'name': 'For', 3: 'Geeks'}
print("Accessing an element using key:")
print(Dict['name'])
Output:
Accessing an element using key:
For
# Creating a Dictionary
Dict = {1: 'Geeks', 'name': 'For', 3: 'Geeks'}
print("Accessing an element using get:")
print(Dict.get(3))
Output:
Accessing an element using get:
Geeks
# Initial Dictionary
Dict = {5: 'Welcome', 6: 'To', 7: 'Geeks',
        'A': {1: 'Geeks', 2: 'For', 3: 'Geeks'},
        'B': {1: 'Geeks', 2: 'Life'}}
print("Initial Dictionary: ")
print(Dict)
# Deleting a Key value
del Dict[6]
print("\nDeleting a specific key: ")
print(Dict)
Output:
Initial Dictionary:
{5: 'Welcome', 6: 'To', 7: 'Geeks', 'A': {1: 'Geeks', 2: 'For', 3: 'Geeks'}, 'B': {1: 'Geeks', 2: 'Life'}}
Deleting a specific key:
{5: 'Welcome', 7: 'Geeks', 'A': {1: 'Geeks', 2: 'For', 3: 'Geeks'}, 'B': {1: 'Geeks', 2: 'Life'}}
# Creating a Dictionary
Dict = {1: 'Geeks', 'name': 'For', 3: 'Geeks'}
# Deleting a key
# using pop() method
pop_ele = Dict.pop(1)
print('\nDictionary after deletion: ' + str(Dict))
print('Value associated to popped key is: ' + str(pop_ele))
Output:
Dictionary after deletion: {'name': 'For', 3: 'Geeks'}
Value associated to popped key is: Geeks
# Creating a Dictionary
Dict = {1: 'Geeks', 'name': 'For', 3: 'Geeks'}
# Deleting the entire Dictionary
Dict.clear()
print("\nDeleting Entire Dictionary: ")
print(Dict)
Output:
Deleting Entire Dictionary:
{}
Dictionary Methods:
Methods      Description
copy()       Returns a shallow copy of the dictionary.
clear()      Removes all items from the dictionary.
pop()        Removes the given key from the dictionary and returns its value.
popitem()    Removes a key-value pair from the dictionary and returns it as a tuple. Since
             Python 3.7 this is the most recently inserted pair; earlier versions removed
             an arbitrary pair.
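The four methods in the table can be sketched on a small dictionary. The keys and values below are assumptions made for illustration; note in particular that a shallow copy shares nested objects with the original.

```python
original = {'a': [1, 2], 'b': 3}

shallow = original.copy()
shallow['a'].append(99)      # the nested list is shared with the original
shallow['c'] = 4             # a new key exists only in the copy

popped = original.pop('b')   # removes 'b' and returns its value
last = shallow.popitem()     # removes the most recently inserted pair
original.clear()             # empties the original dictionary

print(popped)     # 3
print(last)       # ('c', 4)
print(original)   # {}
print(shallow)    # {'a': [1, 2, 99], 'b': 3}
```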
Modules:
• A Python module is a file containing Python definitions and statements.
• It can define functions, classes, and variables.
• It can also include runnable code.
• Grouping related code into a module makes the code easier to understand and use.
• It also makes the code logically organized.
Importing a Module:
We can import the functions and classes defined in a module into another Python source file using
the import statement. The syntax is as follows:
import module
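The import statement has a few common forms. This sketch uses the standard-library math module as the example module.

```python
import math                 # import the whole module
from math import sqrt       # import a single name from the module
import math as m            # import the module under an alias

print(math.pi)
print(sqrt(16))             # 4.0
print(m.floor(2.7))         # 2
```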
OS Module:
• OS module provides functions for interacting with the operating system.
• This module provides a portable way of using operating system-dependent functionality
• There are different methods available in the OS module like os.getcwd(), os.mkdir(),
os.makedirs() etc.
Example:
# importing os module
import os
Functions of OS Module:
• os.name: This attribute gives the name of the operating system dependent module that was
imported, for example 'posix' or 'nt'.
• os.error: An alias for the built-in OSError exception. All functions in this module raise
OSError in the case of invalid or inaccessible file names and paths, or other arguments that
have the correct type but are not accepted by the operating system.
• os.popen(): This method opens a pipe to or from a command. The return value can be read or
written depending on whether the mode is 'r' or 'w'.
• os.close(): Closes a file descriptor.
• os.rename(): A file old.txt can be renamed to new.txt using the function os.rename(). The
rename succeeds only if the file exists and the user has sufficient permission to change
the file.
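Several of the functions above can be tried safely inside a temporary directory. This is a minimal sketch; the directory and file names are assumptions, and tempfile is used only so that nothing is left behind on disk.

```python
import os
import tempfile

print(os.name)        # for example 'posix' or 'nt'
print(os.getcwd())    # current working directory

with tempfile.TemporaryDirectory() as workdir:
    nested = os.path.join(workdir, "reports", "drafts")
    os.makedirs(nested)                      # creates intermediate directories too

    old_path = os.path.join(nested, "old.txt")
    new_path = os.path.join(nested, "new.txt")
    with open(old_path, "w") as handle:
        handle.write("hello")

    os.rename(old_path, new_path)
    renamed_ok = os.path.exists(new_path) and not os.path.exists(old_path)

    # Renaming a file that no longer exists raises an OSError subclass.
    try:
        os.rename(old_path, new_path)
    except OSError as exc:
        error_name = type(exc).__name__

print(renamed_ok, error_name)
```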
sys Module:
• sys module provides various functions and variables that are used to manipulate different parts
of the Python runtime environment.
• It allows operating on the interpreter as it provides access to the variables and functions that
interact strongly with the interpreter.
Example:
# importing sys module
import sys
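A few commonly used sys attributes give a feel for the module. The printed values depend on the machine and interpreter, so none are shown here.

```python
import sys

print(sys.version)         # version string of the running interpreter
print(sys.platform)        # platform identifier, for example 'linux' or 'win32'
print(sys.argv)            # command-line arguments as a list of strings
print(sys.path[:2])        # first entries of the module search path

major = sys.version_info.major
print(major)
```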
Python Functions:
• A Python function is a block of related statements designed to perform a specific task.
• The idea is to put commonly or repeatedly performed tasks together in a function, so
that instead of writing the same code again and again for different inputs, we can call the
function to reuse the code it contains.
• Functions can be either built-in or user-defined. They help the program to be concise,
non-repetitive, and organized.
Syntax:
def function_name(parameters):
    statement(s)
    return expression
Defining a Function:
We can create a Python function using the def keyword. Example:
def fun():
    print("Welcome to GFG")
Calling a Function:
After creating a function we can call it by using the name of the function followed by parentheses.
Example:
def fun():
    print("Welcome to GFG")
# Calling the function
fun()
Arguments of a Function:
• Arguments are the values passed inside the parenthesis of the function.
• A function can have any number of arguments separated by a comma.
Example:
def evenOdd(x):
    if (x % 2 == 0):
        print("even")
    else:
        print("odd")
# Calling the function with an even and an odd argument
evenOdd(2)
evenOdd(3)
Output
even
odd
Lambda Functions:
A lambda function is a small anonymous function defined with the lambda keyword.
Syntax:
lambda arguments: expression
Example:
x = lambda a : a + 10
print(x(5))
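Lambdas are often passed directly as arguments to other functions. The following sketch shows a lambda with two arguments and one used as a sort key; the names and sample words are assumptions for illustration.

```python
add_ten = lambda a: a + 10
multiply = lambda a, b: a * b

words = ["banana", "fig", "cherry"]
by_length = sorted(words, key=lambda w: len(w))   # sort by word length

print(add_ten(5))       # 15
print(multiply(3, 4))   # 12
print(by_length)        # ['fig', 'banana', 'cherry']
```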
map() function: The map() function returns a map object (which is an iterator) of the results after
applying the given function to each item of a given iterable (list, tuple, etc.).
Syntax:
map(fun, iter)
Parameters:
fun: The function to which map passes each element of the given iterable.
iter: The iterable which is to be mapped.
Example:
# Return double of n
def addition(n):
    return n + n
# Double all numbers using map()
numbers = (1, 2, 3, 4)
result = map(addition, numbers)
print(list(result))
Output:
[2, 4, 6, 8]
Python Packages
• A Python module may contain several classes, functions, variables, etc.
• A Python package can contain several modules.
• In simpler terms, a package is a folder that contains various modules as files.
• There are many packages available for Python, such as Beautiful Soup, NumPy, Tkinter, and IPython.
Beautiful Soup
• Beautiful Soup is a Python library that makes it easy to scrape information from web pages.
• It sits atop an HTML or XML parser and provides Pythonic idioms for iterating, searching, and
modifying the parse tree.
• It can extract all of the text from HTML tags, and alter the HTML in the document with which
we’re working.
• Some key features that make beautiful soup unique are:
o Beautiful Soup provides a few simple methods and Pythonic idioms for navigating,
searching, and modifying a parse tree.
o Beautiful Soup automatically converts incoming documents to Unicode and outgoing
documents to UTF-8.
o Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, which allows
us to try out different parsing strategies or trade speed for flexibility.
NumPy
• NumPy stands for Numerical Python.
• Python lists can serve the purpose of arrays, but they are slow to process for large numerical data.
• NumPy is a Python library used for working with arrays.
• It aims to provide an array object that is much faster than traditional Python lists; figures of
up to 50x are often quoted.
• It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
• NumPy arrays are stored in one contiguous block of memory, unlike lists, so processes can
access and manipulate them very efficiently.
IPython:
• IPython stands for Interactive Python.
• It was originally developed as an enhanced Python interpreter.
• It offers a powerful interactive Python shell.
• It possesses object introspection ability. Introspection is the ability to check properties of an
object during runtime.
• Some features are as follows:
o Syntax highlighting.
o Stores the history of interactions.
o Tab completion of keywords, variables and function names.
o Ability to be embedded in other Python programs.
o Provides access to the Python debugger.
Tkinter:
• Tkinter is the standard GUI library for Python.
• Python when combined with Tkinter provides a fast and easy way to create GUI applications.
• Tkinter provides a powerful object-oriented interface to the Tk GUI toolkit.
• Creating a GUI application using Tkinter is an easy task. All you need to do is perform the
following steps:
o Import the Tkinter module.
o Create the GUI application main window.
o Add one or more widgets to the GUI application.
o Enter the main event loop to take action against each event triggered by the user.
Defining a class:
Syntax:
class ClassName:
    statements
Example:
# Python3 program to demonstrate defining a class
class MyClass:
    x = 5
In the above example, the class keyword indicates that you are creating a class, and it is followed
by the name of the class.
Class Objects:
• An Object is an instance of a Class.
• A class is like a blueprint while an instance is a copy of the class with actual values.
• An object consists of:
• State: It is represented by the attributes of an object. It also reflects the properties of an object.
• Behavior: It is represented by the methods of an object. It also reflects the response of an
object to other objects.
• Identity: It gives a unique name to an object and enables one object to interact with other
objects.
Create Object:
We can use the class named MyClass to create objects:
p1 = MyClass()
print(p1.x)
# Python3 program to show that the variables with a value assigned in the class declaration are
# class variables, and variables inside methods and constructors are instance variables.
class Dog:
    # Class Variable
    animal = 'dog'
    # The init method or constructor
    def __init__(self, breed, color):
        # Instance Variables
        self.breed = breed
        self.color = color
# Objects of the Dog class
Rodger = Dog("Pug", "brown")
Buzo = Dog("Bulldog", "black")
print('Rodger details:')
print('Rodger is a', Rodger.animal)
print('Breed:', Rodger.breed)
print('Color:', Rodger.color)
print('\nBuzo details:')
print('Buzo is a', Buzo.animal)
print('Breed:', Buzo.breed)
print('Color:', Buzo.color)
Output:
Rodger details:
Rodger is a dog
Breed: Pug
Color: brown
Buzo details:
Buzo is a dog
Breed: Bulldog
Color: black
Python Inheritance:
• Inheritance enables us to define a class that takes all the functionality from a parent class and
allows us to add more.
• Inheritance is a powerful feature in object-oriented programming.
• It refers to defining a new class with little or no modification to an existing class.
• The new class is called derived (or child) class and the one from which it inherits is called
the base (or parent) class.
• Derived class inherits features from the base class where new features can be added to it. This
results in reusability of code.
Syntax:
class BaseClass:
    Body of base class
class DerivedClass(BaseClass):
    Body of derived class
Example:
class Person(object):
    # Constructor
    def __init__(self, name):
        self.name = name
    # To get name
    def getName(self):
        return self.name
    # To check if this person is an employee
    def isEmployee(self):
        return False
# Inherited or derived class
class Employee(Person):
    # Here we return true
    def isEmployee(self):
        return True
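To see inheritance at work, objects of both classes can be created and queried. The sketch below restates the two classes in condensed form so that it runs on its own; the names "Geek1" and "Geek2" are hypothetical values.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def getName(self):
        return self.name

    def isEmployee(self):
        return False


class Employee(Person):
    # Inherits __init__ and getName, overrides isEmployee.
    def isEmployee(self):
        return True


person = Person("Geek1")        # hypothetical name
employee = Employee("Geek2")    # hypothetical name

print(person.getName(), person.isEmployee())      # Geek1 False
print(employee.getName(), employee.isEmployee())  # Geek2 True
print(isinstance(employee, Person))               # True
```

The derived class reuses the constructor and getName() of the base class without redefining them, which is the code reuse described above.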
Multiple Inheritance:
Syntax:
class Base1:
    Body of the class
class Base2:
    Body of the class
class Derived(Base1, Base2):
    Body of the class
# Python Program to depict multiple inheritance when method is overridden in both classes
class Class1:
    def m(self):
        print("In Class1")
class Class2(Class1):
    def m(self):
        print("In Class2")
class Class3(Class1):
    def m(self):
        print("In Class3")
class Class4(Class2, Class3):
    pass
obj = Class4()
obj.m()
Output:
In Class2
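Polymorphism:
• Polymorphism means the same function name (but different signatures) being used for
different types.
• We can use the concept of polymorphism while creating class methods, as Python allows
different classes to have methods with the same name.
Both classes below define methods named info and make_sound:
class Cat:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def info(self):
        print(f"I am a cat. My name is {self.name}. I am {self.age} years old.")
    def make_sound(self):
        print("Meow")

class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def info(self):
        print(f"I am a dog. My name is {self.name}. I am {self.age} years old.")
    def make_sound(self):
        print("Bark")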
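Example:
cat1 = Cat("Kitty", 2.5)
dog1 = Dog("Fluffy", 4)
# The same method calls work for both objects
for animal in (cat1, dog1):
    animal.make_sound()
    animal.info()
    animal.make_sound()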
Output:
Meow
I am a cat. My name is Kitty. I am 2.5 years old.
Meow
Bark
I am a dog. My name is Fluffy. I am 4 years old.
Bark
Method Overloading:
• Python does not support method overloading by default.
• We may define several methods with the same name and different arguments, but only the most
recently defined one is kept, because each new definition replaces the previous one.
• For example, if a class defines two product methods one after the other, only the second
product method can be used.
• Calling the method with the arguments of an earlier definition will produce an error.
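The behaviour described above can be reproduced with a short sketch. The class names Calculator and FlexibleCalculator are hypothetical; only the method name product comes from the text. The second class shows one common workaround, a default argument, which is a suggestion rather than something the notes prescribe.

```python
class Calculator:
    # First definition: two arguments.
    def product(self, a, b):
        return a * b

    # A second definition with the same name replaces the first one.
    def product(self, a, b, c):
        return a * b * c

calc = Calculator()
print(calc.product(2, 3, 4))   # 24

try:
    calc.product(2, 3)          # the two-argument version no longer exists
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# A common workaround: one method with a default argument.
class FlexibleCalculator:
    def product(self, a, b, c=1):
        return a * b * c

flex = FlexibleCalculator()
print(flex.product(2, 3), flex.product(2, 3, 4))   # 6 24
```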
Method Overriding:
• Method overriding is an ability of any object-oriented programming language that allows a
subclass or child class to provide a specific implementation of a method that is already
provided by one of its super-classes or parent classes.
• When a method in a subclass has the same name, same parameters or signature and same
return type(or subtype) as a method in its super-class, then the method in the subclass is said
to override the method in the super-class.
• The version of a method that is executed will be determined by the object that is used to
invoke it.
• If an object of a parent class is used to invoke the method, then the version in the parent class
will be executed, but if an object of the subclass is used to invoke the method, then the version
in the child class will be executed.
Example:
# Defining the parent class
class Parent:
    # Constructor
    def __init__(self):
        self.value = "Inside Parent"

    # Parent's show method
    def show(self):
        print(self.value)

# Defining the child class
class Child(Parent):
    # Constructor
    def __init__(self):
        self.value = "Inside Child"

    # Child's show method
    def show(self):
        print(self.value)

# Driver's code
obj1 = Parent()
obj2 = Child()
obj1.show()
obj2.show()
Output:
Inside Parent
Inside Child
Data Hiding:
• Data hiding is a concept which emphasizes hiding data or information from the user.
• In a class, if we declare the data members as private so that no other class can access them,
we are hiding data.
• Thus, data hiding improves security and reduces dependency between classes.
• Data hiding in Python is performed using the __ double underscore prefix. This makes the class
members nonpublic and isolated from the other classes. Python implements this through name
mangling, so it is a convention rather than strict access control.
Example:
class Solution:
    __privateCounter = 0

    def sum(self):
        self.__privateCounter += 1
        print(self.__privateCounter)

count = Solution()
count.sum()
count.sum()
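To see what the double underscore actually does, the sketch below repeats the Solution class with one assumed change: sum returns the counter instead of printing it, so the value can be inspected. It then tries to read the member from outside the class.

```python
class Solution:
    __privateCounter = 0          # stored as _Solution__privateCounter

    def sum(self):
        self.__privateCounter += 1
        return self.__privateCounter

count = Solution()
count.sum()
count.sum()

# Direct access from outside the class fails...
try:
    count.__privateCounter
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# ...but the mangled name is still reachable, so this is a convention, not real security.
print(direct_access_failed)               # True
print(count._Solution__privateCounter)    # 2
```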
Based on the song’s position in the graph, we can say that Paul likes the song.
When there are many songs, it becomes very difficult for a human to identify the group to which
each song belongs. In such a case, a machine learning algorithm comes into the picture. An
ML algorithm can easily classify each new song with the help of historical data.
Pros:
• Easily understandable documentation is provided.
• Parameters for any specific algorithm can be changed while calling objects.
2. PyTorch: PyTorch is a Python machine learning library based on Torch, which is a computing
framework, scripting language, and machine learning library.
Features:
• It helps in building neural networks.
• It can be used on cloud platforms.
• It provides distributed training, various tools, and libraries.
Pros:
• It helps in creating computational graphs.
• Easy to use.
Pros:
• You can use it in two ways, i.e., by script tags or by installing through NPM.
Cons:
• It is difficult to learn.
Supervised ML:
• Supervised learning is a type of machine learning in which machines are trained using well
"labelled" training data, and on the basis of that data, machines predict the output.
• The labelled data means some input data is already tagged with the correct output.
• In the real-world, supervised learning can be used for Risk Assessment, Image classification,
Fraud Detection, spam filtering.
• Some popular supervised learning algorithms are linear regression, logistic regression etc.
Unsupervised ML:
• Unsupervised learning is a type of machine learning in which models are trained using
unlabelled dataset and are allowed to act on that data without any supervision.
• The model itself finds the hidden patterns and insights from the given data.
• The goal of unsupervised learning is to find the pattern of dataset, group that data according to
similarities, and represent that dataset in a suitable format.
• Some popular unsupervised learning algorithms are K-means clustering, hierarchical clustering,
etc. Note that KNN (k-nearest neighbors) is a supervised algorithm, despite the similar name.
• Here, we have taken an unlabelled input data. This data includes names of different cricketers,
their scores and wickets.
• Now, this unlabelled input data is fed to the machine learning model.
• It will interpret the raw data to find the hidden patterns from the data. For this, a graph can be
plotted by the machine learning algorithm to divide the data into groups of Batsman and
Bowler according to the similarities.
Supervised Learning | Unsupervised Learning
Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data.
In supervised learning, input data is provided to the model along with the output during training. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Reinforcement Learning:
• Reinforcement Learning is a feedback-based Machine learning technique.
• In it, an agent learns to behave in an environment by performing the actions and seeing the
results of actions.
• For each good action, the agent gets positive feedback, and for each bad action, the agent gets
negative feedback.
• There is no labelled data, so the agent is bound to learn by its experience only.
• Example:
An image is fed into the system, which is asked what it shows. The ML algorithm answers that the
image is of a CAT, which is not true. Hence, feedback is given by the user that the image is of a
DOG, not a CAT, and the machine learns from it. The next time a DOG’s image is shown, it
will identify it as a DOG’s image.
Features of Reinforcement Learning:
• In RL, the agent is not instructed about the environment and what actions need to be taken.
• The agent takes an action and changes states according to the feedback of the action.
Applications:
• Robotics:
RL is used in Robot navigation, Robo-soccer, walking etc.
• Control:
RL can be used for adaptive control such as Factory processes, admission control in
telecommunication, etc.
• Game Playing:
RL can be used in Game playing such as tic-tac-toe, chess, etc.
• Chemistry:
RL can be used for optimizing the chemical reactions.
• Business:
RL is used for business strategy planning.
Regression:
• Regression means predicting a value after analysing sample inputs.
• Example: Predicting the cost of a dish in a restaurant’s menu after analysing the menus of
different restaurants.
• Regression algorithms are used if there is a relationship between the input variable and the
output variable.
• It is used for the prediction of continuous variables, such as Weather forecasting, Market
Trends.
• Some popular Regression algorithms are: Linear Regression, Regression Trees, Non-Linear
Regression etc.
Classification:
• Classification means allocating a new item to one of the classes.
• Classification algorithms are used when the output variable is categorical, which means it takes
one of a fixed set of classes such as Yes-No, Male-Female, True-False, etc.
• Example: Assigning a new song to one of Paul’s groups, Like or Dislike, based on the
features of the song.
• Some classification algorithms are: Random Forest, Decision Trees, etc.
Regression | Classification
In Regression, we try to find the best fit line, which can predict the output more accurately. | In Classification, we try to find the decision boundary, which can divide the dataset into different classes.
Divided into Linear and Non-linear Regression. | Divided into Binary Classifier and Multi-class Classifier.
Parametric vs Non Parametric regression:
• Machine learning models can be parametric or non-parametric.
• Parametric models assume a specific functional form with a fixed, finite set of parameters
that is estimated from the data.
• Non-parametric models do not assume a fixed functional form. They are more flexible, but they
usually need more data to produce accurate results.
• In case of parametric models, the assumption related to the functional form is made; for
example, a linear model is considered.
• In case of non-parametric models, the assumption about the functional form is not made.
• Parametric models are much easier to fit than non-parametric models because they only require
the estimation of a set of parameters, as the model form (for example, linear) is identified
beforehand.
• In case of a non-parametric model, one needs to estimate an arbitrary function, which is a
much more difficult task.
• One can go for parametric models when the goal is inference, that is, understanding how the
inputs affect the output.
• One can go for non-parametric models when the goal is to make predictions.
Types of Regression:
• Linear Regression:
o It models a linear relationship between variables.
o It uses a regression line, also known as a best-fit line.
o In its simple form, it models the relationship between a single input independent variable and
an output dependent variable using a regression line.
o It is of two types: Simple: one dependent and one independent variable. Multiple: one
dependent and multiple independent variables.
o It is mostly used when the relationship to be modelled is not extremely complex and you
don’t have a lot of data.
o Example: Let the relationship be defined as Y = c + m*X, where ‘c’ denotes the intercept and
‘m’ denotes the slope of the line.
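As an illustration of the relationship Y = c + m*X, the sketch below estimates c and m by ordinary least squares. The function name fit_line and the data are hypothetical and chosen only so the result is easy to check.

```python
def fit_line(xs, ys):
    """Return (c, m) for the least-squares line Y = c + m*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return c, m

# Hypothetical data generated from Y = 1 + 2*X (no noise).
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
c, m = fit_line(xs, ys)
print(c, m)   # 1.0 2.0
```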
• Polynomial Regression:
o It is used for non-linear data and can model complex relationships.
o In this regression technique, the best fit line is a curve that fits the data points.
o Polynomial regression analysis represents a non-linear relationship between dependent and
independent variables.
o It requires careful design: some knowledge of the data is needed in order to select the best
exponents.
• Logistic Regression:
o It is used when the dependent variable is discrete, i.e., takes only one of two values.
Example: 0 or 1.
o A sigmoid curve denotes the relation between the dependent variable and the independent
variable.
o It works best with large data sets that have an almost equal occurrence of values in the
target variable.
o The dataset should not contain a high correlation between independent variables, as this
will create a problem when ranking the variables.
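The sigmoid curve mentioned above can be sketched as follows. The intercept and slope are hypothetical values, not fitted to any data, and the 0.5 threshold is an assumed default for turning a probability into a 0 or 1 label.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, intercept, slope, threshold=0.5):
    """Return (probability, class label) for input x under the given weights."""
    p = sigmoid(intercept + slope * x)
    return p, int(p >= threshold)

# Hypothetical intercept and slope, not fitted to any data.
intercept, slope = -4.0, 2.0
for x in (0, 2, 4):
    print(x, predict(x, intercept, slope))
```

Inputs far below the point where the linear term is zero receive label 0, and inputs above it receive label 1.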
• Locally weighted Regression:
o It is a supervised, non-parametric learning algorithm.
o The model does not learn a fixed set of parameters. Rather, parameters are computed
individually for each query point.
o While computing parameters, a higher preference is given to the points in the training set
that lie close to the query point.
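A minimal sketch of locally weighted regression for one input variable follows. It assumes Gaussian weights with a bandwidth parameter tau; both the weighting scheme and the data are assumptions for illustration, since the notes do not fix them.

```python
import math

def lwr_predict(x0, xs, ys, tau=0.2):
    """Predict y at x0 with a weighted least-squares line fitted around x0."""
    # Gaussian weights: training points near x0 count more.
    ws = [math.exp(-((x - x0) ** 2) / (2 * tau ** 2)) for x in xs]
    sw = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / sw
    mean_y = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    m = sxy / sxx            # local slope, recomputed for every query
    c = mean_y - m * mean_x  # local intercept
    return c + m * x0

# Hypothetical non-linear data: y = x^2 sampled on a grid from -3 to 3.
xs = [i / 10 for i in range(-30, 31)]
ys = [x * x for x in xs]
prediction = lwr_predict(1.0, xs, ys)
print(prediction)   # near 1.0, the true value of x^2 at x = 1
```

No parameters are stored between calls: every query point gets its own slope and intercept.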
Numerical optimization:
• For many problems it is hard to figure out the best solution directly. In such cases, Numerical
Optimization techniques are used.
• With these optimization approaches, it is relatively easy to set up a function that measures
how good a solution is and then adjust the parameters to minimize that function to find the
solution.
• One such technique for Numerical optimization is Gradient Descent.
Gradient Descent:
• Gradient Descent is a numerical optimization algorithm.
• The main objective of using a gradient descent algorithm is to minimize the cost function
using iteration.
• It is also used for updating the parameters of the learning model.
• It is an iterative optimization algorithm which is used to train the various machine learning
and deep learning models.
• It helps in finding the local minimum of a function.
• The local minimum or local maximum of a function can be found as follows:
o If we move towards the negative gradient, that is, away from the gradient of the function at
the current point, we approach a local minimum of that function (gradient descent).
o If we move towards the positive gradient, that is, along the gradient of the function at the
current point, we approach a local maximum of that function (gradient ascent).
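The update rule can be sketched in a few lines. The cost function f(x) = (x - 3)^2, the learning rate and the number of steps are all hypothetical choices made for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)   # move towards the negative gradient
    return x

# Hypothetical cost function f(x) = (x - 3)^2 with gradient f'(x) = 2*(x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(round(minimum, 4))   # 3.0
```

Flipping the sign of the update (adding the gradient instead of subtracting it) would turn this into gradient ascent.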
Cost-function:
The cost function measures the difference, or error, between the actual values and the predicted
values at the current position, expressed as a single real number.
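One common choice of cost function is the mean squared error. The notes do not name a specific one, so this is an assumption, and the values below are made up for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]
cost = mean_squared_error(actual, predicted)
print(cost)   # (0.25 + 0 + 1) / 3
```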
Kernel methods:
• Kernel methods are sets of different types of algorithms that are being used for pattern
analysis.
• They are used to solve a non-linear problem by using a linear classifier.
• Kernel methods are used in SVMs (Support Vector Machines), which are used in
classification and regression problems.
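The idea behind solving a non-linear problem with a linear method can be illustrated with a degree-2 polynomial kernel: evaluating the kernel on the original inputs gives the same number as a dot product in a higher-dimensional feature space. The kernel choice and the function names are assumptions made for this sketch.

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel: k(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def feature_map(x):
    """Explicit 3-D features whose dot product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, 0.5)
kernel_value = poly_kernel(x, y)
mapped_value = dot(feature_map(x), feature_map(y))
print(kernel_value, mapped_value)   # both are 16 up to rounding
```

A linear classifier working with the kernel therefore behaves like a linear classifier in the mapped space, without ever computing the mapping.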
Decision Tree:
• Decision tree is one of the most powerful and popular tools for classification and prediction.
• A Decision tree is a flowchart like tree structure, where each internal node denotes a test on an
attribute, each branch represents an outcome of the test, and each leaf node (terminal node)
holds a class label.
• Example:
Suppose we have a data set of 14 patients and we have to predict which drug, A or B, to suggest to
a new patient. Let’s say we pick cholesterol as the first attribute to split the data.
The split divides our data into two branches, High and Normal, based on cholesterol. Suppose our
new patient has high cholesterol: from this split alone we cannot say whether Drug A or Drug B
will be suitable for the patient. Likewise, if the patient’s cholesterol is normal, we still do not
have enough information to determine whether Drug A or Drug B is suitable. Let us take another
attribute, age, which has three categories (young, middle age and senior), and try to split on it.
After the split on age, we can easily predict which drug to give to a patient based on his or her
reports.
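The reasoning in this example, checking whether a split leaves each branch with a single drug, can be sketched as follows. The six records are hypothetical and do not reproduce the 14-patient data set, and the helper names split_by and is_pure are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical patient records; the 14-patient table is not reproduced here.
patients = [
    {"cholesterol": "High",   "age": "Young",      "drug": "A"},
    {"cholesterol": "High",   "age": "Senior",     "drug": "B"},
    {"cholesterol": "Normal", "age": "Young",      "drug": "A"},
    {"cholesterol": "Normal", "age": "Middle age", "drug": "B"},
    {"cholesterol": "High",   "age": "Middle age", "drug": "B"},
    {"cholesterol": "Normal", "age": "Senior",     "drug": "B"},
]

def split_by(records, attribute):
    """Group records by an attribute and count the drug labels in each branch."""
    branches = defaultdict(Counter)
    for record in records:
        branches[record[attribute]][record["drug"]] += 1
    return branches

def is_pure(branches):
    """True when every branch contains only one drug label."""
    return all(len(counts) == 1 for counts in branches.values())

for attribute in ("cholesterol", "age"):
    branches = split_by(patients, attribute)
    print(attribute, {k: dict(v) for k, v in branches.items()}, "pure:", is_pure(branches))
```

In this toy data the cholesterol split leaves both branches mixed, while the age split separates the drugs completely.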
Inductive Bias:
• Every machine learning model requires some type of architecture design and possibly some
initial assumptions about the data we want to analyze.
• Generally, every building block and every belief that we make about the data is a form of
inductive bias.
• Inductive biases are assumptions made by the learning algorithm to form a hypothesis or
a generalization beyond the set of training instances in order to classify unobserved data.
• Occam’s razor is one of the simplest examples of inductive bias.
Evaluating hypotheses:
• In supervised machine learning, a hypothesis is a candidate function that maps input to
output; the aim is to find the function that best performs this mapping.
• Whenever you form a hypothesis for a given training data set, you have to test or evaluate how
accurate the hypothesis is. This is known as hypothesis evaluation.
Bayes’ Theorem:
• Bayes’ Theorem describes the probability of an event, based on prior knowledge of conditions
that might be related to that event.
• It is used where the probability of occurrence of a particular event is calculated based on other
conditions, which is also called conditional probability.
• For events A and B with P(B) > 0, it states that P(A|B) = P(B|A) * P(A) / P(B).
• Example: There are 3 bags, each containing some white marbles and some black marbles. A
marble is drawn at random and turns out to be white. To find the probability that this white
marble came from the first bag, we use Bayes’ Theorem.
Problem 1: Bag I contains 3 red and 4 black balls while another Bag II contains 5 red and 6
black balls. One ball is drawn at random from one of the bags and it is found to be red. Find
the probability that it was drawn from Bag II.
Problem 2: In a factory which manufactures bolts, machines A, B and C manufacture
respectively 25%, 35% and 40% of the bolts. Of their outputs, 5, 4 and 2 percent are
respectively defective bolts. A bolt is drawn at random from the product and is found to be
defective. What is the probability that it is manufactured by the machine B?
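Both problems can be worked through with a small helper. The sketch assumes, for Problem 1, that each bag is equally likely to be chosen, which the problem statement does not say explicitly. The function name posterior is hypothetical.

```python
from fractions import Fraction

def posterior(priors, likelihoods, target):
    """Bayes' theorem: P(target | evidence) from priors and P(evidence | cause)."""
    evidence = sum(priors[k] * likelihoods[k] for k in priors)
    return priors[target] * likelihoods[target] / evidence

# Problem 1, assuming each bag is equally likely to be chosen.
p1 = posterior(
    priors={"I": Fraction(1, 2), "II": Fraction(1, 2)},
    likelihoods={"I": Fraction(3, 7), "II": Fraction(5, 11)},   # P(red | bag)
    target="II",
)

# Problem 2: priors are the production shares, likelihoods the defect rates.
p2 = posterior(
    priors={"A": Fraction(25, 100), "B": Fraction(35, 100), "C": Fraction(40, 100)},
    likelihoods={"A": Fraction(5, 100), "B": Fraction(4, 100), "C": Fraction(2, 100)},
    target="B",
)
print(p1, p2)   # 35/68 28/69
```

Exact fractions are used so that the answers match a hand calculation.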
Concept Learning:
• Concept learning is the task of acquiring a hypothesis (solution) that best fits the given
training examples.
• Consider the example task of learning the target concept “days on which my friend Prabhas
enjoys his favorite water sport.”
• The table below describes a set of example days, each represented by a set of attributes.
• The attribute EnjoySport indicates whether or not Prabhas enjoys his favorite water sport on
this day.
• The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the
values of its other attributes.
Bayes Optimal Classifier:
• Bayes Optimal Classifier is a probabilistic model that makes the most probable prediction for
a new example.
• It is described using the Bayes Theorem.
• It is closely related to Maximum a Posteriori (MAP), a probabilistic framework that finds the
most probable hypothesis for a training dataset.
• In practice, the Bayes Optimal Classifier is computationally expensive, if not intractable to
calculate, and instead, simplifications such as the Gibbs algorithm and Naive Bayes can be
used to approximate the outcome.
Naive Bayes Classifiers:
• Naïve Bayes algorithm is a supervised learning algorithm.
• It is based on Bayes’ theorem and used for solving classification problems. It is called naïve
because it assumes that the features are independent of each other given the class.
• It is mainly used in text classification that includes a high-dimensional training dataset.
• Naïve Bayes Classifier is one of the simplest and most effective classification algorithms, which
helps in building fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object.
• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis,
and classifying articles.
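A minimal sketch of a Naïve Bayes text classifier is shown below. It assumes a word-count (multinomial) model with add-one smoothing, and the training sentences and labels are made up for illustration.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (text, label). Returns class counts, word counts, vocabulary."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify(text, model):
    """Pick the label with the highest log posterior, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)         # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word] + 1                   # add-one smoothing
            score += math.log(count / (total_words + len(vocab)))  # log likelihood
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy training set.
model = train([
    ("win a free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("project report attached", "ham"),
])
print(classify("free prize offer", model))        # spam
print(classify("report for the meeting", model))  # ham
```

Log probabilities are added instead of multiplying raw probabilities, which avoids numerical underflow on longer texts.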
o Each node corresponds to a random variable, and a variable can be continuous or discrete.
o Arcs or directed arrows represent the causal relationships or conditional probabilities between
random variables. These directed links connect pairs of nodes in the graph. A link represents
that one node directly influences the other node; if there is no directed link between two
nodes, neither node directly influences the other.
Expectation-Maximization Algorithm:
• In the real-world applications of machine learning, it is very common that there are many
relevant features available for learning but only a small subset of them are observable.
• For variables which are sometimes observable and sometimes not, we can use the instances in
which the variable is observed for the purpose of learning, and then predict its value in the
instances in which it is not observable.
• Expectation-Maximization algorithm can be used for the latent variables (variables that are
not directly observable and are actually inferred from the values of the other observed
variables) too in order to predict their values with the condition that the general form of
probability distribution governing those latent variables is known to us.
• This algorithm is actually at the base of many unsupervised clustering algorithms in the field
of machine learning.
Algorithm:
1. Given a set of incomplete data, consider a set of starting parameters.
2. Expectation step (E – step): Using the observed available data of the dataset, estimate
(guess) the values of the missing data.
3. Maximization step (M – step): Complete data generated after the expectation (E) step is
used in order to update the parameters.
4. Repeat step 2 and step 3 until convergence.
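The four steps can be sketched for one concrete case: a mixture of two one-dimensional Gaussians, where the missing data is which component each point came from. The model choice, the starting values, the fixed iteration count used in place of a convergence check, and the data are all assumptions for illustration.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, means, iterations=50):
    """EM for a mixture of two 1-D Gaussians; component membership is the missing data."""
    variances = [1.0, 1.0]      # step 1: starting parameters
    weights = [0.5, 0.5]
    means = list(means)
    for _ in range(iterations):
        # E-step: estimate, for each point, the probability of belonging to each component.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: update the parameters using those estimated memberships.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                1e-6,           # floor keeps the variance from collapsing to zero
            )
    return weights, means, variances

# Hypothetical data: two visible clusters around 2 and 10.
data = [1.5, 2.0, 2.3, 2.6, 1.6, 9.5, 10.0, 10.4, 10.2, 9.9]
weights, means, variances = em_two_gaussians(data, means=[1.0, 8.0])
print([round(m, 2) for m in means])   # [2.0, 10.0]
```

With clusters this far apart the result does not depend much on the starting means; with overlapping clusters, different starting values can lead to different local optima.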
Usage of EM algorithm:
• It can be used to fill the missing data in a sample.
• It can be used as the basis of unsupervised learning of clusters.
• It can be used for the purpose of estimating the parameters of Hidden Markov Model
(HMM).
• It can be used for discovering the values of latent variables.
Advantages of EM algorithm:
• The likelihood is guaranteed not to decrease with any iteration.
• The E-step and M-step are often pretty easy for many problems in terms of
implementation.
• Solutions to the M-steps often exist in the closed form.
Disadvantages of EM algorithm:
• It can converge slowly.
• It converges to a local optimum only.
• It requires both the probabilities, forward and backward (numerical optimization requires
only forward probability).