CS3491 AI & ML Lab Manual
SEMESTER IV
            LAB MANUAL
                  DHIRAJLAL GANDHI COLLEGE OF
                          TECHNOLOGY
                   Salem Airport (Opp.), Salem – 636 309
                   Ph. (04290) 233333, www.dgct.ac.in
BONAFIDE CERTIFICATE
Name : …………………………………………………………
Degree : …………………………………………………………
Branch: …………………………………………………………
Certified that this is the bonafide record of the work done by the above student in
…………………………………………………………………………………………………………………….
Laboratory during the academic year      …………………………………
      ▪   Students must be present in proper dress code and wear the ID card.
      ▪   Students should enter the log-in and log-out time in the log
          register without fail.
      ▪   Students are not allowed to download pictures, music,
          videos or files without the permission of the respective lab
          in-charge.
      ▪   Students should wear their own lab coats and bring observation
          note books to the laboratory classes regularly.
      ▪   Record of experiments done in a particular class should be
          submitted in the next lab class.
      ▪   Students who do not submit the record note book in time will not
          be allowed to do the next experiment and will not be given
          attendance for that laboratory class.
      ▪   Students will not be allowed to leave the laboratory until they
          complete the experiment.
      ▪   Students are advised to switch-off the Monitors and CPU when they
          leave the lab.
      ▪   Students are advised to arrange the chairs properly when they leave
          the lab.
                                                          DHIRAJLAL GANDHI COLLEGE OF TECHNOLOGY
                                                     DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
                                           College
Vision
      To improve the quality of human life through multi-disciplinary programs in
      Engineering, Architecture and Management that are internationally recognized
      and would facilitate research work to incorporate social, economic and
      environmental development.
Mission
         To create a vibrant atmosphere that creates competent engineers, innovators,
         scientists, entrepreneurs, academicians and thinkers of tomorrow.
         To establish centers of excellence that provide sustainable solutions to industry
         and society.
         To enhance capability through various value-added programs so as to meet the
         challenges of dynamically changing global needs.
                                        Department
Vision
      To cultivate creative, globally competent, employable and disciplined computing
      professionals with the spirit of benchmarking educational system that promotes
      academic excellence, scientific pursuits, entrepreneurship and professionalism.
Mission
   ● To develop the creators of tomorrow’s technology to meet the social needs of
      our nation.
   ● To promote and encourage the strength of research in Engineering, Science
      and Technology.
   ● To bridge the gap between Academia, Industry and Society.
 PEO1           The graduates of the program would constantly learn and update their
                knowledge in the emerging fields of technology.
                             Program Outcomes(POs)
PO1    To apply knowledge of mathematics, science, engineering fundamentals and
       computer science theory to solve the complex problems in Computer Science
       and Engineering.
PO2    To analyze problems, identify and define the solutions using basic principles of
       mathematics, science, technology and computer engineering.
PO3    To design, implement, and evaluate computer based systems, processes,
       components, or software to meet the realistic constraints for the public health
       and safety, and the cultural, societal and environmental considerations.
PO4    To design and conduct experiments, perform analysis & interpretation and
       provide valid conclusions with the use of research-based knowledge and
       research methodologies related to Computer Science and Engineering.
PO5    To propose innovative original ideas and solutions, culminating into modern
       engineering products for a large section of the society with longevity.
PO6    To apply the understanding of legal, health, security, cultural & social issues,
       and thereby one's responsibility in their application in
       professional engineering practices.
COURSE OUTCOME
Mapping
CO1    3    2    2    -    -    -    -    -    2    1    3    1    2    3    2
CO2    1    3    2    3    3    -    -    -    2    3    2    2    3    2    1
CO3    3    3    2    1    1    -    -    -    1    -    1    3    3    2    1
CO4    3    1    2    1    3    -    -    -    1    -    2    1    1    3    2
CO5    3    1    1    1    1    -    -    -    1    1    2    1    2    1    2
LIST OF EXPERIMENTS:
OUTCOMES: Upon completion of the course, the students will be able to:
CONTENTS
 1           Implementation of Uninformed search algorithms (BFS, DFS)
 2           Implementation of Informed search algorithms (A*, memory-bounded A*)
SCORED: LAB-IN-CHARGE:
Aim:
The aim of implementing the Breadth-First Search (BFS) algorithm is to traverse a graph or a tree data
structure in a systematic way, visiting every node reachable from the start node level by level, without
visiting any node more than once.
Program:
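# Breadth-First Search (BFS) algorithm
graph = {
    '5': ['3', '7'],
    '3': ['2', '4'],
    '7': ['8'],
    '2': [],
    '4': ['8'],
    '8': []
}

visited = []  # nodes that have already been discovered
queue = []    # FIFO queue of nodes waiting to be expanded

def bfs(visited, graph, node):
    visited.append(node)
    queue.append(node)
    while queue:
        m = queue.pop(0)  # take the oldest node in the queue
        print(m, end=" ")
        for neighbour in graph[m]:
            if neighbour not in visited:
                visited.append(neighbour)
                queue.append(neighbour)

# Driver Code
print("Following is the Breadth-First Search")
bfs(visited, graph, '5')
Output:
Following is the Breadth-First Search
5 3 7 2 4 8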
Result:
Thus the uninformed search algorithm Breadth-First Search (BFS) has been executed successfully and
the output has been verified.
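As a hedged illustration of the queue-based traversal described in the aim, the sketch below returns the
visiting order as a list instead of printing it. The function name bfs_order and the use of
collections.deque are assumptions made for this example and are not part of the lab listing; the graph is
the one used in the program above.

```python
from collections import deque

def bfs_order(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    visited = {start}        # nodes already discovered
    queue = deque([start])   # FIFO frontier
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

# Same graph as in the lab program.
graph = {
    '5': ['3', '7'],
    '3': ['2', '4'],
    '7': ['8'],
    '2': [],
    '4': ['8'],
    '8': [],
}
print(" ".join(bfs_order(graph, '5')))  # 5 3 7 2 4 8
```

A deque gives constant-time removal from the front of the queue, and a set gives constant-time membership
checks, which matters once the graph grows beyond a handful of nodes.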
Viva Questions:
1. Breadth First Search is equivalent to which of the following traversals of a binary tree?
a) Pre-order Traversal       b) Post-order Traversal
c) Level-order Traversal     d) In-order Traversal
2. The Data structure used in standard implementation of Breadth First Search is?
a) Stack             b) Queue      c) Linked List      d) Tree
6. Regarding implementation of Breadth First Search using queues, what is the maximum distance
between two nodes present in the queue? (considering each edge length 1)
a) Can be anything        b) 0          c) At most 1         d) Insufficient Information
8. A person wants to visit some places. He starts from a vertex and then wants to visit every place
connected to this vertex and so on. Which algorithm should he use?
a) Depth First Search        b) Breadth First Search
c) Prim’s algorithm          d) Kruskal’s algorithm
9. The Breadth First Search algorithm has been implemented using the queue data structure. One possible
order of visiting the nodes of the following graph is
Practice Exercise:
1.     Develop a code by implementing the Uninformed search algorithm- BFS
2.     Develop a code by implementing the 8 puzzles using the BFS.
3.     Implementation of Breadth First Search for Tic-Tac-Toe Problem
4.     Write a program to implement Towers of Hanoi problem.
5.     A---B
       |\  |
       | \ |
       |  \|
       C---D
Write a Python program to perform a Breadth-First Search on the above graph starting from vertex ‘A’.
6.         A
          / \
         B   C
        / \   \
       D   E   F
Write a Python program to perform a Breadth-First Search on this graph starting from vertex ‘A’.
7.     Write a Python program to implement BFS where the graph is represented as an adjacency list.
8.     Write a program to implement the Uninformed strategy – Breadth-First Search considering the
following graph
graph = {'Q': ['P', 'C'], 'R':['D'], 'C':[], 'P':[]}
9.     Write a program to implement the Uninformed strategy – Uniform Cost Search
Program:
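# Depth-First Search (DFS) algorithm
graph = {
    '5': ['3', '7'],
    '3': ['2', '4'],
    '7': ['8'],
    '2': [],
    '4': ['8'],
    '8': []
}

visited = set()  # nodes that have already been visited

def dfs(visited, graph, node):
    if node not in visited:
        print(node)
        visited.add(node)
        for neighbour in graph[node]:  # go as deep as possible before backtracking
            dfs(visited, graph, neighbour)

# Driver Code
print("Following is the Depth-First Search")
dfs(visited, graph, '5')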
Output:
Following is the Depth-First Search
5
3
2
4
8
7
Result:
Thus the uninformed search algorithm Depth-First Search (DFS) has been executed successfully and
the output has been verified.
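Depth-first traversal can also be written without recursion by keeping an explicit stack. The sketch
below is one possible formulation under that assumption; the function name dfs_order is hypothetical, and
the graph is the one used in the program above. Neighbours are pushed in reverse so that the first-listed
neighbour is explored first, which reproduces the visiting order shown in the output.

```python
def dfs_order(graph, start):
    """Return the nodes reachable from start in depth-first (pre-order) order."""
    visited = set()
    stack = [start]          # LIFO frontier
    order = []
    while stack:
        node = stack.pop()
        if node in visited:
            continue         # already reached through another path
        visited.add(node)
        order.append(node)
        # Reverse so the first-listed neighbour ends up on top of the stack.
        for neighbour in reversed(graph[node]):
            if neighbour not in visited:
                stack.append(neighbour)
    return order

# Same graph as in the lab program.
graph = {
    '5': ['3', '7'],
    '3': ['2', '4'],
    '7': ['8'],
    '2': [],
    '4': ['8'],
    '8': [],
}
print(" ".join(dfs_order(graph, '5')))  # 5 3 2 4 8 7
```

An explicit stack avoids Python's recursion limit, which a recursive traversal can hit on deep graphs.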
Viva Questions:
1. Depth First Search is equivalent to which of the following traversals of a binary tree?
a) Pre-order Traversal         b) Post-order Traversal
c) Level-order Traversal       d) In-order Traversal
3. The Data structure used in standard implementation of Breadth First Search is?
a) Stack              b) Queue       c) Linked List d) Tree
5. A person wants to visit some places. He starts from a vertex and keeps moving to the next unvisited
vertex until he can go no further, then backtracks and explores the other vertices reachable from the
previous vertex. Which algorithm should he use?
a) Depth First Search        b) Breadth First Search
c) Prim’s algorithm          d) Kruskal’s algorithm
8. Regarding implementation of Depth First Search using stacks, what is the maximum distance between two
nodes present in the stack? (Considering each edge length 1)
a) Can be anything b) 0             c) At most 1 d) Insufficient Information
10. Is the following statement true or false? If a DFS of a directed graph contains a back edge, any other
DFS of the same graph will also contain at least one back edge.
a) True        b) False
Practice Exercise:
1. Develop a code by implementing the Uninformed search algorithm- DFS
2. Develop a code to run DFS on a large dataset, taking the maximum recursion depth into account.
3. Implementation of Depth First Search for Water Jug Problem
4. Write a Python program to implement Depth-First Search using a tree.
5. Write a Python program to perform a DFS traversal starting from a given vertex in a graph and show the
   order of visited vertices.
6. Write a Python program to find the articulation points of the graph using Depth-First Search.
7. Given the following adjacency matrix:
   0 1 1 0
   1 0 0 1
   1 0 0 1
   0 1 1 0
   Perform a Depth-First Search on this graph starting from vertex ‘0’.
8. Write a program to implement the Uninformed strategy – Depth First Search considering the following
   graph
   graph = {'A': ['B', 'C'], 'B':['D'], 'C':[], 'D':[]}
9. Write a Program to Implement Monkey Banana Problem using Python.
Aim:
The aim of this program is to implement informed search algorithms such as A* and memory-bounded A*, which
efficiently find the shortest path between two points in a graph or network. A* is a heuristic-based search
algorithm that ranks each candidate path by a cost function combining the cost already incurred with an
estimate of the remaining cost to the goal.
Program:
from queue import PriorityQueue
v =14
graph =[[] for i in range(v)]
   if visited[v] ==False:
     visited[v] =True
     pq.put((c, v))
  print()
source =0
target =9
best_first_search(source, target, v)
OUTPUT:
A* :
0
1
3
2
8
9
Result:
Thus the above program executed successfully.
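The listing above orders its priority queue by the edge cost alone (pq.put((c, v))), which is greedy
best-first behaviour. A* instead ranks each node by f(n) = g(n) + h(n), where g(n) is the cost of the path
found so far and h(n) is a heuristic estimate of the remaining cost. The following is a minimal sketch of
that idea, not the lab's own program: the function name a_star, the four-node graph and the heuristic
table h are all hypothetical values chosen for illustration, with h picked so that it never overestimates
the true remaining cost.

```python
import heapq

def a_star(graph, h, start, goal):
    """Return (cost, path) for a cheapest path from start to goal, or None."""
    # Each heap entry is (f, g, node, path) with f = g + h[node].
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}      # cheapest known cost to reach each node
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue         # stale entry: a cheaper route was found later
        for neighbour, weight in graph[node]:
            new_g = g + weight
            if new_g < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + h[neighbour], new_g, neighbour, path + [neighbour]),
                )
    return None

# Hypothetical weighted graph: node -> list of (neighbour, edge cost).
graph = {
    'S': [('A', 1), ('B', 4)],
    'A': [('B', 2), ('G', 6)],
    'B': [('G', 3)],
    'G': [],
}
# Hypothetical admissible heuristic (estimated cost to reach 'G').
h = {'S': 5, 'A': 4, 'B': 3, 'G': 0}

print(a_star(graph, h, 'S', 'G'))  # (6, ['S', 'A', 'B', 'G'])
```

With this heuristic the direct edge A to G (total cost 7) is pushed onto the frontier but never expanded,
because the route through B reaches the goal with a lower f value.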
Viva Questions:
1. Which data structure is typically used to implement the open and closed lists in the A* search algorithm?
A. Queue      B. Stack                 C. Set       D. Priority Queue
2. Which of the following best describes the heuristic function in the A* search algorithm?
A. A function that assigns weights to the edges of the graph.
B. A function that estimates the cost from the current node to the goal node.
C. A function that selects the next node to expand based on the lowest path cost.
D. A function that checks if the goal node has been reached.
4. Which of the following best describes the admissibility property in relation to the heuristic function in A*
search algorithm?
A. The heuristic function never overestimates the actual cost to reach the goal node.
B. The heuristic function always underestimates the actual cost to reach the goal node.
C. The heuristic function provides an accurate estimate of the actual cost to reach the goal node.
D. The heuristic function does not affect the search process in A* algorithm.
6. Which of the following conditions can lead to an optimal path in the A* search algorithm?
A. The heuristic function is admissible, but not consistent.
B. The heuristic function is both admissible and consistent.
C. The heuristic function is consistent, but not admissible.
D. The heuristic function is neither admissible nor consistent.
8. Which of the following scenarios can cause the A* search algorithm to return a suboptimal path?
A. The heuristic function is admissible but not consistent.
B. The heuristic function is consistent but not admissible.
C. The search space contains cycles.
D. The heuristic function is neither admissible nor consistent.
9. Which of the following techniques can be used to improve the efficiency of the A* search algorithm?
A. Increasing the size of the open list.
B. Using an effective heuristic function.
C. Randomizing the order of expanding nodes.
D. Ignoring the closed list.
10. Which of the following search algorithms is a generalization of the A* search algorithm and guarantees
finding an optimal path even with inconsistent heuristic functions?
A. Depth-First Search (DFS)
B. Breadth-First Search (BFS)
C. Iterative-Deepening A* (IDA*)
D. Uniform Cost Search (UCS)
Practice Exercise:
   1. Develop a code by implementing the Informed search algorithm- A*
   2. Develop a code using the repository of UCI Dataset and perform the Informed search algorithm- A*
   3. Write the program to find the shortest path from `start` to `goal` in a `graph` by means of A* algorithm.
   4. Write a program to implement the A* algorithm to find the shortest path from source to all vertices.
   5. Write a program to implement Hill Climbing algorithm.
Aim:
The aim of this program is to implement an informed search algorithm, memory-bounded A*, which efficiently
finds the shortest path between two points in a graph or network. Memory-bounded A* is a variant of the A*
algorithm that uses a limited amount of memory and is therefore suitable for large search spaces.
Program:
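#Memory Bounded A *
class Graph:
    # instantiate graph object with graph topology, heuristic values, start node
    def __init__(self, graph, heuristicNodeList, startNode):
      self.graph = graph
      self.H=heuristicNodeList
      self.start=startNode
      self.parent={}
      self.status={}
      self.solutionGraph={}
    def applyAOStar(self): # start the recursive AO* search from the start node
      self.aoStar(self.start, False)
    def getNeighbors(self, v): # return the sets of child nodes of a given node
      return self.graph.get(v, [])
    def getStatus(self, v): # return the status of a given node (0 if not yet set)
      return self.status.get(v, 0)
    def setStatus(self, v, val): # set the status of a given node
      self.status[v]=val
    def getHeuristicNodeValue(self, n): # return the heuristic value of a given node
      return self.H.get(n, 0)
    def setHeuristicNodeValue(self, n, value):
      self.H[n]=value # set the revised heuristic value of a given node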
    def printSolution(self):
      print("FOR GRAPH SOLUTION, TRAVERSE THE GRAPH FROM THE START NODE:",self.start)
      print("------------------------------------------------------------")
      print(self.solutionGraph)
      print("------------------------------------------------------------")
    def computeMinimumCostChildNodes(self, v): # Computes the Minimum Cost of child nodes of a given node v
     minimumCost=0
     costToChildNodeListDict={}
     costToChildNodeListDict[minimumCost]=[]
     flag=True
     for nodeInfoTupleList in self.getNeighbors(v): # iterate over all the set of child node/s
        cost=0
        nodeList=[]
        for c, weight in nodeInfoTupleList:
           cost=cost+self.getHeuristicNodeValue(c)+weight
           nodeList.append(c)
        if flag==True: # initialize Minimum Cost with the cost of first set of child node/s
           minimumCost=cost
           costToChildNodeListDict[minimumCost]=nodeList # set the Minimum Cost child node/s
           flag=False
        else: # checking the Minimum Cost nodes with the current Minimum Cost
           if minimumCost>cost:
              minimumCost=cost
              costToChildNodeListDict[minimumCost]=nodeList # set the Minimum Cost child node/s
     return minimumCost, costToChildNodeListDict[minimumCost] # return Minimum Cost and Minimum Cost child node/s
    def aoStar(self, v, backTracking): # AO* algorithm for a start node and backTracking status flag
      print("HEURISTIC VALUES :", self.H)
      print("SOLUTION GRAPH :", self.solutionGraph)
      print("PROCESSING NODE :", v)
      print("-----------------------------------------------------------------------------------------")
      if self.getStatus(v) >= 0: # if status node v >= 0, compute Minimum Cost nodes of v
         minimumCost, childNodeList = self.computeMinimumCostChildNodes(v)
         print(minimumCost, childNodeList)
         self.setHeuristicNodeValue(v, minimumCost)
         self.setStatus(v,len(childNodeList))
         solved=True # check the Minimum Cost nodes of v are solved
         for childNode in childNodeList:
            self.parent[childNode]=v
            if self.getStatus(childNode)!=-1:
               solved=solved & False
         if solved==True: # if the Minimum Cost nodes of v are solved, set the current node status as solved(-1)
            self.setStatus(v,-1)
            self.solutionGraph[v]=childNodeList # update the solution graph with the solved nodes which may be a part of solution
         if v!=self.start: # if the current node is not the start node, backtrack to revise its parent's value
            self.aoStar(self.parent[v], True) # backtracking the current node value with backtracking status set to true
         if backTracking==False: # check the current call is not for backtracking
            for childNode in childNodeList: # for each Minimum Cost child node
               self.setStatus(childNode,0) # set the status of child node to 0(needs exploration)
               self.aoStar(childNode, False) # Minimum Cost child node is further explored with backtracking status as false
G2 = Graph(graph2, h2, 'A') # Instantiate Graph object with graph, heuristic values and start Node
G2.applyAOStar() # Run the AO* algorithm
G2.printSolution() # Print the solution graph as output of the AO* algorithm search
Output:
Graph - 1
HEURISTIC VALUES : {'A': 1, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
10 ['B', 'C']
HEURISTIC VALUES : {'A': 10, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : B
-----------------------------------------------------------------------------------------
6 ['G']
HEURISTIC VALUES : {'A': 10, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
10 ['B', 'C']
HEURISTIC VALUES : {'A': 10, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : G
-----------------------------------------------------------------------------------------
8 ['I']
HEURISTIC VALUES : {'A': 10, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 8, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : B
-----------------------------------------------------------------------------------------
8 ['H']
HEURISTIC VALUES : {'A': 10, 'B': 8, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 8, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
12 ['B', 'C']
HEURISTIC VALUES : {'A': 12, 'B': 8, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 8, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : I
-----------------------------------------------------------------------------------------
0 []
HEURISTIC VALUES : {'A': 12, 'B': 8, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 8, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': []}
PROCESSING NODE : G
-----------------------------------------------------------------------------------------
1 ['I']
HEURISTIC VALUES : {'A': 12, 'B': 8, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I']}
PROCESSING NODE : B
-----------------------------------------------------------------------------------------
2 ['G']
HEURISTIC VALUES : {'A': 12, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G']}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
6 ['B', 'C']
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G']}
PROCESSING NODE : C
-----------------------------------------------------------------------------------------
2 ['J']
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G']}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
6 ['B', 'C']
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G']}
PROCESSING NODE : J
-----------------------------------------------------------------------------------------
0 []
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 0}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G'], 'J': []}
PROCESSING NODE : C
-----------------------------------------------------------------------------------------
1 ['J']
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 1, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 0}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G'], 'J': [], 'C': ['J']}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
5 ['B', 'C']
FOR GRAPH SOLUTION, TRAVERSE THE GRAPH FROM THE START NODE: A
------------------------------------------------------------
{'I': [], 'G': ['I'], 'B': ['G'], 'J': [], 'C': ['J'], 'A': ['B', 'C']}
------------------------------------------------------------
Graph - 2
HEURISTIC VALUES : {'A': 1, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
11 ['D']
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {}
PROCESSING NODE : D
-----------------------------------------------------------------------------------------
10 ['E', 'F']
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
11 ['D']
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {}
PROCESSING NODE : E
-----------------------------------------------------------------------------------------
0 []
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 10, 'E': 0, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': []}
PROCESSING NODE : D
-----------------------------------------------------------------------------------------
6 ['E', 'F']
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 6, 'E': 0, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': []}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
7 ['D']
HEURISTIC VALUES : {'A': 7, 'B': 6, 'C': 12, 'D': 6, 'E': 0, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': []}
PROCESSING NODE : F
-----------------------------------------------------------------------------------------
0 []
HEURISTIC VALUES : {'A': 7, 'B': 6, 'C': 12, 'D': 6, 'E': 0, 'F': 0, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': [], 'F': []}
PROCESSING NODE : D
-----------------------------------------------------------------------------------------
2 ['E', 'F']
HEURISTIC VALUES : {'A': 7, 'B': 6, 'C': 12, 'D': 2, 'E': 0, 'F': 0, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': [], 'F': [], 'D': ['E', 'F']}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
3 ['D']
FOR GRAPH SOLUTION, TRAVERSE THE GRAPH FROM THE START NODE: A
------------------------------------------------------------
{'E': [], 'F': [], 'D': ['E', 'F'], 'A': ['D']}
Result:
Thus the above program executed successfully.
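The aim describes a variant of A* that works within a limited amount of memory. One well-known variant of
this kind is iterative-deepening A* (IDA*), which runs a depth-first search bounded by a threshold on
f(n) = g(n) + h(n) and raises the threshold to the smallest f value that exceeded it, so it only stores
the current path. The sketch below illustrates that technique and is not the lab's listing; the function
name ida_star, the graph and the heuristic table are hypothetical values chosen for this example.

```python
import math

def ida_star(graph, h, start, goal):
    """Return (cost, path) from start to goal using IDA*, or None if unreachable."""

    def search(path, g, bound):
        node = path[-1]
        f = g + h[node]
        if f > bound:
            return f, None               # report the f value that broke the bound
        if node == goal:
            return g, list(path)
        smallest = math.inf              # smallest f value seen beyond the bound
        for neighbour, weight in graph[node]:
            if neighbour in path:        # avoid cycles along the current path
                continue
            path.append(neighbour)
            t, found = search(path, g + weight, bound)
            path.pop()
            if found is not None:
                return t, found
            smallest = min(smallest, t)
        return smallest, None

    bound = h[start]
    while True:
        t, found = search([start], 0, bound)
        if found is not None:
            return t, found
        if t == math.inf:
            return None                  # nothing left to explore
        bound = t                        # retry with the next larger threshold

# Hypothetical weighted graph: node -> list of (neighbour, edge cost).
graph = {
    'S': [('A', 1), ('B', 4)],
    'A': [('B', 2), ('G', 6)],
    'B': [('G', 3)],
    'G': [],
}
# Hypothetical admissible heuristic (estimated cost to reach 'G').
h = {'S': 5, 'A': 4, 'B': 3, 'G': 0}

print(ida_star(graph, h, 'S', 'G'))  # (6, ['S', 'A', 'B', 'G'])
```

The trade-off is repeated work: each new threshold restarts the search from the start node, in exchange
for memory that grows only with the length of the current path.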
Viva Questions:
1. What is the other name of informed search strategy?
a) Simple search      b) Heuristic search c) Online search                      d) None of the mentioned
3. Which search uses the problem specific knowledge beyond the definition of the problem?
a) Informed search                          b) Depth-first search
c) Breadth-first search                     d) Uninformed search
4. Which function will select the lowest expansion node at first for evaluation?
a) Greedy best-first search                  b) Best-first search
c) Depth-first search                        d) None of the mentioned
7. Which method is used to search better by learning?
a) Best-first search                       b) Depth-first search
c) Metalevel state space                   d) None of the mentioned
10. Which search method will expand the node that is closest to the goal?
a) Best-first search                      b) Greedy best-first search
c) A* search                              d) None of the mentioned
Practice Exercise:
1. Develop a code by implementing the Informed search algorithm- Memory Bounded A*
2. Implement the code for accessing the Basketball Logo through Informed search algorithm- Memory
   Bounded A*
3. Write a Program to Implement N-Queens Problem using Python
4. Consider the following grid map, where each cell is either passable (0) or blocked (1):
   000100000
   000100000
   000100000
   000100000
   000000000
   000000000
   Write the program to find the shortest path from the start to the goal using A* algorithm.
   Note: Moves can be made in any of the four cardinal directions (up, down, left, right) but not diagonally.
   The start position is (0, 0) and the goal position is (5, 8).
IMPLEMENT NAÏVE BAYES
 Aim:
The aim of the Naïve Bayes algorithm is to classify a given set of data points into different classes based on the
probability of each data point belonging to a particular class. The algorithm is based on Bayes' theorem, which
gives the probability of an event A given prior knowledge of another event B in terms of conditional
probabilities: P(A | B) = P(B | A) P(A) / P(B).
Algorithm:
1. Collect the dataset: The first step in using Naïve Bayes is to collect a dataset that contains a set of data
    points and their corresponding classes.
2. Prepare the data: The next step is to preprocess the data and prepare it for the Naïve Bayes algorithm.
    This involves removing any unnecessary features or attributes and normalizing the data.
3. Compute the prior probabilities: The prior probabilities of each class can be computed by calculating the
    number of data points belonging to each class and dividing it by the total number of data points.
4. Compute the likelihoods: The likelihoods of each feature for each class can be computed by calculating the
    conditional probability of the feature given the class. This involves counting the number of data points in
    each class that have the feature and dividing it by the total number of data points in that class.
5. Compute the posterior probabilities: The posterior probabilities of each class can be computed by
    multiplying the prior probability of the class with the product of the likelihoods of each feature for that
    class.
6. Make predictions: Once the posterior probabilities have been computed for each class, the Naïve Bayes
    algorithm can be used to make predictions by selecting the class with the highest probability.
7. Evaluate the model: The final step is to evaluate the performance of the Naïve Bayes model. This can be
    done by computing various performance metrics such as accuracy, precision, recall, and F1 score.
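The counting steps above (priors, likelihoods, posteriors, prediction) can be sketched in plain Python for categorical features. The toy weather data and the function names fit, posteriors and predict are hypothetical and serve only as an illustration; no smoothing is applied, so a feature value never seen with a class gives that class a probability of zero.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (outlook, windy) -> play
data = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "yes"), "no"),
    (("rainy", "no"), "yes"),
    (("sunny", "no"), "yes"),
    (("rainy", "no"), "no"),
    (("sunny", "yes"), "yes"),
    (("rainy", "yes"), "no"),
]

def fit(rows):
    """Count classes and feature values; return priors and the raw counts."""
    class_counts = Counter(label for _, label in rows)
    priors = {c: n / len(rows) for c, n in class_counts.items()}
    # feature_counts[c][i][v] = rows of class c whose i-th feature equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in rows:
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
    return priors, class_counts, feature_counts

def posteriors(features, priors, class_counts, feature_counts):
    """Prior times product of likelihoods, normalised over the classes."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(features):
            score *= feature_counts[c][i][value] / class_counts[c]
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else scores

def predict(features, model):
    """Select the class with the highest posterior probability."""
    post = posteriors(features, *model)
    return max(post, key=post.get)

model = fit(data)
post = posteriors(("sunny", "no"), *model)
print(post)
print(predict(("sunny", "no"), model))
```

For the query ("sunny", "no") the unnormalised scores are 0.5 × 3/4 × 3/4 for "yes" and 0.5 × 1/4 × 1/4 for "no", so the normalised posterior of "yes" is 0.9.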
Program:
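# load the iris dataset
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
# split into training and testing sets (test_size and random_state are assumed; the original lines are missing)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
# train the model on the training set
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# make predictions on the testing set
y_pred = gnb.predict(X_test)
# comparing actual response values (y_test) with predicted response values (y_pred)
from sklearn import metrics
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test, y_pred)*100)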
Output:
Gaussian Naive Bayes model accuracy(in %): 95.0
Result:
Thus the program for Naïve Bayes has been executed successfully and the output has been verified.
Viva Questions:
1. Which of the following statements best describes Naive Bayes Algorithm?
a) It is a supervised learning algorithm used for classification.
b) It is an unsupervised learning algorithm used for clustering.
c) It is a reinforcement learning algorithm used for decision making.
d) It is a dimensionality reduction algorithm used for feature extraction.
2. What assumption does Naive Bayes Algorithm make regarding the independence of features?
a) Conditional independence b) Mutual independence c) Dependence d) None of the above
3. Which probability distribution is commonly used for modeling the likelihood in Naive Bayes Algorithm?
a) Normal distribution                 b) Uniform distribution
c) Poisson distribution                d) Bernoulli distribution
5. Which assumption is violated by Naive Bayes Algorithm if there is a high degree of interdependence among
the features?
a) Linearity assumption             b) Normality assumption
c) Independence assumption d) Homoscedasticity assumption
6. Which variant of Naive Bayes Algorithm is suitable for handling continuous-valued features?
a) Gaussian Naive Bayes             b) Multinomial Naive Bayes
c) Complement Naive Bayes           d) Bernoulli Naive Bayes
b) Estimating the conditional probabilities of each feature given the class
c) Combining the prior and conditional probabilities
d) All of the above
8. What problem can occur in Naive Bayes Algorithm if a particular feature has zero probability in the training
dataset for a certain class?
a) Overfitting    b) Underfitting  c) Zero-frequency problem d) Class imbalance problem
9. Which evaluation metric is commonly used to assess the performance of Naive Bayes Algorithm for
classification tasks?
a) Mean Absolute Error (MAE)     b) Root Mean Squared Error (RMSE)
c) F1 score                      d) R-squared (R^2) score
Practice Exercise:
1. Develop code to analyze a data set using Naïve Bayes models.
2. Develop code that implements Gaussian Naïve Bayes models for spam filtering.
3. Assuming a set of documents that need to be classified, use the Naïve Bayes classifier model to perform
   this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and
   recall for your data set.
4. Write a program to implement the Naïve Bayes classifier for a sample training data set stored as a .CSV
   file. Compute the accuracy of the classifier, considering a few test data sets.
5. Write a Python program to implement a Naïve Bayes classifier using the scikit-learn library.
6. Write a Python program to implement Gaussian Naïve Bayes models.
7. Write a Python program to implement Bernoulli Naïve Bayes models.
8. Write a Python program to implement Multinomial Naïve Bayes models.
9. Write a program to implement Naïve Bayes models for the following problem: find the probability that a
   randomly picked card is a king, given that it is a face card.
Aim:
The aim of implementing Bayesian Networks is to model the probabilistic relationships between a set of
variables. A Bayesian Network is a graphical model that represents the conditional dependencies between
different variables in a probabilistic manner. It is a powerful tool for reasoning under uncertainty and can be
used for a wide range of applications, including decision making, risk analysis, and prediction.
Algorithm:
1. Define the variables: The first step in implementing a Bayesian Network is to define the variables that will
   be used in the model. Each variable should be clearly defined and its possible states should be enumerated.
2. Determine the relationships between variables: The next step is to determine the probabilistic relationships
   between the variables. This can be done by identifying the causal relationships between the variables or by
   using data to estimate the conditional probabilities of each variable given its parents.
3. Construct the Bayesian Network: The Bayesian Network can be constructed by representing the variables
   as nodes in a directed acyclic graph (DAG). The edges between the nodes represent the conditional
   dependencies between the variables.
4. Assign probabilities to the variables: Once the structure of the Bayesian Network has been defined, the
   probabilities of each variable must be assigned. This can be done by using expert knowledge, data, or a
   combination of both.
5. Inference: Inference refers to the process of using the Bayesian Network to make predictions or draw
   conclusions. This can be done by using various inference algorithms, such as variable elimination or belief
   propagation.
6. Learning: Learning refers to the process of updating the probabilities in the Bayesian Network based on
   new data. This can be done using various learning algorithms, such as maximum likelihood or Bayesian
   learning.
7. Evaluation: The final step in implementing a Bayesian Network is to evaluate its performance. This can be
   done by comparing the predictions of the model to actual data and computing various performance
   metrics, such as accuracy or precision.
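Steps 1 to 5 can be illustrated without any library by a very small network and exact inference by enumeration. The network (Rain -> WetGrass <- Sprinkler), its probabilities and the names network, joint and query are hypothetical and chosen only for illustration; enumeration is used here in place of variable elimination because it is the shortest exact method to write down.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler (all variables are True/False).
# Each variable maps to (parents, CPT); the CPT gives P(var=True | parent values).
network = {
    "Rain": ((), {(): 0.2}),
    "Sprinkler": ((), {(): 0.1}),
    "WetGrass": (("Rain", "Sprinkler"), {
        (True, True): 0.99, (True, False): 0.9,
        (False, True): 0.8, (False, False): 0.0,
    }),
}

def joint(assignment):
    """P(full assignment) = product of each node's CPT entry given its parents."""
    p = 1.0
    for var, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

def query(var, evidence):
    """P(var=True | evidence), summing the joint over all hidden variables."""
    hidden = [v for v in network if v != var and v not in evidence]
    totals = {True: 0.0, False: 0.0}
    for value in (True, False):
        for combo in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, var: value, **dict(zip(hidden, combo))}
            totals[value] += joint(assignment)
    return totals[True] / (totals[True] + totals[False])

p_rain = query("Rain", {"WetGrass": True})
print(round(p_rain, 4))  # roughly 0.74
```

Observing wet grass raises the probability of rain from the prior 0.2 to about 0.74, which is the kind of update that the inference step describes.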
Database: 0 1 2 3 4 Total
Attribute Information:
Program:
#install
!pip install pgmpy
import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?',np.nan)
print(heartDisease.head())
model = BayesianNetwork([('age', 'heartdisease'), ('sex', 'heartdisease'), ('exang', 'heartdisease'),
                         ('cp', 'heartdisease'), ('heartdisease', 'restecg'), ('heartdisease', 'chol')])
print('\nLearning CPD using Maximum likelihood estimators')
model.fit(heartDisease,estimator=MaximumLikelihoodEstimator)
Output:
     oldpeak        slope ca thal            heartdisease
0        2.3            3 0     6                       0
1        1.5            2 3     3                       2
2        2.6            2 2     7                       1
3        3.5            3 0     3                       0
4        1.4            1 0     3                       0
thal                       object
heartdisease                int64
dtype: object
Result:
Thus the program to implement a Bayesian Network on the given heart disease dataset has been executed
successfully and the output has been verified.
Viva Questions:
5. How can the entries in the full joint probability distribution be calculated?
a) Using variables                             b) Using information
c) Both Using variables & information          d) None of the mentioned
10. What is the consequence between a node and its predecessors while creating a Bayesian network?
a) Functionally dependent                b) Dependent
c) Conditionally independent      d) Both Conditionally dependent & Dependent
Practice Exercise:
1. Write a program to implement Bayesian Network that will model the performance of a student on an
   exam.
2. Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate
   the diagnosis of heart patients using standard Heart Disease Data Set. You can use Python ML library
   classes/API
3. Write a python program to create a simple Bayesian network using pgmpy.
4. Write a Python program to implement the EM algorithm for Bayesian networks.
5. Write a Python program that uses the K2 algorithm to learn the structure of a Bayesian network.
6. Develop code that uses Bayesian Networks to perform an iterative process and analyze the
   random networks.
7. Write EM code to analyze heart disease data and implement it using Bayesian Networks.
8. Develop code that checks the probabilistic relationship between two datasets using Bayesian
   Networks.
Aim:
To build regression models such as locally weighted linear regression and plot the necessary graphs.
Algorithm:
1. Read the given data sample into X and the curve (linear or non-linear) into Y.
2. Set the value of the smoothing (free) parameter τ.
3. Set the point of interest x0, which is taken from X.
4. Determine the weight matrix W, a diagonal matrix whose i-th entry is the Gaussian kernel weight:
   $w_i(x_0) = \exp\left(-\frac{(x_i - x_0)^2}{2\tau^2}\right)$
5. Determine the value of the model parameter β using:
   $\hat{\beta}(x_0) = (X^T W X)^{-1} X^T W Y$
6. Prediction = x0*β.
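Steps 4 to 6 can be sketched for a single input variable as follows. The sketch assumes the commonly used Gaussian kernel for the weights and a local model y = b0 + b1*x, for which the weighted normal equations can be solved by hand instead of with a matrix library; the function name lwr_predict and the noise-free sine data are illustrative only.

```python
import math

def lwr_predict(xs, ys, x0, tau):
    """Locally weighted linear prediction at x0 with a Gaussian kernel of width tau."""
    # Step 4: one weight per sample (the diagonal of the weight matrix W).
    w = [math.exp(-((x - x0) ** 2) / (2 * tau ** 2)) for x in xs]
    # Step 5: weighted normal equations for the two parameters b0 and b1.
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b0 = (swy - b1 * swx) / sw
    # Step 6: evaluate the local line at the point of interest.
    return b0 + b1 * x0

xs = [0.1 * i for i in range(63)]   # roughly 0 .. 2*pi
ys = [math.sin(x) for x in xs]      # noise-free so that the run is reproducible
estimate = lwr_predict(xs, ys, math.pi / 2, tau=0.3)
print(estimate)                     # close to sin(pi/2) = 1
```

A smaller τ follows the data more closely, while a larger τ averages over a wider neighbourhood and smooths the curve; repeating the call for every x0 in X gives the full fitted curve.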
Program:
   return yest
import math
n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)
f =0.25
iterations=3
yest = lowess(x, y, f, iterations)
import matplotlib.pyplot as plt
plt.plot(x,y,"r.")
plt.plot(x,yest,"b-")
Output
Result:
Thus the program to implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points, with a graph visualization, has been executed successfully.
VIVA Questions
   1. Which one of the following statements about the correlation coefficient is correct?
      🗸 The correlation coefficient is unaffected by scale changes.
      🗸 Both the change of scale and the change of origin have no effect on the correlation coefficient.
      🗸 The correlation coefficient is unaffected by the change of origin.
      🗸 The correlation coefficient is affected by changes of origin and scale.
   2. Choose the correct option concerning the correlation analysis between 2 sets of data.
      🗸 Multiple correlations is a correlational analysis comparing two sets of data.
      🗸 A partial correlation is a correlational analysis comparing two sets of data.
      🗸 A simple correlation is a correlational analysis comparing two sets of data.
      🗸 None of the preceding.
Practice Exercises
   1. Develop code to understand and predict an outcome variable from the input using
      regression models.
   2. Predict the increase in the number of customers by analyzing the data with a regression
      model.
   3. Write a python program to implement Simple Linear Regression and plot the graph.
Aim:
To implement the concept of decision trees with suitable dataset from real world problems using CART
algorithm.
Algorithm:
Steps in CART algorithm:
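The individual steps are not reproduced in this copy. As an illustration of the splitting criterion that CART uses for classification, the following minimal sketch computes the Gini impurity and searches one numeric feature for the threshold with the lowest weighted impurity. The data and the function names gini and best_split are hypothetical; a full tree would repeat this search over all features and recurse on both halves.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical sample: age of a user and whether the user purchased (1) or not (0).
ages = [22, 25, 30, 35, 40, 45, 50, 55]
bought = [0, 0, 0, 0, 1, 1, 1, 1]
split = best_split(ages, bought)
print(split)  # (37.5, 0.0): both halves are pure
```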
Program
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_csv('Social_Network_Ads.csv')
data.head()
feature_cols = ['Age', 'EstimatedSalary']
x = data.iloc[:, [2, 3]].values
y = data.iloc[:, 4].values
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier()
classifier = classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = np.meshgrid(np.arange(start=x_set[:, 0].min()-1, stop=x_set[:, 0].max()+1, step=0.01),
np.arange(start=x_set[:, 1].min()-1, stop=x_set[:, 1].max()+1, step=0.01))
plt.contourf(x1,x2, classifier.predict(np.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape), alpha=0.75,
cmap=ListedColormap(("red", "green")))
plt.xlim(x1.min(), x1.max())
plt.ylim(x2.min(), x2.max())
for i, j in enumerate(np.unique(y_set)):
 plt.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1], c=ListedColormap(("red", "green"))(i), label=j)
plt.title("Decision Tree(Test set)")
plt.xlabel("Age")
plt.ylabel("Estimated Salary")
plt.legend()
plt.show()
from sklearn.tree import export_graphviz
from io import StringIO
from IPython.display import Image
import pydotplus
dot_data = StringIO()
export_graphviz(classifier, out_file=dot_data, filled=True, rounded=True, special_characters=True,
feature_names=feature_cols, class_names=['0', '1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.write_png('decisiontree.png'))
classifier = DecisionTreeClassifier(criterion="gini", max_depth=3)
classifier = classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
dot_data = StringIO()
export_graphviz(classifier, out_file=dot_data, filled=True, rounded=True, special_characters=True,
feature_names=feature_cols, class_names=['0', '1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.write_png('opt_decisiontree_gini.png'))
Result:
      Thus the program to implement the concept of decision trees with a suitable dataset from real-world
problems using the CART algorithm has been executed successfully.
VIVA Questions
      🗸 Decision Node
      🗸 Path
      🗸 Arc/Edge
  4. An increase in training time tends to result in
       🗸 Decreased size
       🗸 Increased size
       🗸 Constant size
       🗸 None of the above
  5. For each split, the number of random attributes tested tends to be
       🗸 Sensitive
       🗸 Insensitive
       🗸 Fairly insensitive
       🗸 None of the above
Practice Exercises
   1. Develop code to build random forests for a dataset, and use it to understand the difference
      between a random forest and a decision tree.
   2. Develop code that uses decision trees to assess the risk of a heart attack.
   3. Write a python program to build decision tree regression using scikit-learn library.
Aim:
To create a machine learning model which classifies the Spam and Ham E-Mails from a given dataset using
Support Vector Machine algorithm.
Algorithm:
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import string
from nltk.corpus import stopwords
import os
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from PIL import Image
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_curve, auc
from sklearn import metrics
from sklearn import model_selection
from sklearn import svm
from nltk import word_tokenize
from sklearn.metrics import roc_auc_score
from matplotlib import pyplot
from sklearn.metrics import ConfusionMatrixDisplay
class data_read_write(object):
    # loads the CSV file once; subclasses that need no file override __init__
    def __init__(self, file_link):
        self.data_frame = pd.read_csv(file_link)
    def read_csv_file(self, file_link):
        # returns the frame loaded in __init__; file_link is not used
        return self.data_frame
    def write_to_csvfile(self, file_link):
        self.data_frame.to_csv(file_link, encoding='utf-8', index=False, header=True)
        return
class generate_word_cloud(data_read_write):
    def __init__(self):
        pass
    def variance_column(self, data):
        return np.var(data)
    def word_cloud(self, data_frame_column, output_image_file):
        text = " ".join(review for review in data_frame_column)
        # local name chosen so that the nltk stopwords import is not shadowed
        cloud_stopwords = set(STOPWORDS)
        cloud_stopwords.update(["subject"])
        wordcloud = WordCloud(width=1200, height=800, stopwords=cloud_stopwords,
                              max_font_size=50, margin=0,
                              background_color="white").generate(text)
        plt.imshow(wordcloud, interpolation='bilinear')
        plt.axis("off")
        plt.savefig("Distribution.png")
        plt.show()
        wordcloud.to_file(output_image_file)
        return
class data_cleaning(data_read_write):
    def __init__(self):
        pass
    def message_cleaning(self, message):
        # drop punctuation characters, then drop English stop words
        Test_punc_removed = [char for char in message if char not in string.punctuation]
        Test_punc_removed_join = ''.join(Test_punc_removed)
        english_stopwords = set(stopwords.words('english'))
        Test_punc_removed_join_clean = [word for word in Test_punc_removed_join.split()
                                        if word.lower() not in english_stopwords]
        final_join = ' '.join(Test_punc_removed_join_clean)
        return final_join
    def apply_to_column(self, data_column_text):
        data_processed = data_column_text.apply(self.message_cleaning)
        return data_processed
class apply_embedding_and_model(data_read_write):
    def __init__(self):
        pass
    def apply_count_vector(self, v_data_column):
        vectorizer = CountVectorizer(min_df=2, analyzer="word", tokenizer=None,
                                     preprocessor=None, stop_words=None)
        return vectorizer.fit_transform(v_data_column)
    def apply_svm(self, X, y):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
        params = {'kernel': 'linear', 'C': 2, 'gamma': 1}
        svm_cv = svm.SVC(C=params['C'], kernel=params['kernel'], gamma=params['gamma'],
                         probability=True)
        svm_cv.fit(X_train, y_train)
        y_predict_test = svm_cv.predict(X_test)
        cm = confusion_matrix(y_test, y_predict_test)
        sns.heatmap(cm, annot=True)
        print(classification_report(y_test, y_predict_test))
        print("test set")
        print("\nAccuracy Score: " + str(metrics.accuracy_score(y_test, y_predict_test)))
        print("F1 Score: " + str(metrics.f1_score(y_test, y_predict_test)))
        print("Recall: " + str(metrics.recall_score(y_test, y_predict_test)))
        print("Precision: " + str(metrics.precision_score(y_test, y_predict_test)))
        class_names = ['ham', 'spam']
        titles_options = [("Confusion matrix, without normalization", None),
                          ("Normalized confusion matrix", 'true')]
        for title, normalize in titles_options:
            # ConfusionMatrixDisplay.from_estimator replaces the older plot_confusion_matrix helper
            disp = ConfusionMatrixDisplay.from_estimator(svm_cv, X_test, y_test,
                                                         display_labels=class_names,
                                                         cmap=plt.cm.Blues, normalize=normalize)
            disp.ax_.set_title(title)
            print(title)
            print(disp.confusion_matrix)
        plt.savefig("SVM.png")
        plt.show()
        # ROC curve of the SVM against a no-skill baseline that always predicts 0
        ns_probs = [0 for _ in range(len(y_test))]
        lr_probs = svm_cv.predict_proba(X_test)
        lr_probs = lr_probs[:, 1]
        ns_auc = roc_auc_score(y_test, ns_probs)
        lr_auc = roc_auc_score(y_test, lr_probs)
        print('No Skill: ROC AUC=%.3f' % (ns_auc))
        print('SVM: ROC AUC=%.3f' % (lr_auc))
        ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
        lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probs)
        pyplot.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
        pyplot.plot(lr_fpr, lr_tpr, marker='.', label='SVM')
        pyplot.xlabel('False Positive Rate')
        pyplot.ylabel('True Positive Rate')
        pyplot.legend()
        pyplot.savefig("SVMMat.png")
        pyplot.show()
        return
data_obj = data_read_write("emails.csv")
data_frame = data_obj.read_csv_file("processed.csv")
data_frame.head()
data_frame.tail()
data_frame.describe()
data_frame.info()
data_frame.head()
data_frame.groupby('spam').describe()
data_frame['length'] = data_frame['text'].apply(len)
data_frame['length'].max()
sns.set(rc={'figure.figsize':(11.7,8.27)})
ham_messages_length = data_frame[data_frame['spam']==0]
spam_messages_length = data_frame[data_frame['spam']==1]
ham_messages_length['length'].plot(bins=100, kind='hist',label = 'Ham')
spam_messages_length['length'].plot(bins=100, kind='hist',label = 'Spam')
plt.title('Distribution of Length of Email Text')
plt.xlabel('Length of Email Text')
plt.legend()
data_frame[data_frame['spam']==0].text.values
ham_words_length = [len(word_tokenize(title)) for title in
data_frame[data_frame['spam']==0].text.values]
spam_words_length = [len(word_tokenize(title)) for title in
data_frame[data_frame['spam']==1].text.values]
print(max(ham_words_length))
print(max(spam_words_length))
sns.set(rc={'figure.figsize':(11.7,8.27)})
ax = sns.distplot(ham_words_length, norm_hist = True, bins = 30, label = 'Ham')
ax = sns.distplot(spam_words_length, norm_hist = True, bins = 30, label = 'Spam')
plt.title('Distribution of Number of Words')
plt.xlabel('Number of Words')
plt.legend()
plt.savefig("SVMGraph.png")
plt.show()
def mean_word_length(x):
    # average token length of one email; the return sits outside the loop so every token counts
    word_lengths = [len(word) for word in word_tokenize(x)]
    return np.mean(word_lengths) if word_lengths else 0.0
ham_meanword_length =data_frame[data_frame['spam']==0].text.apply(mean_word_length)
spam_meanword_length =data_frame[data_frame['spam']==1].text.apply(mean_word_length)
sns.distplot(ham_meanword_length, norm_hist = True, bins = 30, label = 'Ham')
sns.distplot(spam_meanword_length , norm_hist = True, bins = 30, label = 'Spam')
plt.title('Distribution of Mean Word Length')
plt.xlabel('Mean Word Length')
plt.legend()
plt.savefig("Graph.png")
plt.show()
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
def stop_words_ratio(x):
    num_total_words = 0
    num_stop_words = 0
    for word in word_tokenize(x):
        if word in stop_words:
            num_stop_words += 1
        num_total_words += 1
    # the ratio is computed after the loop; empty messages give 0.0
    return num_stop_words / num_total_words if num_total_words else 0.0
ham_stopwords = data_frame[data_frame['spam'] == 0].text.apply(stop_words_ratio)
spam_stopwords = data_frame[data_frame['spam'] == 1].text.apply(stop_words_ratio)
sns.distplot(ham_stopwords, norm_hist=True, label='Ham')
sns.distplot(spam_stopwords, label='Spam')
print('Ham Mean: {:.3f}'.format(ham_stopwords.values.mean()))
print('Spam Mean: {:.3f}'.format(spam_stopwords.values.mean()))
plt.title('Distribution of Stop-word Ratio')
plt.xlabel('Stop Word Ratio')
plt.legend()
ham = data_frame[data_frame['spam']==0]
spam = data_frame[data_frame['spam']==1]
spam['length'].plot(bins=60, kind='hist')
ham['length'].plot(bins=60, kind='hist')
data_frame['Ham(0) and Spam(1)'] = data_frame['spam']
print( 'Spam percentage =', (len(spam) / len(data_frame) )*100,"%")
print( 'Ham percentage =', (len(ham) / len(data_frame) )*100,"%")
sns.countplot(x='Ham(0) and Spam(1)', data=data_frame)
data_clean_obj = data_cleaning()
data_frame['clean_text'] = data_clean_obj.apply_to_column(data_frame['text'])
data_frame.head()
data_obj.data_frame.head()
data_obj.write_to_csvfile("processed_file.csv")
cv_object = apply_embedding_and_model()
spamham_countvectorizer = cv_object.apply_count_vector(data_frame['clean_text'])
X = spamham_countvectorizer
label = data_frame['spam'].values
y = label
cv_object.apply_svm(X,y)
Output:
test set
Accuracy Score: 0.9895287958115183
F1 Score: 0.9776119402985075
Recall: 0.9739776951672863
Precision: 0.9812734082397003
Normalized confusion matrix
[[0.99429875 0.00570125]
[0.0260223 0.9739777 ]]
Result:
Thus the program to create a machine learning model which classifies Spam and Ham e-mails from a
given dataset using the Support Vector Machine algorithm has been executed successfully.
VIVA Questions
Practice Exercises
  1. Write a python program to build SVM (Support Vector Machine) models using scikit-learn.
  2. Write a program to implement the SVM using the following dataset:
     https://www.kaggle.com/mltuts/social-network-ads.
  3. Write a python program to implement Agglomerative Hierarchical Clustering, using Python and
     scikit-learn library.
Aim:
To implement the ensembling technique of Blending with the given Alcohol QCM Dataset.
Algorithm:
1. Split the dataset into train, validation and test sets.
2. Fit all the base models using the train set.
3. Make predictions with each base model on the validation and test sets.
4. Use these predictions as meta-features to build a second-level model.
5. Use the second-level model to make the final predictions on the test set and its meta-features.
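The steps above can be sketched on synthetic data as follows. This is only an illustration under stated assumptions: the data comes from `make_regression` rather than the lab dataset, only two base models are used, and the 70/20/10 split is written out explicitly.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data stands in for the lab dataset (assumption)
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

# Step 1: 70% train, then the remaining 30% is split 2:1 into validation and test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=1/3, random_state=0)

# Step 2: fit the base models on the training split only
base_models = [LinearRegression(), RandomForestRegressor(n_estimators=50, random_state=0)]
for model in base_models:
    model.fit(X_train, y_train)

# Step 3: base-model predictions on validation and test become meta-features
meta_val = np.column_stack([m.predict(X_val) for m in base_models])
meta_test = np.column_stack([m.predict(X_test) for m in base_models])

# Step 4: train the second-level model on validation features plus meta-features
blender = LinearRegression()
blender.fit(np.hstack([X_val, meta_val]), y_val)

# Step 5: final predictions on the test split
final_pred = blender.predict(np.hstack([X_test, meta_test]))
print("Blended test MSE:", mean_squared_error(y_test, final_pred))
```

The second-level model never sees the training split, which is what distinguishes blending from simply refitting on all the data.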
Program:
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor
import xgboost as xgb
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
df = pd.read_csv("train_data.csv")
target = df["target"]
# Separate the features from the target column
train = df.drop(columns="target")
# Split into 70% train, 20% validation and 10% test
train_ratio = 0.70
validation_ratio = 0.20
test_ratio = 0.10
x_train, x_test, y_train, y_test = train_test_split(
    train, target, test_size=1 - train_ratio)
x_val, x_test, y_val, y_test = train_test_split(
    x_test, y_test, test_size=test_ratio/(test_ratio + validation_ratio))
model_1 = LinearRegression()
model_2 = xgb.XGBRegressor()
model_3 = RandomForestRegressor()
model_1.fit(x_train, y_train)
val_pred_1 = model_1.predict(x_val)
test_pred_1 = model_1.predict(x_test)
val_pred_1 = pd.DataFrame(val_pred_1)
test_pred_1 = pd.DataFrame(test_pred_1)
model_2.fit(x_train, y_train)
val_pred_2 = model_2.predict(x_val)
test_pred_2 = model_2.predict(x_test)
val_pred_2 = pd.DataFrame(val_pred_2)
test_pred_2 = pd.DataFrame(test_pred_2)
model_3.fit(x_train, y_train)
val_pred_3 = model_3.predict(x_val)
test_pred_3 = model_3.predict(x_test)
val_pred_3 = pd.DataFrame(val_pred_3)
test_pred_3 = pd.DataFrame(test_pred_3)
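# Combine the base-model predictions and give them distinct string column names
val_preds = pd.concat([val_pred_1, val_pred_2, val_pred_3], axis=1)
val_preds.columns = ["pred_1", "pred_2", "pred_3"]
test_preds = pd.concat([test_pred_1, test_pred_2, test_pred_3], axis=1)
test_preds.columns = ["pred_1", "pred_2", "pred_3"]
# Reset the row index so the original features line up with the predictions
df_val = pd.concat([x_val.reset_index(drop=True), val_preds], axis=1)
df_test = pd.concat([x_test.reset_index(drop=True), test_preds], axis=1)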
final_model = LinearRegression()
final_model.fit(df_val, y_val)
final_pred = final_model.predict(df_test)
print(mean_squared_error(y_test, final_pred))
Output:
4790
Result:
Thus the program to implement the ensembling technique of Blending with the given Alcohol QCM Dataset
has been executed successfully and the output was verified.
VIVA Questions:
   1. Write a Python program to implement ensemble techniques, such as voting and bagging, using
      scikit-learn.
   2. Write a Python program to implement clustering algorithms, specifically K-means and
      DBSCAN, using scikit-learn.
   3. Write a Python program to implement the EM algorithm for Bayesian networks.
Aim:
To implement the K-Nearest Neighbor algorithm to classify the Iris dataset.
Algorithm:
Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point to each training data point.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category that has the largest number of neighbors.
Step-6: Our model is ready.
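The steps above can be sketched without scikit-learn. This is a minimal illustration, not the manual's program: the function name `knn_predict`, the default of K = 3 and the toy data are assumptions made for the example.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 3: indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: count the labels among the neighbours and return the majority
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Toy 2-D data (made up for illustration): class 0 near the origin, class 1 near (5, 5)
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                    [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # point near the origin
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # point near (5, 5)
```

There is no training phase beyond storing the data; all the work happens at prediction time.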
Program:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
import pandas as pd
import numpy as np
from sklearn import datasets
iris=datasets.load_iris()
iris_data=iris.data
iris_labels=iris.target
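# Hold out part of the data for testing (the 30% test size is an assumed value)
x_train, x_test, y_train, y_test = train_test_split(iris_data, iris_labels, test_size=0.30)
# Fit the classifier (K = 5 is an assumed value) and predict the test labels
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
print("accuracy is")
print(classification_report(y_test, y_pred))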
Output:
accuracy is
Result:
Thus the program to implement the K-Nearest Neighbor algorithm to classify the Iris dataset has been executed
successfully and the output was verified.
Viva Questions:
1. Which of the following is a goal of clustering algorithms?
    🗸 Classification
    🗸 Regression
    🗸 Dimensionality reduction
    🗸 Grouping similar data points together
2. Which clustering algorithm is based on the concept of centroids?
    🗸 K-Means
    🗸 DBSCAN
    🗸 Agglomerative
    🗸 Mean-Shift
3. Which of the following is finally produced by Hierarchical Clustering?
    🗸 final estimate of cluster centroids
    🗸 tree showing how close things are to each other
    🗸 assignment of each point to clusters
    🗸 all of the mentioned
4. Which of the following clustering requires merging approach?
    🗸 Partitional
    🗸 Hierarchical
    🗸 Naive Bayes
    🗸 None of the mentioned
5. Which of the following is true for clustering?
   🗸 Clustering is a technique used to group similar objects into clusters.
   🗸 partition data into groups
   🗸 dividing entire data, based on patterns in data
   🗸 All of the above
Practice Exercise :
1. Implement an application that predicts the segmentation and classifies the customer requirements using
   clustering algorithms.
2. Write a program to implement the k-means clustering algorithm.
Aim:
To implement the EM algorithm for clustering networks using the given dataset.
Algorithm:
Step 1: Initialize θ randomly.
Repeat Steps 2 to 4 until convergence:
Step 2: E-step: Compute q(h) = P(H = h | E = e; θ) for each h (probabilistic inference).
Step 3: Create fully observed weighted examples: (h, e) with weight q(h).
Step 4: M-step: Maximum likelihood (count and normalize) on the weighted examples to get θ.
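The E-step and M-step above can be sketched for a one-dimensional mixture of two Gaussians, where θ is the set of mixture weights, means and variances and q(h) is the responsibility of each component for a data point. The data, the deterministic initial means (used instead of random ones so the run is reproducible) and the fixed 50 iterations (used instead of a convergence test) are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from two Gaussians (assumed for illustration)
data = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 0.8, 200)])

# Step 1: initial parameters theta = (weights, means, variances)
weights = np.array([0.5, 0.5])
means = np.array([data.min(), data.max()])
variances = np.array([1.0, 1.0])

def gaussian_pdf(x, mean, var):
    # Density of a normal distribution with the given mean and variance
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(50):
    # E-step: responsibility q(h) of each component h for every data point
    likelihood = weights * gaussian_pdf(data[:, None], means, variances)
    q = likelihood / likelihood.sum(axis=1, keepdims=True)

    # M-step: weighted maximum-likelihood estimates ("count and normalize")
    counts = q.sum(axis=0)
    weights = counts / len(data)
    means = (q * data[:, None]).sum(axis=0) / counts
    variances = (q * (data[:, None] - means) ** 2).sum(axis=0) / counts

print("weights:", weights)
print("means:", means)
print("std devs:", np.sqrt(variances))
```

The estimated means should land close to the values used to generate the data (-2 and 3), since the two components are well separated.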
Program:
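import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets, preprocessing
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
# Load the Iris dataset (inferred from the column names used below)
dataset = datasets.load_iris()
# print(dataset)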
X=pd.DataFrame(dataset.data)
X.columns=['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y=pd.DataFrame(dataset.target)
y.columns=['Targets']
# print(X)
plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])
# REAL PLOT
plt.subplot(1,3,1)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y.Targets],s=40)
plt.title('Real')
# K-PLOT
plt.subplot(1,3,2)
model=KMeans(n_clusters=3)
model.fit(X)
predY=np.choose(model.labels_,[0,1,2]).astype(np.int64)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[predY],s=40)
plt.title('KMeans')
# GMM PLOT
scaler=preprocessing.StandardScaler()
scaler.fit(X)
xsa=scaler.transform(X)
xs=pd.DataFrame(xsa,columns=X.columns)
gmm=GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm=gmm.predict(xs)
plt.subplot(1,3,3)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm],s=40)
plt.title('GMM Classification')
plt.show()
Output:
Result:
Thus the program to implement the EM algorithm for clustering networks using the given dataset has been executed
successfully and the output was verified.
Viva Questions:
Practice Exercise :
1. Write a Python program that uses the EM algorithm to learn parameters for a Bayesian network using the pgmpy library.
2. Write EM code to understand heart disease and implement it using a Bayesian network.
Aim:
To implement the neural network model for the given dataset.
Algorithm:
Step-1: Image Acquisition: The first step is to acquire images of paper documents with the help of
optical scanners. This way, an original image can be captured and stored.
Step-2: Pre-processing: The noise level on an image should be optimized and areas outside the text removed.
Pre-processing is especially vital for recognizing handwritten documents that are more sensitive to noise.
Step-3: Segmentation: The process of segmentation is aimed at grouping characters into meaningful chunks.
There can be predefined classes for characters. So, images can be scanned for patterns that match the classes.
Step-4: Feature Extraction: This step means splitting the input data into a set of features, that is, to find
essential characteristics that make one or another pattern recognizable.
Step-6: Post processing: This stage is the process of refinement as an OCR model can require some corrections.
However, it isn’t possible to achieve 100% recognition accuracy. The identification of characters heavily
depends on the context.
Program:
from __future__ import print_function
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Conv2D, MaxPooling2D, Flatten
from keras.optimizers import RMSprop, SGD
from keras.optimizers import Adam
from keras import utils as np_utils  # keras.utils.np_utils was removed in newer Keras; keras.utils provides to_categorical
from emnist import list_datasets
from emnist import extract_training_samples
from emnist import extract_test_samples
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
np.random.seed(1671) # for reproducibility
# network and training
NB_EPOCH = 30
BATCH_SIZE = 256
VERBOSE = 2
NB_CLASSES = 256 # size of the softmax output layer; EMNIST 'byclass' itself has 62 classes (labels 0-61)
OPTIMIZER = Adam()
N_HIDDEN = 512
VALIDATION_SPLIT=0.2 # how much TRAIN is reserved for VALIDATION
DROPOUT = 0.20
print(list_datasets())
X_train, y_train = extract_training_samples('byclass')
print("train shape: ", X_train.shape)
print("train labels: ",y_train.shape)
X_test, y_test = extract_test_samples('byclass')
print("test shape: ",X_test.shape)
print("test labels: ",y_test.shape)
# 'byclass' labels already start at 0 (classes 0-61), so no shift is needed;
# subtracting 1 from the unsigned 8-bit labels would wrap class 0 around to 255
RESHAPED = X_train.shape[1] * X_train.shape[2]  # 28 x 28 = 784 pixels per image
X_train = X_train.reshape(len(X_train), RESHAPED)
X_test = X_test.reshape(len(X_test), RESHAPED)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# normalize
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, NB_CLASSES)
Y_test = np_utils.to_categorical(y_test, NB_CLASSES)
# four hidden layers (512, 256, 256 and 256 units)
# NB_CLASSES outputs
# final stage is softmax
model = Sequential()
model.add(Dense(N_HIDDEN, input_shape=(RESHAPED,)))
model.add(Activation('relu'))
model.add(Dropout(DROPOUT))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(DROPOUT))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(DROPOUT))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(DROPOUT))
model.add(Dense(NB_CLASSES))
model.add(Activation('softmax'))
model.summary()
model.compile(loss='categorical_crossentropy',
              optimizer=OPTIMIZER,
              metrics=['accuracy'])
history = model.fit(X_train, Y_train,
                    batch_size=BATCH_SIZE, epochs=NB_EPOCH,
                    verbose=VERBOSE, validation_split=VALIDATION_SPLIT)
score = model.evaluate(X_test, Y_test, verbose=VERBOSE)
print("\nTest score:", score[0])
print('Test accuracy:', score[1])
# list all data in history
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
Output:
['balanced', 'byclass', 'bymerge', 'digits', 'letters', 'mnist']
train shape: (697932, 28, 28)
train labels: (697932,)
test shape: (116323, 28, 28)
test labels: (116323,)
697932 train samples
116323 test samples
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 512) 401920
activation (Activation) (None, 512) 0
dropout (Dropout) (None, 512) 0
dense_1 (Dense) (None, 256) 131328
activation_1 (Activation) (None, 256) 0
dropout_1 (Dropout) (None, 256) 0
dense_2 (Dense) (None, 256) 65792
activation_2 (Activation) (None, 256) 0
dropout_2 (Dropout) (None, 256) 0
dense_3 (Dense) (None, 256) 65792
activation_3 (Activation) (None, 256) 0
dropout_3 (Dropout) (None, 256) 0
dense_4 (Dense) (None, 256) 65792
activation_4 (Activation) (None, 256) 0
Total params: 730,624
Trainable params: 730,624
Non-trainable params: 0
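The parameter counts in the summary above can be checked by hand: a Dense layer with n_in inputs and n_out outputs has n_in × n_out weights plus n_out biases. A small sketch of that arithmetic follows; the helper name `dense_params` is made up for the example, and the layer widths are read off the summary (784 inputs from the flattened 28 × 28 images).

```python
# Layer widths from the summary: 784 inputs (28 x 28 pixels), then each Dense output size
layer_sizes = [784, 512, 256, 256, 256, 256]

def dense_params(n_in, n_out):
    # one weight per input-output pair plus one bias per output unit
    return n_in * n_out + n_out

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # parameters of dense, dense_1, ..., dense_4
print(sum(per_layer))  # total trainable parameters
```

Activation and Dropout layers contribute no parameters, which is why they show 0 in the summary.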
Result:
Thus the program to implement the neural network model for the given dataset has been executed successfully.
Viva Questions:
1. Why do we need biological neural networks?
        🗸 To make smart human interactive & user friendly system
        🗸 To apply heuristic search methods to find solutions of problem
        🗸 To solve tasks like machine vision & natural language processing
        🗸 All of the above
2. Artificial neural network is used for
        🗸 Classification
        🗸 Clustering
        🗸 Pattern recognition
        🗸 All of the above
3. Artificial Neural Network is based on which approach?
        🗸 Weak Artificial Intelligence approach
        🗸 Cognitive Artificial Intelligence approach
        🗸 Strong Artificial Intelligence approach
        🗸 Applied Artificial Intelligence approach
4. A Neural Network can answer
        🗸 For Loop questions
        🗸 what-if questions
        🗸 If-Then-Else Analysis Questions
        🗸 None of the mentioned
5. The first neural network computer:
        🗸 AM
        🗸 AN
        🗸 RFD
        🗸 SNARC
Practice Exercise :
1. Develop a neural network model to optimize the pattern for the information.
2. Write code to find the shortest path to scale data for a long short-term memory (LSTM) network in Python.
Aim:
To implement and build a Convolutional neural network model which predicts the age and gender of a
person using the given pre-trained models.
Algorithm:
Steps in CNN Algorithm:
Step-1: Choose the Dataset.
Step-2: Prepare the Dataset for training.
Step-3: Create training Data.
Step-4: Shuffle the Dataset.
Step-5: Assigning Labels and Features.
Step-6: Normalising X and converting labels to categorical data.
Step-7: Split X and Y for use in CNN.
Step-8: Define, compile and train the CNN Model.
Step-9: Accuracy and Score of the model.
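Steps 4 to 7 above (shuffling, assigning features and labels, normalising, one-hot encoding and splitting) can be sketched with NumPy alone. The random images, the three classes and the 80/20 split are assumptions made purely for illustration; a real run would load an actual image dataset.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in data: 100 grayscale 28x28 images with 3 classes
images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 3, size=100)

# Step 4: shuffle images and labels with the same permutation
order = rng.permutation(len(images))
images, labels = images[order], labels[order]

# Steps 5-6: scale pixels to [0, 1], add a channel axis, one-hot encode the labels
X = images.astype("float32")[..., np.newaxis] / 255.0
Y = np.eye(3, dtype="float32")[labels]

# Step 7: hold out the last 20% for testing
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
Y_train, Y_test = Y[:split], Y[split:]

print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)
```

The trailing channel axis gives the images the (height, width, channels) layout that convolutional layers typically expect.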
Program:
import cv2 as cv
import math
import time
from google.colab.patches import cv2_imshow
def getFaceBox(net, frame, conf_threshold=0.7):
    # Work on a copy so the caller's image is not drawn on
    frameOpencvDnn = frame.copy()
    frameHeight = frameOpencvDnn.shape[0]
    frameWidth = frameOpencvDnn.shape[1]
    blob = cv.dnn.blobFromImage(frameOpencvDnn, 1.0, (300, 300), [104, 117, 123], True, False)
    net.setInput(blob)
    detections = net.forward()
    bboxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > conf_threshold:
            # Box corners are returned as fractions of the image size; scale to pixels
            x1 = int(detections[0, 0, i, 3] * frameWidth)
            y1 = int(detections[0, 0, i, 4] * frameHeight)
            x2 = int(detections[0, 0, i, 5] * frameWidth)
            y2 = int(detections[0, 0, i, 6] * frameHeight)
            bboxes.append([x1, y1, x2, y2])
            cv.rectangle(frameOpencvDnn, (x1, y1), (x2, y2), (0, 255, 0),
                         int(round(frameHeight/150)), 8)
    return frameOpencvDnn, bboxes
faceProto = "/content/opencv_face_detector.pbtxt"
faceModel = "/content/opencv_face_detector_uint8.pb"
ageProto = "/content/age_deploy.prototxt"
ageModel = "/content/age_net.caffemodel"
genderProto = "/content/gender_deploy.prototxt"
genderModel = "/content/gender_net.caffemodel"
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
ageList = ['(0-2)', '(4-6)', '(8-12)', '(15-20)', '(25-32)', '(38-43)', '(48-53)', '(60-100)']
genderList = ['Male', 'Female']
ageNet = cv.dnn.readNet(ageModel, ageProto)
genderNet = cv.dnn.readNet(genderModel, genderProto)
faceNet = cv.dnn.readNet(faceModel, faceProto)
def age_gender_detector(frame):
    # Margin in pixels added around each detected face
    # (assumed value; the original listing uses padding without defining it)
    padding = 20
    # Read frame
    t = time.time()
    frameFace, bboxes = getFaceBox(faceNet, frame)
    for bbox in bboxes:
        # print(bbox)
        # Crop the face with the extra margin, clipped to the image borders
        face = frame[max(0, bbox[1]-padding):min(bbox[3]+padding, frame.shape[0]-1),
                     max(0, bbox[0]-padding):min(bbox[2]+padding, frame.shape[1]-1)]
        blob = cv.dnn.blobFromImage(face, 1.0, (227, 227), MODEL_MEAN_VALUES, swapRB=False)
        genderNet.setInput(blob)
        genderPreds = genderNet.forward()
        gender = genderList[genderPreds[0].argmax()]
        # print("Gender Output : {}".format(genderPreds))
        print("Gender : {}, conf = {:.3f}".format(gender, genderPreds[0].max()))
        ageNet.setInput(blob)
        agePreds = ageNet.forward()
        age = ageList[agePreds[0].argmax()]
        print("Age Output : {}".format(agePreds))
        print("Age : {}, conf = {:.3f}".format(age, agePreds[0].max()))
        label = "{},{}".format(gender, age)
        cv.putText(frameFace, label, (bbox[0], bbox[1]-10), cv.FONT_HERSHEY_SIMPLEX, 0.8,
                   (0, 255, 255), 2, cv.LINE_AA)
    return frameFace
from google.colab import files
uploaded = files.upload()
input = cv.imread("2.jpg")
output = age_gender_detector(input)
cv2_imshow(output)
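The face crop inside age_gender_detector widens each detected box by a margin and clips it to the image borders before passing it to the age and gender networks. The sketch below isolates that slicing so it can be tried without the model files; the helper name `crop_with_padding`, the 20-pixel default and the blank test image are assumptions made for the example.

```python
import numpy as np

def crop_with_padding(frame, bbox, padding=20):
    """Return the part of `frame` inside `bbox`, enlarged by `padding` pixels
    on every side and clipped to the image borders."""
    x1, y1, x2, y2 = bbox
    height, width = frame.shape[:2]
    top = max(0, y1 - padding)
    bottom = min(y2 + padding, height - 1)
    left = max(0, x1 - padding)
    right = min(x2 + padding, width - 1)
    # Rows are indexed by y and columns by x, so the y-range comes first
    return frame[top:bottom, left:right]

# A blank 100x200 (rows x columns) colour image stands in for a real photo
frame = np.zeros((100, 200, 3), dtype=np.uint8)

print(crop_with_padding(frame, [50, 30, 90, 70]).shape)  # box well inside the image
print(crop_with_padding(frame, [5, 5, 195, 95]).shape)   # margin clipped at the borders
```

Without the clipping, a face near the edge of the photo would produce negative indices, which NumPy would silently interpret as counting from the other side of the image.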
Output:
Result:
Thus the program to implement and build a Convolutional neural network model which predicts the age and
gender of a person using the given pre-trained models has been executed successfully and the output was
verified.
Viva Questions:
1. ________ computes the output volume by computing dot product between all filters and image patch
    🗸 Input Layer
    🗸 Convolution Layer
    🗸 Pool Layer
    🗸 Activation Function Layer
2. _____ is/are the ways to represent uncertainty
    🗸 Fuzzy logic
    🗸 Entropy
    🗸 Probability
    🗸 All of the above
Practice Exercise :
1. Write a Python program to build a deep neural network model using the Keras
   library (a multi-layer perceptron (MLP) model for multi-class classification).