CLASS-XII
2021-2022
        By
KRISHAN MEENA
    (PGT-CS)
JNV, PUDUCHERRY
DATA VISUALIZATION
USING MATPLOTLIB
   Data are elements that have no meaning on their own until they are
    arranged and processed together to give meaningful information.
   Visualization is the way to analyze and absorb information. It is the first step for any
    kind of data analysis work.
   Thus, visualization, or more precisely data visualization, helps us to easily understand a
    complex problem and see certain patterns.
   It also helps in identifying patterns, relationships and outliers in data and
    in understanding business problems better and more quickly.
   Data Visualization basically refers to the graphical or visual representation of
    information and data using visual elements like charts, graphs, maps etc.
   PLOTTING REFERS TO DRAWING A PICTORIAL GRAPH USING VARIOUS
    COMPONENTS SUCH AS AXES, LABELS, LEGENDS, TITLE, STYLE, COLOR ETC. ON A
    COMPUTER SYSTEM WITH THE HELP OF SOFTWARE SUCH AS MATPLOTLIB, CAD
    ETC.
   THE PURPOSE OF PLOTTING IS TO:-
     1) CREATE 2-D/3-D GRAPHS
     2) DISPLAY THE DISTRIBUTION OF DATA AT PARTICULAR POINTS
     3) DISPLAY THE RISE AND FALL OF RESULTS/DATA
     4) ATTRACT PEOPLE'S ATTENTION
     5) HELP IN ANALYZING THE DATA EASILY
1.   Figure:- The whole figure, which may contain one or more Axes.
2.   Axes:- This is what we think of as a plot. A figure can contain many Axes. Each Axes has a
     title, an x-label and a y-label.
3.   Artist:- Everything that one can see on the figure is an artist, such as Text objects, Line2D
     objects and collection objects. Most Artists are tied to an Axes.
4.   Labels:- The axes labels are an important piece of information to add to a plot,
     since they usually specify what kind of data we are plotting.
5.   Title:- Just as in a book or paper, the title of a graph describes what it is. Matplotlib
     provides a simple function, plt.title(), to add a title to a plot.
6.   Legend:- Legends are used to explain what each line means in the current figure.
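The components listed above can be seen together in one small plot. This is a minimal sketch; the data values and label texts are made up for illustration.

```python
import matplotlib.pyplot as plt

# Figure: the whole drawing. Axes: one plot area inside the figure.
fig, ax = plt.subplots()

# Each plot() call adds a Line2D artist to the Axes.
ax.plot([1, 2, 3], [2, 4, 6], label="double")
ax.plot([1, 2, 3], [1, 4, 9], label="square")

# Labels, title and legend are also artists attached to the Axes.
ax.set_xlabel("x value")
ax.set_ylabel("y value")
ax.set_title("Parts of a plot")
ax.legend()

plt.show()
```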
   Matplotlib is one of the most popular Python packages used for data
    visualization. It is a cross-platform library for making 2D plots from data in arrays.
    It provides an object-oriented API that helps in embedding plots in applications
    using Python GUI toolkits such as PyQt, wxPython or Tkinter. It can also be used in
    Python and IPython shells, Jupyter notebooks and web application servers.
   Matplotlib is written in Python and makes use of NumPy, the numerical
    mathematics extension of Python.
   Matplotlib was originally written by John D. Hunter in 2003. Version 2.2.0
    was released in 2018, and newer versions have been released since.
   Matplotlib and its dependency packages are available in the form of wheel
    packages on the standard Python package repositories and can be installed
    on Windows, Linux as well as MacOS systems using the pip package manager.
   pip3 install matplotlib
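After installation, a quick check along the following lines can confirm that the package imports correctly. The version printed depends on what is installed on your system.

```python
import matplotlib

# Prints the installed Matplotlib version as a string
print(matplotlib.__version__)
```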
   Anaconda is a free and open source distribution of the Python and R programming
    languages for large-scale data processing, predictive analytics, and scientific
    computing. The distribution makes package management and deployment simple
    and easy. Matplotlib and lots of other useful (data) science tools form part of the
    distribution. Package versions are managed by the package management system
    Conda. The advantage of Anaconda is that you have access to over 720
    packages that can easily be installed with Anaconda's Conda, a package,
    dependency, and environment manager.
   Anaconda distribution is available for installation
    at https://www.anaconda.com/download/. For installation on Windows, 32 and 64
    bit binaries are available −
   https://repo.continuum.io/archive/Anaconda3-5.1.0-Windows-x86.exe
   https://repo.continuum.io/archive/Anaconda3-5.1.0-Windows-x86_64.exe
   Jupyter is a loose acronym meaning Julia, Python, and R. These
    programming languages were the first target languages of the
    Jupyter application, but nowadays, the notebook technology also
    supports many other languages.
   To start the Jupyter notebook, open Anaconda navigator (a desktop
    graphical user interface included in Anaconda that allows you to
    launch applications and easily manage Conda packages,
    environments and channels without the need to use command line
    commands).
Sr.No   Function    Description
 1      bar         Make a bar plot.
 2      hist        Plot a histogram.
 3      boxplot     Make a box and whisker plot.
 4      pie         Plot a pie chart.
 5      scatter     Make a scatter plot of x vs y.
 6      plot        Plot lines and/or markers to the Axes.
   Line Plot is a type of plot which displays information as a series of Data
    Points called “markers” connected by straight lines. In this type of plot
    we need the measurement points to be ordered by their X-axis values.
   A line chart is represented by a series of data points connected by
    straight lines and is drawn using the plot() function available in the pyplot module.
   Example:-
                import matplotlib.pyplot as plt
                plt.plot([1,2,3], [5,7,4])
                plt.show()
   To begin with, the Pyplot module from Matplotlib package is imported, with an alias
    plt as a matter of convention.
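Because a line plot joins points in the order they are given, data recorded out of order should be sorted by X value first. The sketch below assumes a small, hypothetical list of (x, y) pairs.

```python
import matplotlib.pyplot as plt

# Hypothetical measurements recorded out of order as (x, y) pairs
points = [(3, 4), (1, 5), (2, 7)]

# Sort by x so that the line moves from left to right
points.sort()
x = [p[0] for p in points]
y = [p[1] for p in points]

# marker="o" draws a marker at each data point
plt.plot(x, y, marker="o")
plt.show()
```

Without the sort, the line would double back on itself between the points.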
   Example:-
                import matplotlib.pyplot as plt
                import numpy as np
                import math
                x = np.arange(0, math.pi*2, 0.05)
                y = np.sin(x)
                plt.plot(x,y)
                # You can set the plot title, and labels for x and y axes.
                plt.xlabel("angle")
                plt.ylabel("sine")
                plt.title('sine wave')
                plt.show()
   A bar chart or bar graph is a chart or graph that presents
    categorical data with rectangular bars with heights or lengths
    proportional to the values that they represent. The bars can be
    plotted vertically or horizontally.
   A bar graph shows comparisons among discrete categories. One
    axis of the chart shows the specific categories being compared,
    and the other axis represents a measured value.
   The Matplotlib API provides the bar() function, which can be used in the
    MATLAB-style pyplot interface as well as in the object-oriented API.
   import matplotlib.pyplot as plt
   fig = plt.figure()
   ax = fig.add_axes([0,0,1,1])
   langs = ['C', 'C++', 'Java', 'Python', 'PHP']
   students = [23,17,35,29,12]
   ax.bar(langs,students)
   plt.show()
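As noted above, bars can also be drawn horizontally. A minimal sketch using the same data with the barh() function of the object-oriented API; the axis label texts are illustrative choices.

```python
import matplotlib.pyplot as plt

langs = ['C', 'C++', 'Java', 'Python', 'PHP']
students = [23, 17, 35, 29, 12]

fig, ax = plt.subplots()

# barh() puts categories on the Y axis and bar lengths along the X axis
bars = ax.barh(langs, students)
ax.set_xlabel('no. of students')
ax.set_ylabel('language')

plt.show()
```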
   A histogram is an accurate representation of the distribution of
    numerical data. It is an estimate of the probability distribution of a
    continuous variable. It is a kind of bar graph.
    To construct a histogram, follow these steps −
   Bin the range of values.
   Divide the entire range of values into a series of intervals.
   Count how many values fall into each interval.
   The bins are usually specified as consecutive, non-overlapping
    intervals of a variable.
   The matplotlib.pyplot.hist() function plots a histogram. It computes
    and draws the histogram of x.
   from matplotlib import pyplot as plt
   import numpy as np
   fig,ax = plt.subplots(1,1)
   a = np.array([22,87,5,43,56,73,55,54,11,20,51,5,79,31,27])
   ax.hist(a, bins = [0,25,50,75,100])
   ax.set_title("histogram of result")
   ax.set_xticks([0,25,50,75,100])
   ax.set_xlabel('marks')
   ax.set_ylabel('no. of students')
   plt.show()
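The binning and counting steps described above can be checked without drawing anything. This sketch uses NumPy's histogram() function on the same marks and bin edges as the example.

```python
import numpy as np

a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])
bin_edges = [0, 25, 50, 75, 100]

# counts[i] is the number of values falling between bin_edges[i] and bin_edges[i+1]
counts, edges = np.histogram(a, bins=bin_edges)

for low, high, n in zip(edges[:-1], edges[1:], counts):
    print(f"{low}-{high}: {n}")
```

Each bin includes its left edge but not its right edge, except the last bin, which includes both.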
   A pie chart can only display one series of data. Pie charts show the
    size of items (called wedges) in one data series, proportional to the
    sum of the items. The data points in a pie chart are shown as a
    percentage of the whole pie.
   The Matplotlib API has a pie() function that generates a pie diagram
    representing data in an array. The fractional area of each wedge is
    given by x/sum(x). In older Matplotlib versions, if sum(x) < 1 the values of x
    gave the fractional area directly, the array was not normalized, and the
    resulting pie had an empty wedge of size 1 - sum(x). Recent versions normalize
    x by default; passing normalize=False to pie() gives the partial pie instead.
   The pie chart looks best if the figure and axes are square, or the Axes
    aspect is equal.
   from matplotlib import pyplot as plt
   import numpy as np
   fig = plt.figure()
   ax = fig.add_axes([0,0,1,1])
   ax.axis('equal')
   langs = ['C', 'C++', 'Java', 'Python', 'PHP']
   students = [23,17,35,29,12]
   ax.pie(students, labels = langs,autopct='%1.2f%%')
   plt.show()
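The percentages that autopct prints on the wedges follow from the x/sum(x) rule. A short sketch that computes them directly for the same data:

```python
import numpy as np

langs = ['C', 'C++', 'Java', 'Python', 'PHP']
students = np.array([23, 17, 35, 29, 12])

# Fraction of the whole pie taken by each wedge: x / sum(x)
fractions = students / students.sum()

for lang, frac in zip(langs, fractions):
    print(f"{lang}: {frac:.2%}")
```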
   Scatter plots are used to plot data points on
    horizontal and vertical axes in an attempt to
    show how much one variable is affected by
    another. Each row in the data table is
    represented by a marker whose position depends
    on its values in the columns set on the X and Y
    axes. A third variable can be set to correspond
    to the color or size of the markers, thus adding
    yet another dimension to the plot.
   import matplotlib.pyplot as plt
   girls_grades = [89, 90, 70, 89, 100, 80, 90, 100, 80, 34]
   boys_grades = [30, 29, 49, 48, 100, 48, 38, 45, 20, 30]
   grades_range = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
   fig=plt.figure()
   ax=fig.add_axes([0,0,1,1])
   ax.scatter(grades_range, girls_grades, color='r')
   ax.scatter(grades_range, boys_grades, color='b')
   ax.set_xlabel('Grades Range')
   ax.set_ylabel('Grades Scored')
   ax.set_title('scatter plot')
   plt.show()
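The idea of a third variable controlling marker color or size can be sketched with the c and s parameters of scatter(). The data below (hours, marks and attendance) is hypothetical and chosen only for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical data: hours studied, marks scored and attendance percentage
hours = [1, 2, 3, 4, 5, 6]
marks = [35, 42, 50, 61, 70, 82]
attendance = [60, 65, 70, 80, 90, 95]

fig, ax = plt.subplots()

# c sets the marker color and s the marker size, both from the third variable
points = ax.scatter(hours, marks, c=attendance, s=attendance, cmap='viridis')
fig.colorbar(points, ax=ax, label='attendance (%)')

ax.set_xlabel('hours studied')
ax.set_ylabel('marks')
plt.show()
```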
   A box plot, which is also known as a box and whisker plot, displays a
    summary of a set of data containing the minimum, first quartile,
    median, third quartile, and maximum. In a box plot, we draw a
    box from the first quartile to the third quartile. A line goes
    through the box at the median. The whiskers go from each
    quartile to the minimum or maximum.
   The numpy.random.normal() function is used to create the fake data. It takes
    three arguments: the mean and standard deviation of the normal distribution,
    and the number of values desired.
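   import numpy as np
   import matplotlib.pyplot as plt
   np.random.seed(10)
   collectn_1 = np.random.normal(100, 10, 200)
   collectn_2 = np.random.normal(80, 30, 200)
   collectn_3 = np.random.normal(90, 20, 200)
   collectn_4 = np.random.normal(70, 25, 200)
   # Combine the four collections into a list, one box per collection
   data_to_plot = [collectn_1, collectn_2, collectn_3, collectn_4]
   fig = plt.figure()
   # Create an axes instance
   ax = fig.add_axes([0,0,1,1])
   # Create the boxplot
   bp = ax.boxplot(data_to_plot)
   plt.show()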
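The five values that a box plot summarizes can be computed directly with NumPy. This is a sketch on a small, hypothetical data set chosen so that the quartiles fall on actual data values. Note that Matplotlib's boxplot() by default ends each whisker at the furthest data point within 1.5 times the interquartile range and draws anything beyond that as separate outlier points; here no value lies that far out, so the whiskers would reach the minimum and maximum.

```python
import numpy as np

# Hypothetical marks, used only for illustration
marks = [2, 4, 4, 5, 7, 9, 10, 12, 15]

minimum = np.min(marks)
q1, median, q3 = np.percentile(marks, [25, 50, 75])
maximum = np.max(marks)

# Five-number summary: minimum, first quartile, median, third quartile, maximum
summary = [float(v) for v in (minimum, q1, median, q3, maximum)]
print(summary)
```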
   Frequency polygons are a graphical device for understanding the
    shapes of distributions. They serve the same purpose as histograms,
    but are especially helpful for comparing sets of data. Frequency
    polygons are also a good choice for displaying cumulative
    frequency distributions.
   In a frequency polygon, the number of observations is marked with
    a single point at the midpoint of each interval. Straight lines then
    connect the points. Frequency polygons make it easy to
    compare two or more distributions on the same set of axes.
   Using Matplotlib:-
   import numpy as np
   import matplotlib.pyplot as plt
   data=[5,15,25,35,15,55]
   plt.hist(data, bins=[0,10,20,30,40,50,60], weights=[20,10,45,33,6,8],
            edgecolor="red", histtype="step")
   plt.xlabel('Value')
   plt.ylabel('Probability')
   plt.title('Histogram')
   plt.show()
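A frequency polygon in the sense described earlier, with one point at the midpoint of each interval joined by straight lines, can also be drawn with plot(). This sketch reuses the bin edges and weights from the example above as the frequencies; the label texts are illustrative.

```python
import matplotlib.pyplot as plt

bins = [0, 10, 20, 30, 40, 50, 60]
freq = [20, 10, 45, 33, 6, 8]

# Midpoint of each interval: the average of its two edges
midpoints = [(low + high) / 2 for low, high in zip(bins[:-1], bins[1:])]

# One marker per interval, joined by straight lines
plt.plot(midpoints, freq, marker='o', color='red')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Frequency polygon')
plt.show()
```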
   The components of a histogram plot are:-
1) Title:- To display the heading of the histogram.
2) Colour:- To show the colour of the bars.
3) Axis:- X-axis and Y-axis.
4) Data:- The data can be represented as an array.
5) Height and width of bar:- These are determined based on the analysis. The width of the
   bar is called the bin or interval.
6) Border colour:- To display the border colour of the bars.