2-Presentation of Data

Uploaded by

shahn904000
Presentation of data

Introduction.
The process of gathering data often results in a massive volume of statistical
data, which are in the form of individual measurements or counts. It is difficult to
learn anything by examining the unorganized data which is more often confusing
than clarifying. The mass of data is therefore to be organized and condensed into a
form that can be more rapidly and easily understood and interpreted. For this
purpose, techniques of classification, tabulation and displays are presented.
• Classification
The term classification is defined as the process of dividing a set of
observations or objects into classes or groups in such a way that:
i. Observations or objects in the same class or group are similar; and
ii. Observations or objects in each class or group are dissimilar to observations or
objects in the other classes or groups.
• Classification is thus sorting of data into homogeneous classes or groups
according to their being alike or not.
• When the data are sorted according to one criterion only, it is called simple
classification or one way classification
• Classification is called two way classification when the data are sorted according
to two criteria.
• A manifold classification or cross-classification is made according to several
criteria.
• Data may also be classified according to qualitative, temporal and geographical
characteristics.
• Distribution
Arrangement of data according to the values of variable characteristic is
called a distribution.
• When the defining variable is expressed in terms of location, we get a
spatial or geographical distribution. A temporal arrangement of values is referred to as
a time series.
• Aims of classification.
The main aims of classification are:
i. To reduce the large sets of data to an easily understood summary.
ii. To display the points of similarity and dissimilarity.
iii. To save mental strain by eliminating unnecessary details.
iv. To reflect the important aspects of data; and
v. To prepare the ground for comparison and inference.
• Basic Principles of Classification.
While classifying large sets of data, the following points should be taken into
consideration.
i. The classes or categories into which the data are to be divided, should be
mutually exclusive and no overlap should exist between successive classes. In
other words, classes should be arranged so that each observation or object can
be placed in one and only one class.
ii. The classes or categories should be all inclusive. All inclusive classes are classes that include
all the data.
iii. As far as possible, the conventional classification procedure should be adopted.
iv. The classification procedure should not be so elaborate as to lead to trivial classes, nor
should it be so crude as to concentrate all the data in one or two classes.
• GRAPHICAL REPRESENTATION
• Tabulation is a good method of condensing and representing statistical data in a
readily understandable form, but many people have no taste for figures. They
would prefer a way of representation where figures could be avoided. This
purpose is achieved by the presentation of statistical data in a visual form. The
visual display of statistical data in the form of points, lines, areas and other
geometrical forms and symbols is, in the most general terms, known as Graphical
Representation. Statistical data can be studied with this method without going
through figures presented in the form of tables. Such visual representation can
be divided into two main groups, graphs and diagrams, to be described in the
sections that follow. The basic difference between a graph and a diagram is that a
graph is a representation of data by a continuous curve, usually shown on graph
paper, while a diagram is any other one-, two- or three-dimensional form of visual
representation.
• DIAGRAMS
Diagrammatic representation is best suited to spatial series and data split
into different categories. Whenever a comparison of the same type of data at
different places is to be made, diagrams will be the best way to do that.
Diagrammatic representation has several advantages over tabular representation
of figures. Beautifully and neatly constructed diagrams are more attractive than
simple figures. Diagrams, being a visual display, leave more effective and long
lasting impression on the mind of a reader. They make unwieldy data intelligible at
a glance. Comparison is made easier with diagrams. Diagrams have some
disadvantages too. Diagrams are less accurate than tables, they cost money and time,
and the amount of information conveyed is limited. However, this method of
representation is extensively used in business and administration.
• Different types of diagrams or charts commonly used for displaying statistical
data are described below.
• Linear or One-Dimensional Diagrams. They consist of Simple Bars, Multiple Bars
and Component Bar charts. Here the values are represented only by one
dimension, generally the length of the bar.
• Areal or Two-Dimensional Diagrams. They consist of Rectangles, Sub-divided
Rectangles and Squares, the areas of which are proportional to the values of the
given quantities. This device is used to represent data having moderately large
variations.
• Cubic or Three-Dimensional Diagrams. They are in the form of Cubes and
cylinders, whose volumes are proportional to the values they represent. These
diagrams are used when the variation among the values of the data to be
portrayed is so large that even the square roots of the values concerned fail to
reduce the variation appreciably.
• Pie-Diagrams. They are in the form of Circles and Sectors. Here the areas of
circles or sectors are in proportion to the values they represent or compare.
• Pictograms. They consist of pictures or small symbolic figures representing the
statistical data. A pictogram is an effective way of visual comparisons. For
example, we can compare the armed strength of various countries by drawing
pictures of the number of soldiers, where each pictorial soldier may denote, say,
1,000 soldiers. In a similar way, the production of wheat can be compared by
means of the pictures of wheat bags of a specified size. It is essential to repeat
the pictures a number of times to represent the differences in magnitudes.
• While drawing diagrams, the following points should be kept in mind:
i) An appropriate scale consistent with the size of paper available and
the size of the data to be represented, should be chosen and
indicated either at the side or at the bottom of the diagram. This
scale must start at zero.
ii) A diagram, like a table, must have a title, which should be brief
and self-explanatory. A key, footnote or source will also be necessary.
iii) A diagram should be shaded, colored or cross-hatched to show the
different parts, if any.
iv) Lettering should be shown horizontally.
Categories of data
There are two broad categories of data: qualitative data and quantitative data. A variety
of methods exist for summarizing and describing these two types of data. The given
tree-diagram presents an outline of the various techniques.

Types of data
    Qualitative
        Univariate Frequency Table
            Percentages
                Pie Chart
                Bar Chart
        Bivariate Frequency Table
            Component Bar Chart
            Multiple Bar Chart
    Quantitative
        Discrete Frequency Distribution
            Line Chart
        Continuous Frequency Distribution
            Histogram
            Frequency Polygon
            Frequency Curve

First, we will be dealing with various techniques for summarizing and describing
qualitative data.

Qualitative
    Univariate Frequency Table
        Percentage
            Pie Chart
            Bar Chart
    Bivariate Frequency Table
        Component Bar Chart
        Multiple Bar Chart

We will begin with the univariate situation, and will proceed to the bivariate
situation.
• EXAMPLE
Suppose that we are carrying out a survey of the students of first year studying in a
co-educational college of Lahore. Suppose that in all there are 1200 students of
first year in this large college. We wish to determine what proportion of these
students have come from Urdu medium schools and what proportion has come
from English medium schools. So we will interview the students and we will inquire
from each one of them about their schooling. As a result, we will obtain a set of
data as you can now see on the screen.
We will have an array of observations as follows:
U, U, E, U, E, E, E, U, ……
(U : URDU MEDIUM)
(E : ENGLISH MEDIUM)
Now, the question is what should we do with this data?
Obviously, the first thing that comes to mind is to count the number of students
who said “Urdu medium” as well as the number of students who said “English
medium”. This will result in the following table:
Medium of Institution    No. of Students (f)
Urdu                     719
English                  481
Total                    1200

The technical term for the numbers given in the second column of this table
is “frequency”. It means “how frequently something happens?” Out of the
1200 students, 719 stated that they had come from Urdu medium schools. So
in this example, the frequency of the first category of responses is 719
whereas the frequency of the second category of responses is 481.

This information becomes more useful if we compute the proportion or percentage
of students falling in each category. Dividing the cell frequencies by the total
frequency and multiplying by 100, we obtain the following:
Medium of Institution    f       %
Urdu                     719     59.9 ≈ 60%
English                  481     40.1 ≈ 40%
Total                    1200
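The percentage column can be checked with a few lines of code. The following is a minimal sketch in Python, using only the two frequencies from the table; the variable names are illustrative and not part of the original notes.

```python
# Frequencies taken from the survey table (719 Urdu medium, 481 English medium).
frequencies = {"Urdu": 719, "English": 481}

# Total frequency: the number of students interviewed.
total = sum(frequencies.values())

# Percentage = cell frequency / total frequency * 100.
percentages = {medium: 100 * f / total for medium, f in frequencies.items()}

for medium, f in frequencies.items():
    print(f"{medium:<8} {f:>5} {percentages[medium]:>6.1f}%")
print(f"{'Total':<8} {total:>5}")
```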
• What we have just accomplished is an example of a univariate frequency table pertaining
to qualitative data.
• Let us now see how we can represent this information in the form of a diagram.
• One good way of representing the above information is in the form of a pie chart.
• A pie chart consists of a circle which is divided into two or more parts in accordance with
the number of distinct categories that we have in our data.
• For the example that we have just considered, the circle is divided into two sectors, the
larger sector pertaining to students coming from Urdu medium schools and the smaller
sector pertaining to students coming from English medium schools.
• How do we decide where to cut the circle?
• The answer is very simple! All we have to do is to divide the cell frequency
by the total frequency and multiply by 360. This process will give us the
exact value of the angle at which we should cut the circle.
PIE CHART
$$\text{Angle} = \frac{\text{component part}}{\text{whole}} \times 360^\circ$$

Medium of Institution    f       Angle
Urdu                     719     215.7°
English                  481     144.3°
Total                    1200

[Figure: pie chart with two sectors, Urdu (215.7°) and English (144.3°)]
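The sector angles follow directly from the rule of dividing each cell frequency by the total frequency and multiplying by 360. A short Python sketch of that arithmetic, with illustrative variable names:

```python
# Cell frequencies from the univariate frequency table.
frequencies = {"Urdu": 719, "English": 481}
total = sum(frequencies.values())

# Angle of each sector = cell frequency / total frequency * 360 degrees.
angles = {category: f / total * 360 for category, f in frequencies.items()}

for category, angle in angles.items():
    print(f"{category:<8} {angle:.1f} degrees")
```

The angles of all sectors should add up to 360°, which is a convenient check before drawing the chart.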
SIMPLE BAR CHART:
The next diagram to be considered is the simple bar chart.
A simple bar chart consists of horizontal or vertical bars of equal width and
lengths proportional to values they represent.
As the basis of comparison is one-dimensional, the widths of these bars have no
mathematical significance but are taken in order to make the chart look attractive.
Let us consider an example.
Suppose we have available to us information regarding the turnover of a company
for 5 years as given in the table below:
Years                1965      1966      1967      1968      1969
Turnover (Rupees)    35,000    42,000    43,500    48,000    48,500
To represent the above information in the form of a bar chart, all we
must do is to take the year along the x-axis and construct a scale for
turnover along the y-axis.

[Figure: empty chart axes, with the years 1965 to 1969 along the x-axis and a turnover scale from 0 to 50,000 along the y-axis]

Next, against each year, we will draw vertical bars of equal width and different
heights in accordance with the turn-over figures that we have in our table.
As a result we obtain a simple and attractive diagram as shown below.
When our values do not relate to time, they should be arranged in ascending
or descending order before charting.

[Figure: simple bar chart of turnover (rupees) for the years 1965 to 1969; vertical scale from 0 to 50,000]
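The idea that bar lengths are proportional to the values can be sketched in code. The Python example below is an illustration only: it prints horizontal text bars rather than drawing the vertical bars of the figure, and the helper `bar_lengths` and the maximum width of 50 characters are assumptions made for this sketch.

```python
# Turnover data from the table (rupees).
years = [1965, 1966, 1967, 1968, 1969]
turnover = [35_000, 42_000, 43_500, 48_000, 48_500]


def bar_lengths(values, max_width=50):
    """Scale values to bar lengths in characters; the scale starts at zero."""
    largest = max(values)
    return [round(v / largest * max_width) for v in values]


lengths = bar_lengths(turnover)

# All bars have the same thickness (one text row); only the length varies.
for year, value, length in zip(years, turnover, lengths):
    print(f"{year} | {'#' * length} {value:,}")
```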
• Bivariate situation.
Going back to the example of the first year students, suppose that along with the
enquiry about the Medium of Institution, you are also recording the gender of the
student. Suppose that our survey results in the following information:
Student No. Medium Gender
1 U F
2 U M
3 E M
4 U F
5 E M
6 E F
7 U M
8 E M
: : :
: : :
Now this is a bivariate situation; we have two variables, medium of schooling and
gender of the student.
In order to summarize the above information, we will construct a table containing
a box head and a stub as shown below:

Med. \ Sex    Male    Female    Total
Urdu
English
Total
The top row of this kind of table is known as the box-head and the first column of
the table is known as the stub. Next, we will count the number of students falling in
each of the following four categories:
• Male student coming from an Urdu medium school.
• Female student coming from an Urdu medium school.
• Male student coming from an English medium school.
• Female student coming from an English medium school.
As a result, suppose we obtain the given figures:
Med. \ Gender    Male    Female    Total
Urdu             202     517       719
English          350     131       481
Total            552     648       1200
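Counting the four categories is a cross-tabulation. The sketch below, in Python, applies it to only the first eight records listed in the survey table (the full set of 1200 records is not given), so its counts are not the figures in the table above; the variable names are illustrative.

```python
from collections import Counter

# First eight (medium, gender) records from the survey listing.
# U = Urdu medium, E = English medium, M = male, F = female.
records = [
    ("U", "F"), ("U", "M"), ("E", "M"), ("U", "F"),
    ("E", "M"), ("E", "F"), ("U", "M"), ("E", "M"),
]

# Cell frequencies: one count per (medium, gender) combination.
cells = Counter(records)

# Marginal totals for the stub (medium) and the box-head (gender).
row_totals = Counter(medium for medium, _ in records)
col_totals = Counter(gender for _, gender in records)

print(f"{'Med.':<6}{'Male':>6}{'Female':>8}{'Total':>7}")
for medium in ("U", "E"):
    print(f"{medium:<6}{cells[(medium, 'M')]:>6}{cells[(medium, 'F')]:>8}{row_totals[medium]:>7}")
print(f"{'Total':<6}{col_totals['M']:>6}{col_totals['F']:>8}{len(records):>7}")
```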
• COMPONENT BAR CHART:
A component bar chart is an effective technique in which each bar is divided into
two or more sections, proportional in size to the component parts of a total being
displayed by each bar. The various component parts, shown as sections of the bar,
are shaded or colored differently to increase the overall effectiveness of the diagram.
Component bar charts are used to represent the cumulation of the various components
of data and the percentages. They are also known as sub-divided bars, as shown in the
given figure.
[Figure: component bar chart of the number of students by gender (Male, Female), each bar sub-divided into Urdu and English medium; vertical scale from 0 to 800]
In the above figure, each bar has been divided into two parts. The
first bar represents the total number of male students whereas the
second bar represents the total number of female students.
As far as the medium of schooling is concerned, the lower part of
each bar represents the students coming from English medium schools.
Whereas the upper part of each bar represents the students coming
from the Urdu medium schools. The advantage of this kind of a diagram
is that we are able to ascertain the situation of both the variables at a
glance.
We can compare the number of male students in the college with
the number of female students, and at the same time we can compare
the number of English medium students among the males with the
number of English medium students among the females.
• MULTIPLE BAR CHARTS
The next diagram to be considered is the multiple bar chart. Let us consider an
example.
Suppose we have information regarding the imports and exports of Pakistan
for the years 1970-71 to 1974-75 as shown in the table below:
Years      Imports (Crores of Rs.)    Exports (Crores of Rs.)
1970-71    370                        200
1971-72    350                        337
1972-73    840                        855
1973-74    1438                       1016
1974-75    2092                       1029
Source: State Bank of Pakistan
A multiple bar chart shows two or more characteristics corresponding to the
values of a common variable in the form of grouped bars, the lengths of which are
proportional to the values of characteristics, and each of which is shaded or
colored differently in order to aid identification. With reference to the above
example, we obtain the multiple bar chart shown below:
[Figure: multiple bar chart, Imports & Exports of Pakistan 1970-71 to 1974-75; vertical scale from 0 to 2500]
This is a very good device for the comparison of two different kinds of
information.
The question is, what is the basic difference between a component bar
chart and a multiple bar chart?
The component bar chart should be used when we have available to us
information regarding totals and their components.

For example, the total number of male students out of which some are
Urdu medium and some are English medium. The number of Urdu
medium male students and the number of English medium male
students add up to give us the total number of male students.
On the contrary, in the example of exports and imports, the imports and
exports do not add up to give us the totality of some one thing!
• FREQUENCY DISTRIBUTION
• The organization of a set of data in a table showing the distribution of the data
into classes or groups together with the number of observations in each class or group
is called a Frequency Distribution.
• The number of observations falling in a particular class is referred to as the class
frequency or simply frequency and is denoted by f.
• Data presented in the form of a frequency distribution are also called grouped data
while the data in the original form are referred to as ungrouped data.
• The purpose of a frequency distribution is to produce a meaningful pattern for the
overall distribution of the data from which conclusions can be drawn.
• A fairly common frequency pattern is the rising to a peak and then declining. In terms of
its construction, each class or group has lower and upper limits, lower and upper
boundaries, an interval and a middle value.
• Class-limits.
• The class-limits are defined as the numbers or the values of the variables which
describe the classes; the smaller number is the lower class limit and the larger number
is the upper class limit. Class-limits should be well defined and there should be no
overlapping.
In other words, the limits should be inclusive, i.e. the values corresponding
exactly to the lower limit or the upper limit are included in that class. The
class-limits are therefore selected in such a way that they have the same number of
significant places as the recorded values. Suppose the data are recorded to the
nearest integer. Then an appropriate method for defining the class limits without
overlapping may be, for example, 10 – 14, 15 – 19, 20 – 24, etc. The class limits
may be defined as 10.0 – 14.9, 15.0 – 19.9, 20.0 – 24.9, etc. when the data are
recorded to the nearest tenth of an integer. Sometimes, a class has either no lower
class limit or no upper class-limit. Such a class is called an open-end class. The
open-end classes, if possible, should be avoided as they are a hindrance in
performing certain calculations. A class indicated as 10 – 15 will include 10 but
not 15, i.e. 10 ≤ X < 15.
Class Number    Lower Class Limit    Upper Class Limit
1               30.0                 32.9
2               30.0 + 3 = 33.0      32.9 + 3 = 35.9
3               33.0 + 3 = 36.0      35.9 + 3 = 38.9
4               36.0 + 3 = 39.0      38.9 + 3 = 41.9
5               39.0 + 3 = 42.0      41.9 + 3 = 44.9
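The table of class limits is built by adding the same amount (here 3) repeatedly to the limits of the first class. A minimal Python sketch of that step follows; the function name `class_limits` is an assumption of this example, and `Decimal` is used only so that the one-decimal-place values stay exact.

```python
from decimal import Decimal


def class_limits(first_lower, first_upper, width, k):
    """Return k (lower, upper) class-limit pairs, each shifted by `width`."""
    return [(first_lower + i * width, first_upper + i * width) for i in range(k)]


# First class 30.0 to 32.9, step of 3, five classes, as in the table.
limits = class_limits(Decimal("30.0"), Decimal("32.9"), Decimal("3"), 5)

for number, (lower, upper) in enumerate(limits, start=1):
    print(f"{number}  {lower} to {upper}")
```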
• Class-boundaries.
The class-boundaries are the precise numbers which separate one class from
another. The selection of these numbers removes the difficulty, if any, in knowing
the class to which a particular value should be assigned.
A class-boundary is located midway between the upper limit of a class and the
lower limit of the next higher class, e.g. 9.5 – 14.5, 14.5 – 19.5, 19.5 – 24.5, or
9.95 – 14.95, 14.95 – 19.95, etc. The class-boundaries are thus always defined
more precisely than the level of measurements being used so that the possibility of
any observation falling exactly on the boundary is avoided. That is why the class
boundaries carry one more decimal place than the class limits or the observed values.
The upper class boundary of a class coincides with the lower boundary of the next class.
Class Limits     Class Boundaries    Frequency
30.0 – 32.9      29.95 – 32.95       2
33.0 – 35.9      32.95 – 35.95       4
36.0 – 38.9      35.95 – 38.95       14
39.0 – 41.9      38.95 – 41.95       8
42.0 – 44.9      41.95 – 44.95       2
Total                                30
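Since each boundary lies midway between an upper limit and the next lower limit, the boundaries can be derived from the limits by moving half of that gap outward. The Python sketch below illustrates this for the table above; the variable names are illustrative, and it assumes the gap between consecutive classes is the same throughout.

```python
from decimal import Decimal

# Class limits from the table above.
limits = [
    (Decimal("30.0"), Decimal("32.9")),
    (Decimal("33.0"), Decimal("35.9")),
    (Decimal("36.0"), Decimal("38.9")),
    (Decimal("39.0"), Decimal("41.9")),
    (Decimal("42.0"), Decimal("44.9")),
]

# Half the gap between an upper limit and the next lower limit (0.1 / 2 = 0.05).
half_gap = (limits[1][0] - limits[0][1]) / 2

# Lower boundary = lower limit - half gap; upper boundary = upper limit + half gap.
boundaries = [(lower - half_gap, upper + half_gap) for lower, upper in limits]

for (lower, upper), (lb, ub) in zip(limits, boundaries):
    print(f"{lower} to {upper}    {lb} to {ub}")
```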
• Class Mark.
A class mark, also called class midpoint, is that number which divides each class
into two equal parts. In practice, it is obtained by dividing either the sum of the lower and
upper limits of a class, or the sum of the lower and upper boundaries of the class, by 2, but
in a few cases this does not hold, particularly in the modern practice of age grouping. For
purposes of calculations, every observation in a particular class is assumed to have the same
value as the class-mark or midpoint. This assumption may introduce an error, called
the grouping error, but statistical experience has shown that such errors usually tend to
counterbalance over the entire distribution. The grouping error may also be minimized by
selecting a class (group) in such a way that its midpoint corresponds to the mean of the
observed values falling in that class.
Class Boundaries    Mid-Point (X)    Frequency (f)
26.95 – 29.95       28.45
29.95 – 32.95       31.45            2
32.95 – 35.95       34.45            4
35.95 – 38.95       37.45            14
38.95 – 41.95       40.45            8
41.95 – 44.95       43.45            2
44.95 – 47.95       46.45
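The mid-points in the table, and the assumption that every observation in a class takes the value of its class mark, can be illustrated as follows. This Python sketch uses the five classes that have frequencies; the grouped mean at the end is an added illustration of that assumption, not a figure from the original notes.

```python
from decimal import Decimal

# Class boundaries and frequencies for the five classes with observations.
boundaries = [
    (Decimal("29.95"), Decimal("32.95")),
    (Decimal("32.95"), Decimal("35.95")),
    (Decimal("35.95"), Decimal("38.95")),
    (Decimal("38.95"), Decimal("41.95")),
    (Decimal("41.95"), Decimal("44.95")),
]
frequencies = [2, 4, 14, 8, 2]

# Class mark = (lower boundary + upper boundary) / 2.
marks = [(lower + upper) / 2 for lower, upper in boundaries]

# Treat every observation in a class as equal to its class mark.
n = sum(frequencies)
approx_mean = sum(f * x for f, x in zip(frequencies, marks)) / n

for (lower, upper), x, f in zip(boundaries, marks, frequencies):
    print(f"{lower} to {upper}   {x}   {f}")
print("Approximate mean from grouped data:", approx_mean)
```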
• Class Width or Interval.
The class-width or interval of a class is equal to the difference between the
class boundaries. It may also be obtained by finding the difference either between
two successive lower class limits, or between two successive class marks. The
lower limit of a class should not be subtracted from its upper limit to get the class
interval. An equal class interval, usually denoted by h or c, facilitates the
calculations of statistical constants such as the mean, the standard deviation,
moments, etc. That is why in practice, it is desirable to have equal class-intervals.
But in some types of economic and medical data, it is wise to use unequal
class-intervals on account of greater concentration of measurements in certain classes.
Such class intervals usually become uniform when logarithms of class marks are
taken. It should be noted that some people use the terms “class” and
“class-interval” interchangeably and the width of the class is referred to as the size or
length of the class-interval.
• Constructing a Grouped Frequency Distribution.
The following are some basic rules that should be kept in mind when
constructing a grouped frequency distribution:
i. Decide on the number of classes into which the data are to be grouped. There
are no hard and fast rules for deciding on the number of classes which actually
depends on the size of data. Statistical experience tells us that no less than 5
and no more than 20 classes are generally used. Use of too many classes will
defeat the purpose of condensation and too few will result in too much loss of
information. H.A. Sturges has proposed an empirical rule for determining the
number of classes into which a set of observations should be grouped. The rule
is k = 1 + 3.3 log N, where k denotes the number of classes, N is the total
number of observations and the logarithm is to base 10. For example, if there are
100 observations, then by applying Sturges’ rule, we should have
k = 1 + 3.3 (2.0000) = 7.6, i.e. 8 classes.
Thus eight classes are required, but this rule is rarely used in practice.
ii. Determine the range of variation in the data, i.e. the difference between the
largest and the smallest values in the data.
iii. Divide the range of variation by the number of classes to determine the
approximate width or size of the equal class-interval. In case of fractional
results, the next higher whole number is usually taken as the size or width of
class-interval. If equal class-intervals are inconvenient or undesirable,
then classes of unequal size are used. But in practice, intervals that are
multiples of 5 or 10 are commonly used as people can understand them more
readily.
iv. Decide where to locate the class-limit of the lowest class and then the lower
class boundary. The lowest class usually starts with the smallest data value or a
number less than it. It is better if it is a multiple of class-interval. Find the
upper class boundary by adding the width of the class-interval to the lower
class-boundary and write down the upper class limits too. The open-end
classes, i.e. classes with the lowermost or uppermost class boundary unknown,
should be avoided if possible.
v. Determine the remaining class-limits and class boundaries by adding the class-
interval repeatedly. The lowest class should be placed at the top and the rest
should follow according to size. In some cases, the highest class is placed at the
top.
vi. Finally, total the frequency column to see that all the data have been
accounted for.
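The rules above can be sketched end to end in a few lines of Python. This is an illustration under stated assumptions, not a definitive procedure: the 20 integer observations are hypothetical (no raw data are given in these notes), the function names are invented for the sketch, and the lowest class is simply started at the smallest value.

```python
import math


def sturges_classes(n):
    """Rule i: k = 1 + 3.3 log10(N), rounded up to a whole number of classes."""
    return math.ceil(1 + 3.3 * math.log10(n))


def class_width(data, k):
    """Rules ii and iii: range divided by k, rounded up to a whole number."""
    data_range = max(data) - min(data)
    return math.ceil(data_range / k)


def group(data, start, width, k):
    """Rules iv to vi: count observations in k equal classes from `start`."""
    counts = [0] * k
    for x in data:
        index = (x - start) // width
        if not 0 <= index < k:
            raise ValueError(f"{x} does not fall in any class")
        counts[index] += 1
    # Rule vi: the frequencies must account for all the data.
    assert sum(counts) == len(data)
    return counts


# Hypothetical observations recorded to the nearest integer.
data = [12, 15, 21, 27, 33, 18, 24, 29, 34, 11,
        22, 26, 31, 19, 23, 28, 16, 25, 30, 20]

k = sturges_classes(len(data))
width = class_width(data, k)
start = min(data)
counts = group(data, start, width, k)

for i, f in enumerate(counts):
    lower = start + i * width
    print(f"{lower} to {lower + width - 1}   {f}")
```

If the range happens to be an exact multiple of the width, the largest value falls just outside the last class and one more class is needed; the `ValueError` in `group` is there to make that case visible rather than silently dropping data.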
• These rules are applied to group raw data which are assumed to be continuous.
In case of discrete data which carry only integral values, the concept of a class
boundary is unrealistic as there can be no points where the adjoining classes
meet. In spite of this logical difficulty, when the discrete data are sufficiently
large, they are treated for convenience of calculations as continuous and hence
are grouped in the same way as the continuous data.
• GRAPHS.
As already stated, diagrams are useful for representing spatial series.
Diagrams fail when we want to represent a statistical series spread over a period of
time, or a frequency distribution or two related variables in visual form. For such
representations, graphs are employed. Graphs present the data in a simple, clear
and effective manner, facilitate comparison between two or more than two
statistical series, and help us in appreciating their significance readily. Another
advantage of graphs is that they provide an overall picture of a statistical series.
• Graphs are also sometimes used to make predictions and forecasts. Certain
partition values can also be located graphically. But graphs are less accurate as
they do not give minute details. Moreover, they require considerable expenditure
and time.
• Construction of Graphs.
In the construction of a graph, the first step is to take a starting point, known
as the origin, in the left-hand bottom corner of the graph paper. Two straight lines
perpendicular to each other are drawn through the origin.
The horizontal line is called the X-axis or abscissa and the vertical line is labeled as
Y-axis or ordinate. The two lines together are known as co-ordinate axes.
Suitable scales are selected along the X-axis and Y-axis. The independent variable is taken
along the X-axis and the dependent variable along the Y-axis. Points are plotted and joined to
get the required graph. While constructing a graph, the following points should be
kept in mind:
i) A scale and the form of representation are to be selected in such a way that the
true impression of the data to be represented is given by the graph.
ii) Every graph must have a clear and comprehensive title at the top. Where necessary,
sub-titles should be added.
iii) The source of the data must be given. A key and footnotes should be provided
when necessary.
iv) The independent variable should always be placed on the horizontal axis.
v) The vertical scale should always begin with zero, otherwise the graph will give a
false impression. If, however, the first item of the data is quite large, a
scale-break should be shown between zero and the next number.
vi) The horizontal axis does not have to begin with zero unless, of course, the
independent variable or the lower limit of the first class interval is zero.
vii) The axes of the graph should be properly labelled. Labels should clearly state
both the variable and the units, e.g. “Distance” and “Kilometer”, “Sales” and
“Rupees”, etc.
viii) Curves, if more than one, must be clearly distinguished either by different
colors or by differentiated lines (solid, dashed, dot-dashed).
ix) The graph should not be loaded with too many curves.
Graphs can be divided into two main categories, namely:
a) Graphs of Time-Series or Graphs of Historical Data, and
b) Graphs of Frequency Distributions. The important graphs of frequency
distributions are the Histogram, Frequency Polygon, Frequency Curve and the
Cumulative Frequency Curve or Ogive.
