BASIC BOX-PLOT
Applied Statistics and Computing Lab Indian School of Business
Learning goals
- What are the components of a basic box-plot?
- How is a basic box-plot constructed?
- How is it interpreted?
- What are its salient features?
- What are its limitations?
- How is it related to a histogram?
- What is the effect of translation on a box-plot?
Where does a box-plot fit in?
- Exploratory data-analytic tool for continuous data
- Visual display of certain important summary statistics
Why Box-plot?
- Useful in studying location, spread, distribution, symmetry, tail behaviour and skewness
- Useful in comparison of different batches of data, or a batch of data with factors
- Useful to study information of observations at the tails
- Easy to compute and draw, yet informative
- User-friendly
Constructing a Basic Box-plot
Data set 1. Suppose we have data on a batch (variable): 90, 41, 22, 135, 15, 72, 50, 26, 105
Step 1: Arrange the data in increasing order: 15, 22, 26, 41, 50, 72, 90, 105, 135
Step 2: Get the Five-point Summary, consisting of (i) the Minimum, (ii) the First quartile (Q1), (iii) the Median, (iv) the Third quartile (Q3) and (v) the Maximum
For the above data, the Five-point Summary is:
Minimum = 15
First Quartile = 26
Median = 50
Third Quartile = 90
Maximum = 135
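Steps 1 and 2 can be checked in R. The sketch below uses the base function fivenum(), which returns Tukey's five numbers. The names x, x_sorted and five are chosen here for illustration. Quartile conventions differ slightly between methods, so other data sets may give slightly different quartiles depending on the function used; for Data set 1 the result agrees with the summary above.

```r
# Data set 1
x <- c(90, 41, 22, 135, 15, 72, 50, 26, 105)

# Step 1: arrange the data in increasing order
x_sorted <- sort(x)

# Step 2: five-point summary (minimum, Q1, median, Q3, maximum)
five <- fivenum(x)
names(five) <- c("Minimum", "Q1", "Median", "Q3", "Maximum")
print(five)
```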
Constructing a Basic Box-plot (contd.)
Step 3: Draw a box of length equal to (Q3 - Q1), which is 90 - 26 = 64 for Data set 1. For now, we can choose the width as per convenience. The lower and upper hinges of the box represent the first and third quartiles. (In this case, the width is the vertical distance and the hinges are the left and right ends of the box.)
Constructing a Basic Box-plot (contd.)
Step 4: From the middle of the lower hinge, draw a line (parallel to the lines corresponding to the length of the box) up to the minimum. Similarly, draw a line from the middle of the upper hinge (parallel to the lines corresponding to the length of the box) up to the maximum. These lines are called the whiskers.
Step 5: Draw a line at the median, parallel to the hinges, dividing the box into two parts.
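Steps 3 to 5 can be followed literally with base R graphics. This is only a sketch of the construction for Data set 1: the vertical extent of the box (0.7 to 1.3) and the variable names s and box_length are arbitrary choices made here.

```r
x <- c(90, 41, 22, 135, 15, 72, 50, 26, 105)
s <- fivenum(x)  # minimum, Q1, median, Q3, maximum

# Empty canvas with the scale on the horizontal axis
plot(NA, xlim = range(x), ylim = c(0, 2), yaxt = "n",
     xlab = "Value", ylab = "", main = "Basic box-plot of Data set 1")

# Step 3: box from Q1 to Q3 (the width 0.7 to 1.3 is chosen for convenience)
rect(s[2], 0.7, s[4], 1.3)
box_length <- s[4] - s[2]

# Step 4: whiskers from the hinges to the minimum and the maximum
segments(s[1], 1, s[2], 1)
segments(s[4], 1, s[5], 1)

# Step 5: line at the median, parallel to the hinges
segments(s[3], 0.7, s[3], 1.3)
```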
Horizontal and vertical Box-plots
- The box-plot we saw is a horizontal box-plot (here the scale is on the horizontal axis)
- One can also have a vertical plot (where the scale is on the vertical axis)
- There is no specific advantage of one over the other, in general
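In R, the orientation is controlled by the horizontal argument of boxplot(). The sketch below draws both versions for Data set 1 side by side; range = 0 is used so that the whiskers run to the minimum and maximum, as in the basic box-plot. The names h and v are illustrative.

```r
x <- c(90, 41, 22, 135, 15, 72, 50, 26, 105)

op <- par(mfrow = c(1, 2))  # two panels side by side
# range = 0 extends the whiskers to the data extremes
h <- boxplot(x, range = 0, horizontal = TRUE,  main = "Horizontal")
v <- boxplot(x, range = 0, horizontal = FALSE, main = "Vertical")
par(op)                     # restore the previous layout
```

Both calls are based on the same five-point summary; only the axis carrying the scale changes.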
What can we get from a basic Boxplot?
- We can obviously get the position of the location measure, the spread, and where the middle 50% of observations are concentrated, in a visual display
- After all, a picture is worth a thousand words
- What else can we get?
What can we get from a basic Boxplot? (contd.)
Visuals from Aczel A., Sounderpandian J., Complete Business Statistics
Interpretation of the basic box-plot
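In the box-plot of Data set 1, the part of the box to the right of the median is longer than the part to its left, and the right whisker is longer than the left whisker. Thus the distribution of Data set 1 is right skewed.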
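The same conclusion can be checked numerically from the five-point summary. This is a rough sketch of the comparison, not a formal test of skewness; the variable names are chosen here for illustration.

```r
x <- c(90, 41, 22, 135, 15, 72, 50, 26, 105)
s <- fivenum(x)  # minimum, Q1, median, Q3, maximum

left_whisker  <- s[2] - s[1]  # Q1 - minimum
left_box      <- s[3] - s[2]  # median - Q1
right_box     <- s[4] - s[3]  # Q3 - median
right_whisker <- s[5] - s[4]  # maximum - Q3

# Longer right-hand pieces suggest a right-skewed batch
right_skewed <- (right_box > left_box) && (right_whisker > left_whisker)
print(c(left_whisker, left_box, right_box, right_whisker))
print(right_skewed)
```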
Basic Box-plot: Features and limitations
Features
- A basic box-plot displays the location (median) and the intervals required for the first, second, third and fourth quarters of the data
- It visually shows where the middle 50% of the data is located
- It tells us whether the data is symmetric, left skewed or right skewed
Limitations
- We cannot get modal information
- We cannot identify unusual observations
- It is hard to identify the tail behaviour
Histogram and box-plot
- In a histogram, the width of the interval is fixed and the height of the vertical bar is proportional to the (relative) frequency in that interval
- In a box-plot, the relative frequency is fixed at 25% and the intervals correspond to the first, second, third and fourth 25% of the relative frequencies. More precisely:
- The left and right whiskers correspond to the first and fourth 25%
- The part of the box from the first quartile to the median corresponds to the second 25%
- The part of the box from the median to the third quartile corresponds to the third 25%
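The contrast can be seen on Data set 1. In the sketch below, the histogram bins have a fixed width of 30 (an arbitrary choice made here) and varying counts, while the four box-plot intervals each cover a quarter of the data and vary in width.

```r
x <- c(90, 41, 22, 135, 15, 72, 50, 26, 105)

# Histogram view: equal-width bins, varying counts
h <- hist(x, breaks = seq(0, 150, by = 30), plot = FALSE)
bin_counts <- h$counts

# Box-plot view: widths of the four intervals between the
# minimum, Q1, median, Q3 and maximum
interval_widths <- diff(fivenum(x))

print(bin_counts)
print(interval_widths)
```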
Scores dataset
- Comprises the scores of 50 students in the second-semester exam of the second course in Quantitative Methods
- We also have their GPA in the first-semester exam and their scores in 3 minors of the subject
= 60 = 20 = 1,2,3 = ( 10)
Histogram and box-plot (contd.)
[Figures: Histogram of scores; Box-plot of scores]
Histogram and box-plot (contd.)
- A box-plot readily gives the 5-point summary and information about skewness. However, it is not possible to get information about the mode(s) from it
- A histogram readily gives information about the mode(s), but it takes some effort to extract the 5-point summary from it
- Thus the two plots complement each other!
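The complementary roles of the two plots can be illustrated with simulated data. The values below are NOT the Scores data set: they are a hypothetical stand-in drawn from two normal clusters, so the batch has two modes by construction. The histogram should show the two clusters, whereas the box-plot of the same values gives no hint of them.

```r
set.seed(1)  # fixed seed so the simulated values are reproducible

# Hypothetical stand-in data: 50 values from two clusters (two modes)
sim <- c(rnorm(25, mean = 20, sd = 4), rnorm(25, mean = 45, sd = 4))

op <- par(mfrow = c(1, 2))  # two panels side by side
hist(sim, main = "Histogram", xlab = "Simulated values")
bp <- boxplot(sim, range = 0, horizontal = TRUE,
              main = "Box-plot", xlab = "Simulated values")
par(op)                     # restore the previous layout
```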
Effect of translation
- Consider the box-plot of a variable X
- If we translate from X to Y = aX + b where a is positive, the features of the box-plot do not change
- If we translate from X to Y = aX + b where a is negative, the features of the box-plot are the same as those of the box-plot of (-X), that is, the plot is reversed
- We shall demonstrate this with the box-plots of X, (3X + 5) and (-3X + 5) for the Scores data set (X is the score in the 2nd minor)
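The effect can also be checked on the five-point summary. Since the Scores data set is not reproduced here, the sketch below uses Data set 1 in place of the 2nd minor score. With a = 3 each summary value is transformed in the same way and the order is kept; with a = -3 the order is reversed.

```r
x <- c(90, 41, 22, 135, 15, 72, 50, 26, 105)

s     <- fivenum(x)           # summary of X
s_pos <- fivenum(3 * x + 5)   # summary of 3X + 5
s_neg <- fivenum(-3 * x + 5)  # summary of -3X + 5

# a > 0: same transformation applied to each summary value
print(all.equal(s_pos, 3 * s + 5))
# a < 0: same transformation, but minimum and maximum (and Q1 and Q3) swap
print(all.equal(s_neg, rev(-3 * s + 5)))

op <- par(mfrow = c(3, 1))  # three panels stacked
boxplot(x,          range = 0, horizontal = TRUE, main = "X")
boxplot(3 * x + 5,  range = 0, horizontal = TRUE, main = "3X + 5")
boxplot(-3 * x + 5, range = 0, horizontal = TRUE, main = "-3X + 5")
par(op)                     # restore the previous layout
```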
Effect of translation (contd.)
[Figures: Box-plot of X; Box-plot of (3X + 5); Box-plot of (-3X + 5)]
R-codes
The R code for a box-plot is boxplot(x), where x is the variable name
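One point to keep in mind: by default, boxplot() limits the whiskers to 1.5 times the box length beyond the hinges (argument range = 1.5) and plots more distant points individually. Setting range = 0 runs the whiskers to the minimum and maximum, which is the basic box-plot described in these slides. The sketch below compares the two for Data set 1 without drawing anything (plot = FALSE); the names default_bp and basic_bp are illustrative.

```r
x <- c(90, 41, 22, 135, 15, 72, 50, 26, 105)

default_bp <- boxplot(x, plot = FALSE)             # default: range = 1.5
basic_bp   <- boxplot(x, range = 0, plot = FALSE)  # whiskers to min and max

# For Data set 1 no point lies beyond the default whisker limits,
# so both calls give the same five values
same_stats <- isTRUE(all.equal(default_bp$stats, basic_bp$stats))
no_flagged <- length(default_bp$out) == 0
print(basic_bp$stats)
```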
Thank you