1. Define statistics, scope, importance and distrust of statistics
Meaning and Scope of Statistics
Introduction
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or
explanation, and presentation of data. It provides tools for predicting and forecasting
economic activity, and it is useful to academicians, governments, businesses, etc.
On the basis of various definitions provided by economists, statistics has been broadly defined in
two senses: first is in plural sense and the second is in singular sense.
In plural sense, statistics refers to numerical facts and figures collected in a systematic manner
with a specific purpose in any field of study. In this sense, statistics is also aggregates of facts
expressed in numerical form.
The characteristics of statistical facts are:
Aggregate of facts
Numerically expressed
Affected by a multiplicity of causes
Enumerated according to a reasonable standard of accuracy
Collected in a systematic manner
Collected for a pre-determined purpose and
Placed in relation to each other
In singular sense, statistics refers to a science which comprises methods that are used in the
collection, analysis, interpretation and presentation of numerical data. These methods are used to
draw conclusions about the population parameters.
The stages of statistical analysis are:
Collection of data
Organisation of data
Presentation of data
Analysis of data and
Interpretation of data
Scope of statistics:
Statistics helps in business forecasting, decision making, quality control, the search for new
ventures, the study of markets and business cycles; it is useful for planning, for finding
averages, and for bankers, brokers, insurers, etc.
Importance and distrust of statistics
Statistics is an indispensable tool for an economist to analyse various economic problems. It is
important for the study of consumption, production, exchange and distribution, and for
planned development.
The limitations of statistics are:
It is not useful for individual cases.
It ignores qualitative aspects.
It deals with averages only.
Improper use of statistics can be dangerous.
It is only a means, not an end.
It does not distinguish between cause and effect and
Its results are not always dependable.
The reasons for distrust of statistics are:
Figures may be incomplete, inaccurate or deliberately manipulated.
Statistics can be made to prove whatever one wants and
Statistics can be tissues of falsehood.
The distrust of statistics can be removed by self-control, by having statistics used only by
experts, and by keeping the limitations of statistics in mind.
Importance of statistics:
Statistics is both a science and an art. As a science, statistical methods are systematic and
based on fundamental ideas and processes, and statistics serves as a base for all other sciences.
As an art, it explores merits and demerits and guides the choice of means to achieve the objective.
Analysis of statistical data requires study in two ways:
One is empirical analysis: the study of correlation between different variables.
The second is quantitative analysis: the study of statistics using techniques such as censuses
and sample surveys, and drawing conclusions on their basis.
The reasons for empirical and quantitative analysis of statistical data are:
To study the nature of problem
To study nature of variables
To formulate economic policies
To enhance knowledge
For budgetary analysis
2. Methods of data collection
Data is a collection of facts, figures, objects, symbols, and events gathered from different
sources. Organizations collect data to make better decisions. Without data, it would be
difficult for organizations to make appropriate decisions, and so data is collected at various
points in time from different audiences.
For instance, before launching a new product, an organization needs to collect data on product
demand, customer preferences, competitors, etc. In case data is not collected beforehand, the
organization’s newly launched product may lead to failure for many reasons, such as less
demand and inability to meet customer needs.
Although data is a valuable asset for every organization, it does not serve any purpose until
analyzed or processed to get the desired results.
You can categorize data collection methods into primary methods of data collection and
secondary methods of data collection.
Primary Data Collection Methods
Primary data is collected first-hand and has not been used in the past. The data
gathered by primary data collection methods is specific to the research's motive and highly
accurate.
Primary data collection methods can be divided into two categories: quantitative methods and
qualitative methods.
Quantitative Methods:
Quantitative techniques for market research and demand forecasting usually make use of
statistical tools. In these techniques, demand is forecast based on historical data. These methods
of primary data collection are generally used to make long-term forecasts. Statistical methods are
highly reliable as the element of subjectivity is minimum in these methods.
Time Series Analysis
The term time series refers to a sequence of values of a variable recorded at equal time
intervals; the persistent long-term movement in such a series is known as a trend. Using these
patterns, an organization can predict the demand for its products and services over the projected period.
Smoothing Techniques
In cases where the time series lacks significant trends, smoothing techniques can be used. They
eliminate a random variation from the historical demand. It helps in identifying patterns and
demand levels to estimate future demand. The most common methods used in smoothing
demand forecasting techniques are the simple moving average method and the weighted moving
average method.
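The two smoothing methods named above can be sketched in a few lines of Python (the demand figures below are made-up illustrative values, not data from the notes):

```python
# Simple and weighted moving averages for smoothing a demand series.

def simple_moving_average(series, window):
    """Plain average of the last `window` observations at each point."""
    return [sum(series[i - window:i]) / window
            for i in range(window, len(series) + 1)]

def weighted_moving_average(series, weights):
    """Weighted average; `weights` runs from oldest to newest observation."""
    w = len(weights)
    total = sum(weights)
    return [sum(x * wt for x, wt in zip(series[i - w:i], weights)) / total
            for i in range(w, len(series) + 1)]

demand = [120, 132, 101, 134, 140, 128, 150]  # hypothetical monthly demand

print(simple_moving_average(demand, 3))            # three-period SMA
print(weighted_moving_average(demand, [1, 2, 3]))  # recent periods weigh more
```

The weighted version gives more influence to recent demand, so it reacts faster to level shifts than the simple average.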
Barometric Method
Also known as the leading indicators approach, researchers use this method to speculate future
trends based on current developments. When the past events are considered to predict future
events, they act as leading indicators.
Qualitative Methods:
Qualitative methods are especially useful in situations where historical data is not available,
or where there is no need for numbers or mathematical calculations. Qualitative research is closely
associated with words, sounds, feeling, emotions, colors, and other elements that are non-
quantifiable. These techniques are based on experience, judgment, intuition, conjecture, emotion,
etc.
Quantitative methods do not provide the motive behind participants’ responses, often don’t reach
underrepresented populations, and span long periods to collect the data. Hence, it is best to
combine quantitative methods with qualitative methods.
Surveys
Surveys are used to collect data from the target audience and gather insights into their
preferences, opinions, choices, and feedback related to products and services. Most survey
software offers a wide range of question types to select from.
Polls
Polls comprise a single question, either single-choice or multiple-choice. When a quick
pulse of the audience's sentiment is required, polls are a good choice. Because they are short, it is
easier to get responses from people.
Similar to surveys, online polls, too, can be embedded into various platforms. Once the
respondents answer the question, they can also be shown how they stand compared to others’
responses.
Interviews
In this method, the interviewer asks questions either face-to-face or through telephone to the
respondents. In face-to-face interviews, the interviewer asks a series of questions to the
interviewee in person and notes down responses. In case it is not feasible to meet the person, the
interviewer can go for a telephonic interview. This form of data collection is suitable when there
are only a few respondents. It is too time-consuming and tedious to repeat the same process if
there are many participants.
Delphi Technique
In this method, market experts are provided with the estimates and assumptions of forecasts
made by other experts in the industry. Experts may reconsider and revise their estimates and
assumptions based on the information provided by other experts. The consensus of all experts on
demand forecasts constitutes the final demand forecast.
Focus Groups
In a focus group, a small group of around 8-10 people discusses the problem area. Each
individual provides their insights on the issue concerned. A moderator regulates
the discussion among the group members. At the end of the discussion, the group reaches a
consensus.
Questionnaire
A questionnaire is a printed set of questions, either open-ended or closed-ended. The respondents
are required to answer based on their knowledge and experience of the issue concerned. A
questionnaire may form part of a survey, but its end-goal need not be a survey.
Secondary Data Collection Methods
Secondary data is data that has been used in the past. The researcher can obtain it from
sources both internal and external to the organization.
Internal sources of secondary data:
Organization’s health and safety records
Mission and vision statements
Financial Statements
Magazines
Sales Report
CRM Software
Executive summaries
External sources of secondary data:
Government reports
Press releases
Business journals
Libraries
Internet
The secondary data collection methods, too, can involve both quantitative and qualitative
techniques. Secondary data is easily available and hence, less time-consuming and expensive as
compared to the primary data. However, with the secondary data collection methods, the
authenticity of the data gathered cannot be verified.
3. Tabulation and parts of a table
When data is represented in rows and columns, it is called tabulation. To construct a table, it is
important to know the different components of a good statistical table. When all the components
are put together systematically, they form a table.
Tabulation can be done using one way, two way or three way classification depending upon the
number of characteristics involved.
A good table should have the following parts:
Table number: Table number is given to a table for identification purpose. If more than one
table is presented, it is the table number that distinguishes one table from another. It is given at
the top or at the beginning of the title of the table.
Title: The title of the table gives information about the contents of the table. It has to be very
clear, brief and carefully worded, so that interpretations made from the table are clear and free
from any confusion.
Captions: These are the column headings given as designations to explain the figures of the
column.
Stubs: These are headings given to rows of the table. The designations of the rows are also
called stubs or stub items and the left column is known as stub column.
Body of the table: It is the main part and it contains the actual data. Location of any one data in
the table is fixed and determined by the row and column of the table.
Head note/Unit of measurement: The units of measurement of the figures in the table should
always be stated along with the title. If figures are large, they should be rounded off and the
method of rounding should be indicated.
Source: It is a brief statement or phrase indicating the source of data presented in the table. If
more than one source is there, all the sources are to be mentioned.
4. Sources of primary and secondary data
For the answer, please refer to question no. 2.
5. Stages of statistical investigation
A statistical investigation is carried out using a cycle of five stages: Problem, Plan, Data,
Analysis, Conclusion. The cycle is sometimes abbreviated to the PPDAC cycle.
The problem section is about formulating a statistical question: what data to collect, who to
collect it from, and why it is important.
The plan section is about how the data will be gathered.
The data section is about how the data is managed and organised.
The analysis section is about exploring and analysing the data, using a variety of data displays
and numerical summaries, and reasoning with the data.
The conclusion section is about answering the question in the problem section and giving
reasons based on the analysis section.
6. Sampling and methods of sampling
It would normally be impractical to study a whole population, for example when doing a
questionnaire survey. Sampling is a method that allows researchers to infer information about a
population based on results from a subset of the population, without having to investigate every
individual. Reducing the number of individuals in a study reduces the cost and workload, and
may make it easier to obtain high quality information, but this has to be balanced against having
a large enough sample size with enough power to detect a true association.
If a sample is to be used, by whatever method it is chosen, it is important that the individuals
selected are representative of the whole population. This may involve specifically targeting hard
to reach groups. For example, if the electoral roll for a town was used to identify participants,
some people, such as the homeless, would not be registered and therefore excluded from the
study by default.
There are several different sampling techniques available, and they can be subdivided into two
groups: probability sampling and non-probability sampling. In probability (random) sampling,
you start with a complete sampling frame of all eligible individuals from which you select your
sample. In this way, all eligible individuals have a chance of being chosen for the sample, and
you will be more able to generalise the results from your study. Probability sampling methods
tend to be more time-consuming and expensive than non-probability sampling. In non-
probability (non-random) sampling, you do not start with a complete sampling frame, so some
individuals have no chance of being selected. Consequently, you cannot estimate the effect of
sampling error and there is a significant risk of ending up with a non-representative sample
which produces non-generalisable results. However, non-probability sampling methods tend to
be cheaper and more convenient, and they are useful for exploratory research and hypothesis
generation.
Probability Sampling Methods
1. Simple random sampling
In this case each individual is chosen entirely by chance and each member of the population has
an equal chance, or probability, of being selected. One way of obtaining a random sample is to
give each individual in a population a number, and then use a table of random numbers to decide
which individuals to include. For example, if you have a sampling frame of 1000 individuals,
labelled 0 to 999, use groups of three digits from the random number table to pick your sample.
So, if the first three numbers from the random number table were 094, select the individual
labelled “94”, and so on.
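The draw described above can be sketched in Python, with `random.sample` standing in for the random number table (the sample size of 50 is an illustrative choice, not from the notes):

```python
import random

# Simple random sampling: each of the 1000 individuals (labelled 0-999)
# has an equal chance of selection. random.sample draws without replacement.

random.seed(42)  # fixed seed so the draw is reproducible
population = list(range(1000))
sample = random.sample(population, k=50)  # illustrative sample size of 50

print(len(sample))       # 50 individuals selected
print(len(set(sample)))  # 50 -- no individual chosen twice
```

Because every individual had the same selection probability, the sampling error of estimates from this sample can be calculated with standard formulae.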
As with all probability sampling methods, simple random sampling allows the sampling error to
be calculated and reduces selection bias. A specific advantage is that it is the most
straightforward method of probability sampling. A disadvantage of simple random sampling is
that you may not select enough individuals with your characteristic of interest, especially if that
characteristic is uncommon. It may also be difficult to define a complete sampling frame and
inconvenient to contact them, especially if different forms of contact are required (email, phone,
post) and your sample units are scattered over a wide geographical area.
2. Systematic sampling
Individuals are selected at regular intervals from the sampling frame. The intervals are chosen to
ensure an adequate sample size. If you need a sample size n from a population of size x, you
should select every x/nth individual for the sample. For example, if you wanted a sample size of
100 from a population of 1000, select every 1000/100 = 10th member of the sampling frame.
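The interval rule above (every x/n-th individual, with a random starting point inside the first interval) can be sketched as:

```python
import random

# Systematic sampling: sample size n from a frame of size x by taking
# every (x // n)-th individual, e.g. every 10th for n=100, x=1000.

def systematic_sample(frame, n):
    k = len(frame) // n          # sampling interval, here 1000 // 100 = 10
    start = random.randrange(k)  # random start within the first interval
    return frame[start::k][:n]   # every k-th individual from the start

frame = list(range(1000))
sample = systematic_sample(frame, 100)
print(len(sample))  # 100
```

Note that if the frame has a repeating pattern whose period matches the interval k, the bias described below would appear; randomising the start point does not protect against that.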
Systematic sampling is often more convenient than simple random sampling, and it is easy to
administer. However, it may also lead to bias, for example if there are underlying patterns in the
order of the individuals in the sampling frame, such that the sampling technique coincides with
the periodicity of the underlying pattern. As a hypothetical example, if a group of students were
being sampled to gain their opinions on college facilities, but the Student Record Department’s
central list of all students was arranged such that the sex of students alternated between male and
female, choosing an even interval (e.g. every 20th student) would result in a sample of all males
or all females. Whilst in this example the bias is obvious and should be easily corrected, this may
not always be the case.
3. Stratified sampling
In this method, the population is first divided into subgroups (or strata) who all share a similar
characteristic. It is used when we might reasonably expect the measurement of interest to vary
between the different subgroups, and we want to ensure representation from all the subgroups.
For example, in a study of stroke outcomes, we may stratify the population by sex, to ensure
equal representation of men and women. The study sample is then obtained by taking equal
sample sizes from each stratum. In stratified sampling, it may also be appropriate to choose non-
equal sample sizes from each stratum. For example, in a study of the health outcomes of nursing
staff in a county, if there are three hospitals each with different numbers of nursing staff (hospital
A has 500 nurses, hospital B has 1000 and hospital C has 2000), then it would be appropriate to
choose the sample numbers from each hospital proportionally (e.g. 10 from hospital A, 20 from
hospital B and 40 from hospital C). This ensures a more realistic and accurate estimation of the
health outcomes of nurses across the county, whereas taking equal sample sizes from each
hospital would over-represent nurses from hospital A and under-represent those from hospital
C. The fact that the sample was stratified should be taken
into account at the analysis stage.
Stratified sampling improves the accuracy and representativeness of the results by reducing
sampling bias. However, it requires knowledge of the appropriate characteristics of the sampling
frame (the details of which are not always available), and it can be difficult to decide which
characteristic(s) to stratify by.
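The proportional allocation used in the hospital example can be sketched as a small Python helper (the total sample of 70 reproduces the 10/20/40 split in the text):

```python
# Proportional allocation for stratified sampling: each stratum's sample
# size is in proportion to its share of the population.

def proportional_allocation(strata_sizes, total_sample):
    total = sum(strata_sizes.values())
    return {name: round(total_sample * size / total)
            for name, size in strata_sizes.items()}

nurses = {"hospital A": 500, "hospital B": 1000, "hospital C": 2000}
print(proportional_allocation(nurses, 70))
# {'hospital A': 10, 'hospital B': 20, 'hospital C': 40}
```

With the allocation fixed, a simple random sample of the stated size is then drawn within each stratum.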
4. Clustered sampling
In a clustered sample, subgroups of the population are used as the sampling unit, rather than
individuals. The population is divided into subgroups, known as clusters, which are randomly
selected to be included in the study. Clusters are usually already defined, for example individual
GP practices or towns could be identified as clusters. In single-stage cluster sampling, all
members of the chosen clusters are then included in the study. In two-stage cluster sampling, a
selection of individuals from each cluster is then randomly selected for inclusion. Clustering
should be taken into account in the analysis. The General Household survey, which is undertaken
annually in England, is a good example of a (one-stage) cluster sample. All members of the
selected households (clusters) are included in the survey.
Cluster sampling can be more efficient than simple random sampling, especially where a study
takes place over a wide geographical region. For instance, it is easier to contact lots of
individuals in a few GP practices than a few individuals in many different GP practices.
Disadvantages include an increased risk of bias, if the chosen clusters are not representative of
the population, resulting in an increased sampling error.
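Two-stage cluster sampling, as described above, can be sketched as follows (the GP practices and patient lists below are made-up illustrative data, and the cluster and within-cluster sample sizes are arbitrary choices):

```python
import random

# Two-stage cluster sampling: first randomly choose clusters (here, GP
# practices), then randomly choose individuals within each chosen cluster.

random.seed(1)
clusters = {f"practice_{i}": [f"patient_{i}_{j}" for j in range(50)]
            for i in range(20)}

chosen_clusters = random.sample(list(clusters), k=4)       # stage 1: pick clusters
sample = [person
          for c in chosen_clusters
          for person in random.sample(clusters[c], k=10)]  # stage 2: pick individuals

print(len(sample))  # 4 clusters x 10 individuals = 40
```

In single-stage cluster sampling, stage 2 would be skipped and every patient in the four chosen practices would be included.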
Non-Probability Sampling Methods
1. Convenience sampling
Convenience sampling is perhaps the easiest method of sampling, because participants are
selected based on availability and willingness to take part. Useful results can be obtained, but the
results are prone to significant bias, because those who volunteer to take part may be different
from those who choose not to (volunteer bias), and the sample may not be representative of other
characteristics, such as age or sex. Note: volunteer bias is a risk of all non-probability sampling
methods.
2. Quota sampling
This method of sampling is often used by market researchers. Interviewers are given a quota of
subjects of a specified type to attempt to recruit. For example, an interviewer might be told to go
out and select 20 adult men, 20 adult women, 10 teenage girls and 10 teenage boys so that they
could interview them about their television viewing. Ideally the quotas chosen would
proportionally represent the characteristics of the underlying population.
Whilst this has the advantage of being relatively straightforward and potentially representative,
the chosen sample may not be representative of other characteristics that weren’t considered (a
consequence of the non-random nature of sampling).
3. Judgement (or Purposive) Sampling
Also known as selective, or subjective, sampling, this technique relies on the judgement of the
researcher when choosing whom to ask to participate. Researchers may thus implicitly choose a
“representative” sample to suit their needs, or specifically approach individuals with certain
characteristics. This approach is often used by the media when canvassing the public for
opinions and in qualitative research.
Judgement sampling has the advantage of being time- and cost-effective to perform whilst
resulting in a range of responses (particularly useful in qualitative research). However, in
addition to volunteer bias, it is also prone to errors of judgement by the researcher and the
findings, whilst being potentially broad, will not necessarily be representative.
4. Snowball sampling
This method is commonly used in social sciences when investigating hard-to-reach groups.
Existing subjects are asked to nominate further subjects known to them, so the sample increases
in size like a rolling snowball. For example, when carrying out a survey of risk behaviours
amongst intravenous drug users, participants may be asked to nominate other users to be
interviewed.
Snowball sampling can be effective when a sampling frame is difficult to identify. However, by
selecting friends and acquaintances of subjects already investigated, there is a significant risk of
selection bias (choosing a large number of people with similar characteristics or views to the
initial individual identified).
Bias in sampling
There are five important potential sources of bias that should be considered when selecting a
sample, irrespective of the method used. Sampling bias may be introduced when:
1. Any pre-agreed sampling rules are deviated from
2. People in hard-to-reach groups are omitted
3. Selected individuals are replaced with others, for example if they are difficult to contact
4. There are low response rates
5. An out-of-date list is used as the sample frame (for example, if it excludes people who
have recently moved to an area)