DATA ANALYSIS
WEEK 1 LECTURE
FOUNDATIONS: DATA, DATA EVERYWHERE
TOPIC 1: INTRODUCTION TO DATA
DATA
We are living in an era of data. Every day we generate thousands of
terabytes of data. This data can be the sales records of a company,
the activity of users on an application, or your payment history.
Data are facts and statistics collected together for reference or
analysis. In computing, data is information that has been translated
into a form that is efficient for movement or processing.
Raw data is a term used to describe data in its most basic digital
format. Early on, the importance of data in business computing became
apparent from the popularity of the terms data processing and
electronic data processing.
How Data is Stored
Computers represent data, including video, images, sounds and
text, as binary values using patterns of just two digits: 1 and 0. A
bit is the smallest unit of data and represents just a single value. A
byte is eight binary digits long. Storage and memory are measured in
megabytes (MB) and gigabytes (GB).
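As a small illustration of bits and bytes, the sketch below shows how a short piece of text might look once it is stored as binary values. The sample text "Hi" and the variable names are assumptions chosen for this example, and UTF-8 is only one of several possible text encodings.

```python
# Turn a short piece of text into bytes, then show each byte as 8 bits.
text = "Hi"                                  # hypothetical sample text
raw = text.encode("utf-8")                   # text -> sequence of bytes
bits = [format(byte, "08b") for byte in raw] # each byte -> 8 binary digits

print(bits)          # ['01001000', '01101001']
print(len(raw) * 8)  # total number of bits used: 16
```

Each character here takes one byte, and each byte is written as a pattern of eight 1s and 0s.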
How Data can be Categorized:
Data can be categorized in two ways:
1. Structured Data: The data is organized and easier to work with.
2. Unstructured Data: The data is not organized. It must be
organized before analysis can be done on it.
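The difference between the two categories can be sketched in a few lines of Python. The customer names, amounts, and the sentence pattern below are all made up for illustration. Real unstructured data is usually much harder to organize than this.

```python
import re

# Structured data: every record has the same named fields.
structured = [
    {"customer": "Ada", "amount": 120},
    {"customer": "Ben", "amount": 80},
]
total_structured = sum(row["amount"] for row in structured)

# Unstructured data: free text that must be organized before analysis.
unstructured = "Ada paid 120 on Monday. Ben paid 80 on Tuesday."
pattern = re.compile(r"(\w+) paid (\d+)")  # assumed sentence pattern
organized = [
    {"customer": name, "amount": int(amount)}
    for name, amount in pattern.findall(unstructured)
]
total_unstructured = sum(row["amount"] for row in organized)

print(total_structured, total_unstructured)  # 200 200
```

The structured records can be summed directly, while the free text needs an extra organizing step first.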
Data science
Data science is a combination of multiple disciplines that uses
statistics, data analysis and machine learning to analyze data and
to extract knowledge and insights from it.
Data Science is about data gathering, analysis and decision making.
It is about finding patterns in data through analysis and using them
to make predictions about the future.
Disciplines Under Data Science
Data Analytics
Data analytics is the science of analyzing raw data to make
conclusions about that information. Many of the techniques and
processes of data analytics have been automated into mechanical
processes and algorithms that work over raw data for human
consumption.
Data Analysis
Data analysis is a process of inspecting, cleansing, transforming,
and modelling data with the goal of discovering useful information,
informing conclusions, and supporting decision-making.
Data and Information Visualization
Data and information visualization is an interdisciplinary field that
deals with the graphic representation of data and information. It is a
particularly efficient way of communicating when the data or
information is numerous, as for example in a time series.
Machine Learning
Machine learning (ML) is the subset of artificial intelligence (AI)
that focuses on building systems that learn, or improve performance,
based on the data they consume.
Artificial Intelligence
Artificial intelligence (AI) is intelligence demonstrated by machines,
as opposed to the natural intelligence displayed by animals including
humans. AI research has been defined as the field of study of
intelligent agents, which refers to any system that perceives its
environment and takes actions that maximize its chance of achieving
its goals.
Data Mining
Data mining is an essential process for many data analytics tasks.
This involves extracting data from unstructured data sources. These
may include written text, large complex databases, or raw sensor
data. The key steps in this process are to extract, transform, and
load data (often called ETL).
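The extract, transform, and load steps named in the data mining description can be sketched roughly as follows. This is a minimal illustration, not a production pipeline: the sensor names and readings are invented, an in-memory CSV stands in for a real source file, and a plain dictionary stands in for a real database.

```python
import csv
import io

# Extract: read raw rows from a source (an in-memory CSV stands in for a file).
raw_source = io.StringIO("sensor,reading\nA,21.5\nB,not_available\nA,22.5\n")
rows = list(csv.DictReader(raw_source))

# Transform: convert readings to numbers and drop rows that are not numeric.
def to_float(value):
    try:
        return float(value)
    except ValueError:
        return None

clean = [
    {"sensor": row["sensor"], "reading": to_float(row["reading"])}
    for row in rows
    if to_float(row["reading"]) is not None
]

# Load: store the transformed rows in a destination (a dict keyed by sensor).
warehouse = {}
for row in clean:
    warehouse.setdefault(row["sensor"], []).append(row["reading"])

print(warehouse)  # {'A': [21.5, 22.5]}
```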
Statistics
Statistics is the discipline that concerns the collection,
organization, analysis, interpretation, and presentation of data. In
applying statistics to a scientific, industrial, or social problem, it
is conventional to begin with a statistical population or a
statistical model to be studied.
Natural Language Processing
Natural language processing (NLP) is a subfield of linguistics,
computer science, and artificial intelligence concerned with the
interactions between computers and human language, in particular how
to program computers to process and analyze large amounts of natural
language data. The goal is a computer capable of "understanding" the
contents of documents, including the contextual nuances of the
language within them.
Data Analysis
Data analysis is a process of inspecting, cleansing, transforming,
and modelling data with the goal of discovering useful information,
informing conclusions, and supporting decision-making.
The procedure helps reduce the risks inherent in decision-making by
providing useful insights and statistics, often presented in charts,
images, tables, and graphs.
Data Analysis Process
What Is the Data Analysis Process?
Answering the question “what is data analysis” is only the first step.
Now we will look at how it’s performed. The data analysis process, or
alternately, data analysis steps, involves gathering all the information,
processing it, exploring the data, and using it to find patterns and other
insights. The process consists of:
Data Requirement Gathering: Ask yourself why you’re doing this
analysis, what type of data analysis you want to use, and what data
you are planning on analyzing.
Data Collection: Guided by the requirements you’ve identified, it’s
time to collect the data from your sources. Sources include case
studies, surveys, interviews, questionnaires, direct observation, and
focus groups. Make sure to organize the collected data for analysis.
Data Cleaning: Not all of the data you collect will be useful, so it’s
time to clean it up. This process is where you remove extra
whitespace, duplicate records, and basic errors. Data cleaning is
mandatory before sending the information on for analysis.
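The data cleaning step can be sketched in plain Python as shown below. The records are invented for this example, and treating empty entries as "basic errors" and ignoring capitalization when spotting duplicates are assumptions. A real project would choose cleaning rules to fit its own data.

```python
raw_records = [
    "  Ada Lovelace ",
    "Ben Okafor",
    "Ada   Lovelace",   # duplicate once whitespace is cleaned up
    "",                 # empty entry, treated here as a basic error
    "ben okafor",       # same name with different capitalization
]

cleaned = []
seen = set()
for record in raw_records:
    name = " ".join(record.split())  # trim and collapse whitespace
    if not name:
        continue                     # drop empty entries
    key = name.lower()               # compare names case-insensitively
    if key in seen:
        continue                     # drop duplicate records
    seen.add(key)
    cleaned.append(name)

print(cleaned)  # ['Ada Lovelace', 'Ben Okafor']
```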
Data Analysis: Here is where you use data analysis software and
other tools to help you interpret and understand the data and
arrive at conclusions. Data analysis tools include Excel, Python,
R, Looker, RapidMiner, Chartio, Metabase, Redash, and Microsoft
Power BI.
Data Interpretation: Now that you have your results, you need to
interpret them and come up with the best courses of action, based on
your findings.
Data Visualization: Data visualization is a fancy way of saying,
“graphically show your information in a way that people can read and
understand it.” You can use charts, graphs, maps, bullet points, or a
host of other methods. Visualization helps you derive valuable insights
by helping you compare datasets and observe relationships.
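As a very small illustration of showing information graphically, the sketch below prints a text-only bar chart. The regions and sales figures are made up. In practice a charting tool or library would normally be used instead of printed characters.

```python
# Hypothetical sales counts per region.
sales_by_region = {"North": 5, "South": 8, "East": 3}

# Build one line per region: a label, a bar of '#' marks, and the value.
lines = [
    f"{region:<6}| {'#' * value} {value}"
    for region, value in sales_by_region.items()
]
print("\n".join(lines))
# North | ##### 5
# South | ######## 8
# East  | ### 3
```

Even in this crude form, the longer bar makes it easy to see which region sold the most.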
Uses of Data Analysis
1. Data analysis can help companies better understand their
customers, evaluate their ad campaigns, personalize
content, create content strategies, and develop products.
2. Companies can use the insights they gain from data analysis
to inform their decisions, leading to better outcomes.
3. Data analysis helps companies to streamline their
processes, save money, and boost their bottom line.
4. Data analysis can help an organization understand risks and
take preventive measures. For instance, a retail chain could
run a propensity model — a statistical model that can
predict future actions or events — to determine which stores
are at the highest risk for theft.
Types of Data Analysis
Descriptive Analysis
Descriptive analysis is the foundation of all data insights. It is the
simplest and most common use of data in business today. It
answers the question “What happened?” by summarizing past data,
usually in the form of dashboards. The biggest use of descriptive
analysis in business is to track key performance indicators (KPIs).
KPIs describe how a business is performing based on chosen
benchmarks. Business applications of descriptive analysis include:
KPI Dashboards
Monthly Revenue Reports
Sales Leads Overview
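A monthly revenue report of the kind listed above can be sketched as a simple summary of past data. The revenue figures and the target value are invented for illustration, and the chosen summary measures are only one possible set of KPIs.

```python
from statistics import mean

# Hypothetical past revenue per month and a hypothetical benchmark.
monthly_revenue = {"Jan": 1000, "Feb": 1200, "Mar": 1100}
target = 1050

total = sum(monthly_revenue.values())
average = mean(list(monthly_revenue.values()))
best_month = max(monthly_revenue, key=monthly_revenue.get)
months_on_target = [m for m, v in monthly_revenue.items() if v >= target]

print(total)             # 3300
print(average)           # 1100
print(best_month)        # Feb
print(months_on_target)  # ['Feb', 'Mar']
```

Nothing here predicts or explains anything. It only describes what happened.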
Diagnostic Analysis
After asking the main question of what happened, the next step is
to dive deeper and ask why it happened. This is where diagnostic
analysis comes in. It takes the insights found from descriptive
analytics and drills down to find the causes of those outcomes.
Business applications of diagnostic analysis include:
A freight company investigating the cause of slow shipments in a
certain region
A SaaS company drilling down to determine which marketing
activities increased trials
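The freight example can be sketched as a simple drill-down by region. The shipment records are invented for illustration. Grouping like this only shows where the slow shipments are concentrated, which is a first step toward finding the cause rather than the cause itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical shipment records: delivery time in days per region.
shipments = [
    {"region": "North", "days": 2},
    {"region": "North", "days": 3},
    {"region": "South", "days": 7},
    {"region": "South", "days": 9},
    {"region": "East", "days": 3},
]

# Drill down: group delivery times by region.
days_by_region = defaultdict(list)
for shipment in shipments:
    days_by_region[shipment["region"]].append(shipment["days"])

# Compare the average delivery time of each region.
avg_days = {region: mean(days) for region, days in days_by_region.items()}
slowest = max(avg_days, key=avg_days.get)

print(slowest)  # South
```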
Predictive Analysis
Predictive analysis attempts to answer the question of what is likely
to happen. This type of analytics uses previous data to make
predictions about future outcomes. It uses summarized data to make
logical predictions of the outcomes of events. It relies on statistical
modelling, which requires additional technology and manpower to
forecast.
Business applications of predictive analysis include:
Risk assessment
Sales forecasting
Using customer segmentation to determine which leads have the best
chance of converting
Predictive analytics in customer success teams
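Sales forecasting can be sketched with one of the simplest statistical models, a straight line fitted by least squares. The sales figures are invented and deliberately follow a perfect straight line so the result is easy to check by hand. Real sales data is noisier and usually needs a more careful model.

```python
# Hypothetical past sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [100, 120, 140, 160]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(sales) / n

# Least-squares slope and intercept of the line: sales = intercept + slope * month
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales)) / sum(
    (x - mean_x) ** 2 for x in months
)
intercept = mean_y - slope * mean_x

# Use the fitted line to predict sales for month 5.
forecast = intercept + slope * 5
print(forecast)  # 180.0
```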
Prescriptive Analysis
Prescriptive Analysis is the frontier of data analysis, combining the
insight from all previous analyses to determine the course of action
to take in a current problem or decision.
Artificial intelligence (AI) is a perfect example of prescriptive
analytics. AI systems consume a large amount of data to
continuously learn and use this information to make informed
decisions.
Currently, most of the big data-driven companies (Apple, Meta,
Netflix) are using prescriptive analytics and artificial intelligence
to improve decision-making.
What Does a Data Analyst Do?
A data analyst collects, cleans, and interprets data sets in order to
answer a question or solve a problem. They work in many industries,
including business, finance, medicine, and government.
Skills needed for Data Analysis
Spreadsheets: Microsoft Excel, Google Sheets
Programming Language: Python, R
Data Visualization: Tableau, Power BI
Querying Language: SQL, NoSQL
Data Cleaning
Critical Thinking
Statistics