
Data Analytics - Intermediate

Foundation Skills Academy


Index

1. Quick Recap of Basics

2. Data Engineer’s Ecosystem

3. Data Extraction & Storage

4. Languages for Data Professionals

5. Data Analysis – Regression Analysis

6. Marketing Mix Modeling


Data Analytics Basics Recap
Skills Required for Data Analytics

To excel in the field of Data Analytics, you need a synthesis of technical skills, domain skills and soft skills

Technical skills
• Data Analytics is not only about analyzing data but also about understanding how to collect, organize and store data
• Understanding important programming languages such as SQL, Python and R
• Forming a strong base in statistics and mathematics for advanced analytics
• Understanding Machine Learning algorithms to identify data patterns and solve business cases

Domain skills - Bridge the gap between raw data and recommendations, interpret results accurately, collaborate with SMEs

Soft skills
• Communication skills to explain your data-driven insights and market trends to businesses and convince them to
make smart business decisions. The art of storytelling is important.
• Visualization skills for effective reporting and dashboarding
Data Engineer’s Ecosystem
Data Engineer’s Ecosystem

• A data engineer’s ecosystem includes the infrastructure, tools, frameworks, and processes for extracting data from
disparate sources, architecting and managing data pipelines for transformation, integration, and storage of data.

• Data comes in a wide-ranging variety of file formats being collected from a variety of data sources, ranging from relational
and non-relational databases to APIs, web services, data streams, social platforms, and sensor devices.

• There are two main types of Data Repositories - Transactional and Analytical

• Transactional systems, also known as Online Transaction Processing (or OLTP) systems, are designed to store
high-volume day-to-day operational data e.g. online banking transactions, ATM transactions, and airline bookings.

• Analytical systems, also known as Online Analytical Processing (OLAP) systems, are optimized for complex data
analytics. These include relational and non-relational databases, data warehouses, data marts, data lakes.

• Data Integration tools combine data from disparate sources using processes such as the Extract-Transform-and-Load.
The ecosystem includes languages such as query languages, programming languages, and shell and scripting languages.

• BI and Reporting tools are used to collect data from multiple data sources and present them in a visual format, such as
interactive dashboards. These are drag and drop products that do not require the users to know any programming.
Data Extraction

• Typically, internal applications use relational databases such as SQL Server, Oracle, MySQL, DB2, to store data in a
structured way.

• External to the organization, there are other publicly and privately available datasets. For example, government
organizations releasing demographic and economic datasets. There are companies that sell specific data, for example,
Point-of-Sale data or Financial data, or Weather data.

• Many data providers and websites provide APIs (Application Program Interfaces) and Web Services. APIs and Web
Services typically listen for incoming requests and return data in plain text, XML, HTML, JSON, or media files e.g. X and
Facebook provide APIs to source data from tweets/posts, Stock Market APIs for pulling share and commodity prices.

• Web scraping makes it possible to download specific data from web pages based on defined parameters. Web scrapers
can extract text, contact information, images, videos, product items etc. which can be used to provide price comparisons;
extracting data on various forums and communities; collecting training and testing datasets for machine learning models.

• Data streams are another widely used source for aggregating constant streams of data flowing from sources such as
instruments, IoT devices, and applications, GPS data from cars, computer programs, websites, and social media posts.

• RSS (or Really Simple Syndication) feeds, are another popular data source used for capturing updated data from online
forums and news sites where data is refreshed on an ongoing basis.
Data Storage

• There are different types of databases. Several factors influence the choice of database, such as the data type and structure,
querying mechanisms, latency requirements, transaction speeds, and intended use of the data.
• Relational databases are optimized for data operations and querying involving many tables and much larger data volumes.
• Non-relational databases store data in a schema-less or free-form fashion. NoSQL is widely used for processing this data.
• A data warehouse works as a central repository that merges information coming from disparate sources and consolidates it
into one comprehensive database for analytics and business intelligence.
• A data mart is a sub-section of the data warehouse, built specifically for a particular business function or purpose.
• A data lake exists as a repository of raw data in its native format, straight from the source, to be transformed.
• Big Data Stores include distributed computational and storage infrastructure to store, scale, and process large data sets.
Big Data Processing Technologies - Hadoop, Hive, Spark
• Hadoop is a collection of tools that provides distributed storage and processing of big data.
• In a Hadoop distributed system, a node is a single computer, and a collection of nodes forms a cluster. Hadoop can scale up
from a single node to any number of nodes, each offering local storage and computation.
• HDFS provides scalable big data storage by partitioning files over multiple nodes. It splits large files across multiple
computers, allowing parallel access and processing. It also replicates file blocks on different nodes to prevent data loss.
• Hive is an open-source data warehouse software for reading, writing, and managing large data set files that are stored
directly in either HDFS or other data storage systems such as Apache HBase.
• Spark is a general-purpose data processing engine designed to extract and process large volumes of data for a wide range
of applications, including Interactive Analytics, Streams Processing, Machine Learning, Data Integration, and ETL. It takes
advantage of in-memory processing to significantly increase the speed of computations, spilling to disk only when
memory is constrained.
Languages for Data Professionals
Languages for Data Professionals

• There are hundreds of programming languages out there, built for diverse purposes. These can be categorized as – query
languages, programming languages, and shell scripting.
• Query languages are designed for accessing and manipulating data in a database; for example, SQL
• Programming languages are designed for developing applications and controlling application behavior; for example,
Python, R, and Java
• Shell and Scripting languages, such as Unix/Linux Shell, and PowerShell, are ideal for repetitive and time-consuming
operational tasks
Languages for Data Professionals - SQL

• SQL, or Structured Query Language, is a querying language designed for accessing and manipulating information from,
mostly, though not exclusively, relational databases.
• Using SQL, we can write a set of instructions to perform operations such as inserting, updating, and deleting records in a
database; creating new databases, tables, and views; and writing stored procedures, which are sets of instructions
saved and called for later use.
• Here are some advantages of using SQL,
• SQL is portable and can be used independent of the platform. It can be used for querying data in a wide variety of
databases and data repositories, although each vendor may have some variations and special extensions.
• It has a simple syntax that is similar to the English language. Its syntax allows developers to write programs with
fewer lines than some of the other programming languages using basic keywords such as select, insert, into, and
update.
• It can retrieve large amounts of data quickly and efficiently.
• It runs on an interpreter system, which means code can be executed as soon as it is written, making prototyping quick.
• Due to its large user community and the sheer volume of documentation accumulated over the years, it continues to
provide a uniform platform, worldwide, to all its users.
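
For illustration, here is a minimal sketch of running SQL from R using the DBI and RSQLite packages (assuming both are installed); the table and column names are hypothetical:

library(DBI)

# Open an in-memory SQLite database (hypothetical example data)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE sales (region TEXT, amount REAL)")
dbExecute(con, "INSERT INTO sales VALUES ('North', 1200.50), ('South', 980.00)")

# Query and aggregate with basic SQL keywords: SELECT, FROM, GROUP BY
result <- dbGetQuery(con, "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")
print(result)
dbDisconnect(con)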
Languages for Data Professionals – Python, R

Python
• Python is a widely-used open-source, general-purpose, high-level programming language.
• Its syntax allows programmers to express their concepts in fewer lines of code compared to older languages
• Python is perceived as one of the easiest languages to learn and has a large developer community making it ideal for
beginning programmers.
• Python provides libraries like NumPy and Pandas, which ease data tasks through fast, vectorized operations. It has built-in
functions for almost all of the frequently used operations.
• Python supports multiple programming paradigms, such as object-oriented, imperative, functional, and procedural, making it
suitable for a wide variety of use cases.
R
• R is a language and environment for statistical programming which includes statistical computing and graphics.
• It has many easy-to-use packages, such as ggplot2 and caret, for tasks like visualization and modeling.
• R applications are ideal for visualizing your data in beautiful graphics.
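
As a small illustration of R's visualization strengths, the sketch below plots hypothetical data with ggplot2 (assumed installed); the variable names are invented for the example:

library(ggplot2)

# Hypothetical data: advertising spend vs. sales
df <- data.frame(spend = c(10, 20, 30, 40, 50),
                 sales = c(25, 41, 58, 70, 89))

ggplot(df, aes(x = spend, y = sales)) +
  geom_point() +                              # scatter of raw observations
  geom_smooth(method = "lm", se = FALSE) +    # fitted regression line
  labs(title = "Sales vs. Advertising Spend", x = "Spend", y = "Sales")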
Languages for Data Professionals - Unix

• A Unix/Linux shell script is a computer program written for the UNIX shell.


• It is a series of UNIX commands written in a plain text file to accomplish a specific task.
• Writing a shell script is fast and easy. It is most useful for repetitive tasks that may be time-consuming to execute by
typing one line at a time.
• Typical operations performed by shell scripts include: file manipulation, program execution, system administration tasks such
as disk backups and evaluating system logs, installation scripts for complex programs, and running batches
Regression Analysis
Data Analysis – Regression Analysis
Regression analysis is a statistical method for understanding and quantifying the relationship between two or more variables.
It helps a business estimate one dependent variable based on the values of one or more independent variables.

Dependent Variable: The dependent variable is essentially the "outcome" you’re trying to understand or predict. It’s the
focus of your study, whether you’re looking at quarterly sales figures, customer satisfaction ratings, or any other key result.

Independent Variable: Independent variables are the "factors" that might influence or cause changes in the dependent
variable. These are the variables you manipulate or observe to see their impact on your outcome of interest. For example, if
you adjust the price of a product, that price change is an independent variable that could affect sales figures.

Output of Regression Analysis

Data Analysis – Types of Regression Analysis

Simple Linear regression: Simple linear regression is used when a single independent variable predicts a dependent
variable. The linear regression formula is represented as Y = a + bX, where Y is the dependent variable, X is the independent
variable, a is the intercept (the value of Y when X = 0), and b is the slope, also called the coefficient (the change in Y for a unit change in X).

Business Application: It's frequently used to identify how a change in one variable will affect another. For example, predicting
sales based on advertising expenditure or estimating employee productivity based on hours worked.

Multiple Linear regression: Multiple regression extends linear regression by considering multiple independent variables to
predict the dependent variable. The relationship is represented as Y = a + b₁X₁ + b₂X₂ + ... + bₙXₙ

Business Application: Businesses use it to understand how multiple factors influence outcomes. For instance, predicting
home prices based on features like square footage, number of bedrooms, and neighborhood.
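
As a minimal sketch in R, using a hypothetical housing dataset (the numbers below are invented for illustration):

# Hypothetical home prices (in $1,000s) with two features
homes <- data.frame(
  price    = c(300, 350, 420, 500, 610),
  sqft     = c(1200, 1500, 1800, 2200, 2600),
  bedrooms = c(2, 3, 3, 4, 4)
)

# Fit Y = a + b1*X1 + b2*X2 with lm()
model <- lm(price ~ sqft + bedrooms, data = homes)
summary(model)  # coefficients estimate each feature's effect on price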

Non-Linear regression: It is used in cases where the relationship between the dependent and independent variables is
nonlinear. The model can take various forms depending on the specific problem. It is generally represented as Y = f(X, θ)
where θ represents the parameters of the nonlinear function f.
Data Analysis – Types of Regression Analysis

Examples of Nonlinear regression,

Logistic Regression: Logistic regression is used when the dependent variable is binary (two possible outcomes). It models
the probability of a particular outcome occurring.

Business Application: In business, logistic regression is employed for tasks like predicting customer churn (yes/no), whether
a customer will purchase a product (yes/no), or whether a loan applicant will default on a loan (yes/no).
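
A minimal sketch in R, using glm() with a binomial family on hypothetical churn data:

# Hypothetical customers: churned (1 = yes) and tenure in months
churn_df <- data.frame(
  churned = c(0, 0, 1, 1, 0, 1, 0, 0),
  tenure  = c(24, 36, 3, 6, 8, 2, 30, 5)
)

# Logistic regression models the probability of churn
churn_model <- glm(churned ~ tenure, data = churn_df, family = binomial)

# Predicted churn probability for a customer with 12 months of tenure
predict(churn_model, newdata = data.frame(tenure = 12), type = "response")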

Polynomial Regression: Polynomial regression is used when the relationship between the independent and dependent
variables follows a polynomial curve and is not linear.

Business Application: It can be used to model more complex relationships in data, such as predicting the growth of a plant
based on time and other environmental factors.
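
A minimal sketch in R, fitting a quadratic with lm() and poly() on hypothetical plant-growth data:

# Hypothetical plant heights measured over eight days
growth <- data.frame(day = 1:8,
                     height = c(2, 5, 9, 15, 22, 30, 41, 52))

# Fit a second-degree polynomial: height ~ day + day^2
poly_fit <- lm(height ~ poly(day, 2), data = growth)
summary(poly_fit)  # check the fit of the curved relationship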

Exponential Regression: Exponential regression is a type of nonlinear regression that fits an exponential function to the
data. The general form of an exponential regression model is Y = a · e^(bX).

Power Regression: Power regression is a type of nonlinear regression that fits a power function to the data. The general
form of a power regression model is Y = a · X^b.
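
A minimal sketch in R, fitting the exponential form Y = a · e^(bX) with nls() on hypothetical data (the starting values are rough guesses to initialize the optimizer):

# Hypothetical data that roughly doubles at each step
df <- data.frame(x = 1:6,
                 y = c(2.1, 3.9, 7.8, 16.2, 31.5, 64.0))

# Nonlinear least squares; 'start' seeds the iterative fit
fit <- nls(y ~ a * exp(b * x), data = df,
           start = list(a = 1, b = 0.5))
coef(fit)  # estimated a and b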
Data Analysis – Importance of Regression Analysis

Predictive Modeling: Regression analysis is commonly used for predictive modeling. By examining historical data and
identifying relationships between variables, businesses can make informed predictions about sales, demand, etc.

Identifying Key Drivers: Regression analysis can help identify which independent variables significantly impact the
dependent variable. For example, it can determine which marketing channels or advertising strategies influence sales the most.

Optimizing Decision Making: Whether it's optimizing pricing strategies, production processes, or marketing campaigns,
regression can help companies allocate resources efficiently and achieve better outcomes.

Risk Assessment: Businesses are exposed to various risks, such as economic fluctuations, market changes, and
competitive pressures. Regression analysis-powered risk assessment techniques can be used to assess how changes in
independent variables may affect business performance.

Performance Evaluation: Regression analysis can evaluate the effectiveness of different initiatives and strategies. For
instance, it can assess the impact of employee training on productivity or the relationship between customer satisfaction
and repeat purchases.

Market Research: In market research, regression analysis can be used to understand consumer behavior and
preferences. By examining demographics, pricing, and product features, businesses can tailor their products and
marketing efforts to specific target audiences.
How to Perform Regression Analysis?

Data collection and preparation: Gather and clean data, ensuring it meets assumptions like linearity and independence.

Appropriate regression model: Choose the correct type of regression (linear, polynomial etc.) based on data and objective

Data analysis and interpretation: Analyze results, assess model accuracy, and interpret coefficients to draw conclusions.

Model evaluation and validation: Test the model's performance using metrics like R-squared and mean squared error.

• p values and coefficients in regression analysis work together to show which relationships in the model are statistically
significant and the nature of those relationships.

• The linear regression coefficients describe the mathematical relationship between each independent variable and the
dependent variable. The p values for the coefficients indicate whether these relationships are statistically significant.

• After fitting a regression model, check the residual plots to be sure that you have unbiased estimates.

• R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the
variance in the dependent variable that the independent variables explain collectively. R-squared measures the strength
of the relationship between your model and the dependent variable on a 0 – 100% scale. For example, an R-squared of
60% reveals that 60% of the variability observed in the target variable is explained by the regression model. Generally, a
higher R-squared indicates more variability is explained by the model.
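
As a minimal sketch, R-squared can be computed by hand in R from the residual and total sums of squares (hypothetical data):

# Hypothetical data with a roughly linear relationship
x <- c(1, 2, 3, 4, 5)
y <- c(2.2, 3.9, 6.1, 8.0, 9.8)
fit <- lm(y ~ x)

rss <- sum(residuals(fit)^2)   # residual sum of squares
tss <- sum((y - mean(y))^2)    # total sum of squares
1 - rss / tss                  # matches summary(fit)$r.squared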

Using software tools: Use Python or R to perform regression analysis efficiently.


Simple Linear Regression Example

You are a social researcher interested in the relationship between income and happiness. You survey 500 people whose
incomes range from $15k to $75k and ask them to rank their happiness on a scale from 1 to 10.

Your independent variable (income) and dependent variable (happiness) are both quantitative, so you can do a regression
analysis to see if there is a linear relationship between them.

R code for simple linear regression:

income.happiness.lm <- lm(happiness ~ income, data = income.data)

This code takes the data you have collected (data = income.data) and calculates the effect that the independent variable
income has on the dependent variable happiness using the equation for the linear model: lm().

To view the results of the model, you can use the summary() function in R:

summary(income.happiness.lm)
Simple Linear Regression Example

Results of the Model:

This output table first repeats the formula that was used to generate the results ('Call'), then summarizes the model
residuals ('Residuals'), which give an idea of how well the model fits the real data.

Next is the 'Coefficients' table. The first row gives the estimate of the y-intercept, and the second row gives the regression
coefficient of the model.

happiness = 0.20 + 0.71*income ± 0.018

The number in the table (0.713) tells us that for every one unit increase in income (where one unit of income = $10,000)
there is a corresponding 0.71 unit increase in reported happiness (where happiness is measured on a scale of 1 to 10).
Simple Linear Regression Example

Results of the Model:

The Std. Error column shows how much variation there is in our estimate of the relationship between income and
happiness.

The t value column displays the test statistic. The larger the test statistic, the less likely it is that our results occurred by
chance.

The Pr(>|t|) column shows the p value. The p-value indicates whether the independent variable has a significant
influence. p-values smaller than 0.05 (or sometimes 0.001) are considered significant.

Because the p value is so low (p < 0.001), we can conclude that income has a statistically significant effect on happiness.
Simple Linear Regression Example
Residual Plots
A residual is a measure of how far away a point is vertically from the regression line. Simply put, it is the error between a
predicted value and the observed actual value.

The most important assumption of a linear regression model is that the errors are independent and normally distributed.

A few characteristics of a good residual plot are as follows:

• It has a high density of points close to the origin and a low density of points away from the origin
• It is symmetric about the origin
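
A minimal sketch of a residual plot in R, reusing the income/happiness model fitted earlier (assuming income.data is loaded and income.happiness.lm exists):

# Residuals vs. fitted values; a good plot scatters evenly around zero
plot(fitted(income.happiness.lm), residuals(income.happiness.lm),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # reference line at zero error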
Assumptions of Regression Analysis

Linearity: The relationship between the independent and dependent variables is linear.

Sample representativeness: The sample is representative of the population.

Normally distributed errors: The errors are normally distributed.

Homoscedasticity: The variance of the errors (residuals) remains constant across all levels of the independent
variable(s). Put simply, it signifies that the dispersion of residuals stays consistent, enhancing the accuracy and legitimacy
of regression predictions.

No multicollinearity: When independent variables are highly correlated, it becomes challenging to determine their individual
impact on the dependent variable.

No outliers: There are no outliers in the data.

No endogeneity: There is no relationship between the errors and the independent variables that can lead to biased
results.
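
As one example of checking these assumptions, the sketch below screens for multicollinearity with variance inflation factors, using vif() from the car package (assumed installed) on simulated data:

library(car)  # provides vif()

set.seed(1)
d <- data.frame(y = rnorm(50), x1 = rnorm(50))
d$x2 <- d$x1 + rnorm(50, sd = 0.1)  # x2 is nearly a copy of x1

# Rule of thumb: VIF values above roughly 5-10 flag multicollinearity
vif(lm(y ~ x1 + x2, data = d))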
Marketing Mix Modeling
Marketing Mix Modeling

Marketing Mix Modeling (MMM) is a technique that quantifies the impact of several marketing inputs on Sales or
Market Share. The purpose of using MMM is to understand how much each marketing input contributes to sales, and how
much to spend on each marketing input. Specifically, here are some ways MMM helps businesses thrive:

Optimizing marketing spend: helps businesses understand which marketing activities contribute most effectively to
achieving business objectives.

Budget allocation: after analyzing the ROI of various marketing channels and tactics, businesses can make more
informed decisions about where to allocate their marketing budget for the greatest yield.

Forecasting and planning: businesses can simulate the impact of changes in marketing strategies or external factors and
use these insights to anticipate potential outcomes and adjust their plans accordingly.

Understanding customer behavior: helps businesses understand how different customer segments respond to various
marketing stimuli, enabling more targeted and effective marketing strategies.

Continuous improvement: monitoring key performance metrics and analyzing trends enables businesses to identify
opportunities for optimization, test new strategies, and adapt to changing market conditions, ensuring that their marketing
efforts remain effective and competitive.
Marketing Mix Modeling - Regression

MMM uses regression techniques, and the resulting analysis is used to extract key insights.

The dependent variable could be Sales or Market Share. The independent variables usually used are distribution, price,
TV spend, outdoor campaign spend, digital spend, newspaper and magazine spend, below-the-line promotional
spend, and consumer promotions information.

An equation is formed between the dependent variable and the predictors. This equation could be linear or non-linear
depending on the relationship between the dependent variable and the various marketing inputs.

In Market Mix Modeling, sales are divided into 2 components:


a. Base Sales: Base Sales is what marketers get if they do not do any advertising. It is sales due to brand equity built
over the years. Base Sales are usually fixed unless there is some change in economic or environmental factors.
b. Incremental Sales: Sales generated by marketing activities like TV advertisements, print advertisements, digital
spend, promotions etc. Total incremental sales are split into sales from each input to calculate each input's contribution to total sales.
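
A minimal sketch of this decomposition in R, on hypothetical weekly data (all figures invented for illustration):

# Hypothetical weekly sales and media spend (in $k)
mmm <- data.frame(
  sales   = c(120, 135, 150, 160, 145, 170),
  tv      = c(10, 15, 20, 25, 18, 30),
  digital = c(5, 6, 8, 9, 7, 11)
)

fit <- lm(sales ~ tv + digital, data = mmm)
coef(fit)["(Intercept)"]        # rough proxy for base sales
coef(fit)["tv"] * mean(mmm$tv)  # average incremental sales driven by TV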
Marketing Mix Modeling – Simple Linear Regression

• The goal of simple regression analysis is to find the line (i.e. the intercept and slope) that best fits the data points in
this chart, minimizing the distance between each point and its projection on the line.

• The difference between the original position of a point and its projection on the line, represented by the orange dot,
is the error term. The smaller the value of the error term, the better the regression line estimates the relationship
between x and y.
Marketing Mix Modeling – Multiple Linear Regression

This is a more representative setting, as simple linear regression is hardly used in real-life MMM projects; it is too
simplistic to capture the complexity of consumer behavior and the media landscape.

In a typical marketing mix modeling project, multiple variables impact the sales performance. To be able to measure the
impact of those variables on sales or any other chosen KPI, the analyst needs to build a robust model which accounts for
all the variables influencing the movement of sales.

The model takes the form Sales = β₀ + β₁x₁ + β₂x₂ + ... + βₖxₖ + ε, where:

• x₁, x₂, ..., xₖ are the independent variables influencing sales.

• Each term βᵢxᵢ represents the contribution of the variable xᵢ to sales: i.e. how much sales are driven
by that variable (its incremental impact).
Marketing Mix Modeling – Contribution Chart

A contribution chart visually represents different marketing tactics’ impact on sales. It shows how much each tactic
contributes to the total sales and highlights the most effective ones.

The chart can be used to benchmark performance, compare campaigns over time, and plan for future initiatives.
Contribution charts provide an easy-to-understand overview of where a marketer’s efforts should be focused to maximize
ROI and optimize campaign performance.
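
Continuing the hypothetical model sketched on the previous slide (the fit and mmm objects), a simple contribution breakdown can be drawn in R as:

# Average contribution of each channel = coefficient x average spend
contrib <- coef(fit)[-1] * colMeans(mmm[c("tv", "digital")])
barplot(contrib, main = "Average contribution to sales by channel",
        ylab = "Incremental sales")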
Limitations of Marketing Mix Modeling

Data quality: Poor data quality or incomplete datasets can lead to inaccurate results.

Model complexity: As models become more complex, they require more resources and expertise to develop and
maintain.

External factors: Since MMM relies on historical data, external changes or competitors’ actions may invalidate
assumptions made in the model and reduce its accuracy.

Cost: Marketing mix modeling is relatively expensive to implement and maintain. It also requires dedicated staff and
resources to develop, interpret, and act on the results.

Time lag: Due to its reliance on past data, MMM may not accurately predict what the future holds. The longer the time lag
between when a decision is made and when it is implemented, the less accurate the model will be.

Granularity: In order to provide more detailed insights, MMM models must often take into account multiple layers of
granular data points, which can require advanced analytics capabilities that are cost-prohibitive for some organizations.
Foundation Skills Academy

Thank You
Data Analysis – Advanced Techniques (Statistics)

• Regression Analysis: This entails analyzing the relationship between one or more independent variables and a
dependent variable. The independent variables are used to explain the dependent variable, showing how changes in
the independent variables influence the dependent variable.

• Factor Analysis: This entails taking a complex dataset with many variables and reducing the variables to a small
number. The goal of this maneuver is to attempt to discover hidden trends that would otherwise have been more
difficult to see.

• Cohort Analysis: This is the process of breaking a data set into groups of similar data, often into a customer
demographic. This allows data analysts and other users of data analytics to further dive into the numbers relating to a
specific subset of data.

• Monte Carlo Simulations: Models the probability of different outcomes happening. They're often used for risk
mitigation and loss prevention. These simulations incorporate multiple values and variables and often have greater
forecasting capabilities than other data analytics approaches (a short sketch follows this list).

• Time Series Analysis: Tracks data over time and solidifies the relationship between the value of a data point and the
occurrence of the data point. This data analysis technique is usually used to spot cyclical trends or to project financial
forecasts.
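
A minimal sketch of a Monte Carlo simulation in R, estimating next-quarter revenue under hypothetical demand and price uncertainty:

set.seed(42)
n <- 10000

units <- rnorm(n, mean = 5000, sd = 800)  # uncertain demand
price <- runif(n, min = 9, max = 11)      # uncertain unit price
revenue <- units * price

# Downside, median, and upside scenarios
quantile(revenue, c(0.05, 0.5, 0.95))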
Factor Analysis

• Factor analysis is a technique used to reduce a large number of variables to a smaller number of factors. It works
on the basis that multiple separate, observable variables correlate with each other because they are all associated with
an underlying construct.

• Let’s imagine you want to get to know your customers better, so you send out a rather long survey comprising one
hundred questions. Some of the questions relate to how they feel about your company and product; for example,
“Would you recommend us to a friend?” and “How would you rate the overall customer experience?” Other questions
ask things like “What is your yearly household income?” and “How much are you willing to spend on skincare each
month?”

• Once the survey has been completed, you end up with a large dataset that essentially tells you one hundred different things
about each customer. Instead of looking at each of these responses (or variables) individually, you can use factor
analysis to group them into factors that belong together.

• In this example, factor analysis works by finding survey items that are strongly correlated. This is known as covariance.
So, if there’s a strong positive correlation between household income and how much they’re willing to spend on
skincare each month (i.e. as one increases, so does the other), these items may be grouped together.

• In the end, you have a smaller number of factors rather than hundreds of individual variables. These factors are then
taken forward for further analysis.
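
A minimal sketch in R with base factanal(), using simulated survey responses in place of real data; the latent variable and item names are invented for illustration:

set.seed(1)
latent <- rnorm(200)  # hidden construct driving two of the items

survey <- data.frame(
  q1 = latent + rnorm(200, sd = 0.5),  # correlated with q2 via 'latent'
  q2 = latent + rnorm(200, sd = 0.5),
  q3 = rnorm(200),                     # unrelated items
  q4 = rnorm(200)
)

# Loadings show which survey items group together on the factor
factanal(survey, factors = 1)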
