Project - Advanced Statistics
Regression Model
1. Project Objective:
The objective of the project is to use the dataset 'Factor-Hair-Revised.csv' to build an optimum
regression model to predict the satisfaction levels associated with different factors. This exploration
report will consist of the following:
>Importing the dataset into R
>Running an analysis on the data and checking distribution patterns
>Graphical exploration
>Understanding the structure of the dataset
>Descriptive statistics
2. Data Analysis – a step-by-step data exploration consisting of the following steps:
1. Environment Set up and Data Import
2. Variable Identification
3. Segregate Data
4. Graphic Analysis
5. Perform exploratory data analysis
6. Run an analysis on the data, Check distribution patterns
7. Graphical exploration
8. Find multicollinearity and showcase analysis
9. Perform simple linear regression for the dependent variable with every independent variable
10. Perform PCA/Factor analysis by extracting 4 factors
11. Interpret the output and name the Factors
12. Perform multiple linear regression with customer satisfaction as the dependent variable and
the four factors as independent variables
13. Comment on the model output and validity, with remarks meaningful to a general audience
Feature Exploration
Environment Set up and Data Import
## Set working directory
setwd("C:/Users/satyam.sharma/Desktop/R programming/Advance stats")
## Install the packages needed for the analysis
install.packages("readr")
install.packages("nortest")
install.packages("foreign")
install.packages("MASS")
install.packages("lattice")
install.packages("corrplot")
install.packages("nFactors")
install.packages("psych")
## Load the library used for reading csv files
library(readr)
## Import the dataset
Hair = read.csv("Factor-Hair-Revised.csv", header = TRUE)
## Check the dimensions of the data
dim(Hair)
[1] 100 13
There are 13 variables and 100 observations in the data.
Check class of the dataset
class(Hair)
[1] "data.frame"
The data is in data.frame format and is fit for analysis.
Variable identification
### check structure of the data
str(Hair)
'data.frame': 100 obs. of 13 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ ProdQual : num 8.5 8.2 9.2 6.4 9 6.5 6.9 6.2 5.8 6.4 ...
$ Ecom : num 3.9 2.7 3.4 3.3 3.4 2.8 3.7 3.3 3.6 4.5 ...
$ TechSup : num 2.5 5.1 5.6 7 5.2 3.1 5 3.9 5.1 5.1 ...
$ CompRes : num 5.9 7.2 5.6 3.7 4.6 4.1 2.6 4.8 6.7 6.1 ...
$ Advertising : num 4.8 3.4 5.4 4.7 2.2 4 2.1 4.6 3.7 4.7 ...
$ ProdLine : num 4.9 7.9 7.4 4.7 6 4.3 2.3 3.6 5.9 5.7 ...
$ SalesFImage : num 6 3.1 5.8 4.5 4.5 3.7 5.4 5.1 5.8 5.7 ...
$ ComPricing : num 6.8 5.3 4.5 8.8 6.8 8.5 8.9 6.9 9.3 8.4 ...
$ WartyClaim : num 4.7 5.5 6.2 7 6.1 5.1 4.8 5.4 5.9 5.4 ...
$ OrdBilling : num 5 3.9 5.4 4.3 4.5 3.6 2.1 4.3 4.4 4.1 ...
$ DelSpeed : num 3.7 4.9 4.5 3 3.5 3.3 2 3.7 4.6 4.4 ...
$ Satisfaction: num 8.2 5.7 8.9 4.8 7.1 4.7 5.7 6.3 7 5.5 …
Only the first column, ID, is in integer format; all other variables are numeric.
The Satisfaction values are scores from 1 to 10.
In total there are 13 variables.
Since ID is just a serial number, we can remove it from the data:
Data1Hair = Hair[,-1]
## Now check whether there are any missing values in the dataset
any(is.na(Data1Hair))
[1] FALSE
There are no missing values, so the data is fit for the analysis.
## Use the DataExplorer library to double-check for missing values
library("DataExplorer")
plot_intro(Data1Hair)
This also confirms that there are no missing values.
## Now check for outliers, starting with a data summary
summary(Hair)
ID ProdQual Ecom TechSup CompRes Advertising
Min. : 1.00 Min. : 5.000 Min. :2.200 Min. :1.300 Min. :2.600 Min. :1.900
1st Qu.: 25.75 1st Qu.: 6.575 1st Qu.:3.275 1st Qu.:4.250 1st Qu.:4.600 1st Qu.:3.175
Median : 50.50 Median : 8.000 Median :3.600 Median :5.400 Median :5.450 Median :4.000
Mean : 50.50 Mean : 7.810 Mean :3.672 Mean :5.365 Mean :5.442 Mean :4.010
3rd Qu.: 75.25 3rd Qu.: 9.100 3rd Qu.:3.925 3rd Qu.:6.625 3rd Qu.:6.325 3rd Qu.:4.800
Max. :100.00 Max. :10.000 Max. :5.700 Max. :8.500 Max. :7.800 Max. :6.500
ProdLine SalesFImage ComPricing WartyClaim OrdBilling DelSpeed
Min. :2.300 Min. :2.900 Min. :3.700 Min. :4.100 Min. :2.000 Min. :1.600
1st Qu.:4.700 1st Qu.:4.500 1st Qu.:5.875 1st Qu.:5.400 1st Qu.:3.700 1st Qu.:3.400
Median :5.750 Median :4.900 Median :7.100 Median :6.100 Median :4.400 Median :3.900
Mean :5.805 Mean :5.123 Mean :6.974 Mean :6.043 Mean :4.278 Mean :3.886
3rd Qu.:6.800 3rd Qu.:5.800 3rd Qu.:8.400 3rd Qu.:6.600 3rd Qu.:4.800 3rd Qu.:4.425
Max. :8.400 Max. :8.200 Max. :9.900 Max. :8.100 Max. :6.700 Max. :5.500
Satisfaction
Min. :4.700
1st Qu.:6.000
Median :7.050
Mean :6.918
3rd Qu.:7.625
Max. :9.900
At first glance there are no visible outliers in the data.
Let's do a more detailed analysis.
## Use density plots to check the distributions
plot_density(Data1Hair)
The density plots reveal that some variables are left-skewed, such as Delivery Speed and, to some extent, Tech Support, while Sales Force Image is right-skewed.
Most variables, however, are approximately normally distributed.
### Using box plots
boxplot(Data1Hair)
In the box plots we can see some outliers in Ecommerce, Sales Force Image and Order Billing.
### Now we move onto the factor analysis and find correlations in the data.
## For the correlation analysis we have to remove the dependent variable Satisfaction
Haircor = Data1Hair[,1:11]
dim(Haircor)
[1] 100 11
The last column, Satisfaction, has been successfully removed from the data.
Hair_correlationdata = cor(Haircor)
print(Hair_correlationdata, digits = 3)
We limit the correlation output to 3 decimal places for better visualisation and easier analysis.
Let's also plot the correlations to get the bigger picture.
library(corrplot)
corrplot(Hair_correlationdata, method = "number")
corrplot(Hair_correlationdata, method = "shade")
From both these graphs we can see high correlations between several variables: Ecom correlates
with SalesFImage; CompRes correlates with DelSpeed; and OrdBilling correlates highly with both
CompRes and DelSpeed.
2. Find out multicollinearity through linear regression:
After examining the correlations, we now check for multicollinearity before doing the PCA
or factor analysis.
To detect multicollinearity we use the Variance Inflation Factor (VIF). Any value
above 4 (Hair et al., 2010) suggests multicollinearity among the predictor variables.
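For reference, the VIF of predictor j is derived from the R² obtained by regressing that predictor on all the other predictors (this is the standard definition, not something specific to this dataset):

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}
```

so the cutoff of 4 corresponds to R_j² = 0.75, i.e. 75% of a predictor's variance being explained by the other predictors.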
library(car)  # provides the vif() function
Multicolinear = lm(Satisfaction ~ . , data = Data1Hair)
print(vif(Multicolinear), digits = 4)
Ans: We can clearly see multicollinearity among the independent variables: DelSpeed has a VIF
of 6.516 (greater than 4) and CompRes has a VIF of 4.730. This multicollinearity can affect our
regression model.
3. Perform simple linear regression for the dependent variable Satisfaction with every
independent variable:
lm.ProdQual = lm(Satisfaction ~ ProdQual, Data1Hair)
lm.ProdQual
So we get the regression model: Satisfaction = 3.6759 + 0.4151 * ProdQual
The intercept coefficient is 3.6759.
The coefficient of ProdQual is 0.4151.
Thus, for a one-unit increase in Product Quality, the Satisfaction rating improves by 0.4151,
holding everything else constant.
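As a quick illustration (a sketch using the reported coefficients rather than a refit of the model; the quality scores 7 and 8 are hypothetical), the fitted line can be used to predict satisfaction:

```r
# Prediction from the reported simple-regression coefficients
# (intercept 3.6759, slope 0.4151)
pred = function(prodqual) 3.6759 + 0.4151 * prodqual
pred(7)  # 6.5816
pred(8)  # 6.9967 -- exactly 0.4151 higher, the ProdQual coefficient
```

The gap between the two predictions equals the slope, which is what the one-unit interpretation above means in practice.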
lm.ecom = lm(Satisfaction ~ Ecom, Data1Hair)
lm.ecom
Ecom regression model: Satisfaction = 5.1516 + 0.4811 * Ecommerce
lm.TechSup = lm(Satisfaction ~ TechSup, Data1Hair)
lm.TechSup
Tech Support regression model: Satisfaction = 6.44757 + 0.08768 * TechSup
lm.CompRes = lm(Satisfaction ~ CompRes, Data1Hair)
lm.CompRes
CompRes regression model: Satisfaction = 3.680 + 0.595 * CompRes
lm.Advertising = lm(Satisfaction ~ Advertising, Data1Hair)
lm.Advertising
Advertising regression model: Satisfaction = 5.6259 + 0.3222 * Advertising
lm.ProdLine = lm(Satisfaction ~ ProdLine, Data1Hair)
lm.ProdLine
Product Line regression model: Satisfaction = 4.0220 + 0.4989 * ProdLine
lm.SalesFImage = lm(Satisfaction ~ SalesFImage, Data1Hair)
lm.SalesFImage
Sales Image regression model: Satisfaction = 4.070 + 0.556 * Sales Image
lm.ComPricing = lm(Satisfaction ~ ComPricing, Data1Hair)
lm.ComPricing
ComPricing regression model: Satisfaction = 8.0386 - 0.1607 * ComPricing
lm.WartyClaim = lm(Satisfaction ~ WartyClaim, Data1Hair)
lm.WartyClaim
WartyClaim regression model: Satisfaction = 5.3581 + 0.2581 * WartyClaim
lm.OrdBilling = lm(Satisfaction ~ OrdBilling, Data1Hair)
lm.OrdBilling
OrdBilling regression model: Satisfaction = 4.0541 + 0.6695 * OrdBilling
lm.DelSpeed = lm(Satisfaction ~ DelSpeed, Data1Hair)
lm.DelSpeed
DelSpeed regression model: Satisfaction = 3.2791 + 0.9364 * DelSpeed
4. PCA:
Before doing PCA we first conduct Bartlett's test of sphericity to check whether Principal
Component Analysis can be done. If the p-value is higher than alpha, the correlation matrix is
not significantly different from an identity matrix and we cannot conduct PCA on the data.
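For reference, Bartlett's sphericity statistic (the standard form of the test, which psych::cortest.bartlett implements) compares the sample correlation matrix R against an identity matrix:

```latex
\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R \rvert,
\qquad df = \frac{p(p-1)}{2}
```

where n is the number of observations and p the number of variables; a large χ² (small p-value) rejects the hypothesis that the variables are uncorrelated.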
library(psych)  # provides cortest.bartlett and principal
cortest.bartlett(Hair_correlationdata, nrow(Hair))
The p-value of 6.93724e-97 is far below the significance level alpha = 0.001, so we can reject
the null hypothesis that the correlation matrix is an identity matrix; PCA can be conducted.
## To conduct the factor analysis we first have to find the eigenvalues
ev = eigen(cor(Data1Hair))
print(ev, digits = 4)
Eigenvalue = ev$values
print(Eigenvalue, digits = 4)
factor = c(1:12)
scree = data.frame(factor, Eigenvalue)
scree
plot(scree, main = "Scree Plot", col = "Blue", ylim = c(0,5))
lines(scree, col = "Red")
Here we can see four eigenvalues greater than 1, lying before the elbow of the plot. This means
we should extract 4 factors for the factor analysis.
Principal Components Analysis
PCA = principal(Haircor, nfactors = 4, rotate = "varimax")  # fit on the raw variables so that component scores are computed
print(PCA)
Standardized loadings (pattern matrix) based upon correlation matrix
RC1 RC2 RC3 RC4 h2 u2 com
ProdQual 0.00 -0.01 -0.03 0.88 0.77 0.232 1.0
Ecom 0.06 0.87 0.05 -0.12 0.78 0.223 1.1
TechSup 0.02 -0.02 0.94 0.10 0.89 0.107 1.0
CompRes 0.93 0.12 0.05 0.09 0.88 0.119 1.1
Advertising 0.14 0.74 -0.08 0.02 0.58 0.424 1.1
ProdLine 0.59 -0.06 0.15 0.64 0.79 0.213 2.1
SalesFImage 0.13 0.90 0.08 -0.16 0.86 0.140 1.1
ComPricing -0.09 0.23 -0.25 -0.72 0.64 0.360 1.5
WartyClaim 0.11 0.05 0.93 0.10 0.89 0.108 1.1
OrdBilling 0.86 0.11 0.08 0.04 0.77 0.234 1.1
DelSpeed 0.94 0.18 -0.01 0.05 0.91 0.086 1.1
RC1 RC2 RC3 RC4
SS loadings 2.89 2.23 1.86 1.77
Proportion Var 0.26 0.20 0.17 0.16
Cumulative Var 0.26 0.47 0.63 0.80
Proportion Explained 0.33 0.26 0.21 0.20
Cumulative Proportion 0.33 0.59 0.80 1.00
Mean item complexity = 1.2
Test of the hypothesis that 4 components are sufficient.
The root mean square of the residuals (RMSR) is 0.06
Fit based upon off diagonal values = 0.97
This confirms the scree-plot finding that 4 components are appropriate. The root mean square of
the residuals (RMSR) is low at 0.06.
Further, the four rotated components (RCs) together explain 80% of the cumulative variance.
This becomes clearer in the diagram.
fa.diagram(PCA)
Now we know which component contains which variables.
## For the factor analysis we get the component scores
scores = round(PCA$scores, 2)
Based on this we can name our four factors:
Buying Experience: Complaint Resolution, Order & Billing and Delivery Speed
Branding: E-commerce, Sales Force Image and Advertising
After-Sales Support: Technical Support and Warranty Claims
Product Quality: product quality, product line and competitive pricing (the tangible aspects of the product)
We convert the scores to a data frame and rename the columns after the factors:
scores = as.data.frame(scores)
colnames(scores) = c("Experience", "Brand", "ASales", "Quality")
print(head(scores))
Experience Brand ASales Quality
[1,] 0.13 0.77 -1.88 0.37
[2,] 1.22 -1.65 -0.61 0.81
[3,] 0.62 0.58 0.00 1.57
[4,] -0.84 -0.27 1.27 -1.25
[5,] -0.32 -0.83 -0.01 0.45
[6,] -0.65 -1.07 -1.30 -1.05
Before performing the multiple linear regression we combine the Satisfaction scores back into
our new data frame and name it hair_new.
hair_new = cbind(Satisfaction = Data1Hair$Satisfaction, scores)
print(head(hair_new))
## Perform multiple linear regression with customer satisfaction as the dependent variable
m.linear.Model = lm(Satisfaction ~ Experience + Brand + ASales + Quality, hair_new)
summary(m.linear.Model)
Call:
lm(formula = Satisfaction ~ Experience + Brand + ASales + Quality,
data = hair_new)
Residuals:
Min 1Q Median 3Q Max
-1.6346 -0.5021 0.1368 0.4617 1.5235
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.91813 0.07087 97.617 < 2e-16 ***
Experience 0.61799 0.07122 8.677 1.11e-13 ***
Brand 0.50994 0.07123 7.159 1.71e-10 ***
ASales 0.06686 0.07120 0.939 0.35
Quality 0.54014 0.07124 7.582 2.27e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.7087 on 95 degrees of freedom
Multiple R-squared: 0.6607, Adjusted R-squared: 0.6464
F-statistic: 46.25 on 4 and 95 DF, p-value: < 2.2e-16
Final Analysis:
The fitted model is: Satisfaction = 6.918 + 0.618 * Experience + 0.510 * Brand + 0.067 * ASales + 0.540 * Quality
The intercept is highly significant, and the predictors Experience, Brand and Quality all have
significant betas, implying that the response variable Satisfaction is associated with them.
ASales (after-sales support) is the only variable with a high p-value (0.35), implying that its
beta coefficient contributes little to the model and may in fact be zero.
The overall model p-value given by the F-statistic (< 2.2e-16) provides strong evidence against
the null hypothesis that all coefficients are zero, so the model is statistically significant.
Interpretation of the results:
Customer satisfaction depends largely on the buying experience, so the company should make
every effort to improve it: quick delivery, accurate order billing, fast resolution of customer
complaints, and more consumer-friendly products.
Apart from customer service, the company should give equal attention to its brand visibility and
recognition. Our model suggests that advertising plays a big role in that.