Part A

Introduction

In the business world, companies need to pay back the money they owe, or they can get into trouble. This trouble is called "default," and it makes it harder for a company to get loans in the future. It can also force the company to pay higher interest on the money it owes. Investors, who put money into companies, prefer to invest in companies that manage their money well, grow quickly, and can handle operating at a larger scale.

To understand whether a company is doing well with its money, we look at its "balance sheet." This statement shows what a company owns, what it owes, and how much its owners have invested. This report discusses the problem of companies struggling to pay their debts, which can lead to bigger problems, and how this affects both the company's ability to grow and the choices investors make.

By using information from companies' financial statements from the previous year, we will explore how companies can avoid financial trouble and make smart choices for growth.

Outlier Treatment
 It's notable that almost all continuous variables contain outliers. Outliers are data points
that significantly deviate from the general pattern of the data. They are situated far away
from the bulk of the data and can have a substantial impact on various statistical
measures and analyses.
 The presence of outliers and right-skewed distributions can carry several implications:

1. Influence on Central Tendency: The presence of outliers can distort measures of central tendency, such as the mean. The mean might be pulled in the direction of the outliers, giving a misleading impression of the average value.

2. Effect on Spread: Outliers can also inflate the measure of spread, such as the standard
deviation. This could result in an overestimation of the variability in the data.

3. Analysis Validity: When conducting statistical analyses, such as regression, the presence of outliers can affect the assumptions of the model and the reliability of the results. Careful consideration is required to ensure that the presence of outliers does not lead to misleading conclusions.

4. Decision-Making Impact: Outliers can have a significant impact on decisions based on data. They can distort insights and recommendations drawn from the analysis.

To treat these outliers, the Interquartile Range (IQR) method was implemented as an effective approach to identifying and treating them.

The IQR method offers a systematic way to detect outliers while taking into account the
spread of the data around its median. Here's how we applied the IQR method:
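Since the original code is not reproduced in this document, the following is a minimal sketch of IQR-based capping with pandas; the file name, the target column, and the standard 1.5 × IQR fences are assumptions, not the report's exact implementation:

```python
import pandas as pd

def cap_outliers_iqr(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Cap values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for the given columns."""
    df = df.copy()
    for col in cols:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df[col] = df[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return df

# Hypothetical usage: file name and target-column name are assumptions.
df = pd.read_csv("company_financials.csv")
numeric_cols = df.select_dtypes("number").columns.drop("Default", errors="ignore")
df = cap_outliers_iqr(df, numeric_cols)
```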

The utilization of the IQR method to treat outliers ensures that our analysis is more
robust and less susceptible to distortions caused by extreme values. By addressing
outliers in this manner, we aim to strike a balance between preserving meaningful
insights within the data and maintaining the integrity of our statistical conclusions.

Missing values
 During our analysis of the dataset, we encountered the common issue of missing values
within certain variables. Missing values can pose a challenge as they can disrupt statistical
calculations and potentially lead to biased conclusions. To address this, a well-established
technique known as mean imputation was applied.
 Mean imputation involves replacing missing values with the mean (average) value of the
observed data for the respective variable.
 Mean imputation provides a practical approach to handling missing values, especially when
the proportion of missing data is relatively small. While it may not capture the full complexity
of the missing data mechanism, it can serve as a reasonable solution that maintains the
sample size and overall data structure.
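A minimal sketch of mean imputation with scikit-learn is shown below, continuing from the hypothetical df of the outlier-treatment sketch; imputing all numeric columns is an assumption:

```python
from sklearn.impute import SimpleImputer

# Replace missing values in each numeric column with that column's mean,
# computed from the observed (non-missing) data. df is the hypothetical
# DataFrame from the outlier-treatment sketch above.
numeric_cols = df.select_dtypes("number").columns
imputer = SimpleImputer(strategy="mean")
df[numeric_cols] = imputer.fit_transform(df[numeric_cols])
```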
Univariate Analysis
 In our exploration of the dataset, we conducted univariate analysis to gain insights into
individual variables. Univariate analysis focuses on understanding the distribution, central
tendency, and spread of a single variable. Let's delve into the key aspects we considered
during this analysis:
 We examined the shape of the distribution to understand how values are spread across
different ranges.
 The distribution patterns of variables exhibit a combination of normal distribution and
skewness. This diverse distribution landscape provides us with a deeper understanding of the
data's underlying characteristics.
1. Normally Distributed Variables:

Some variables showcase a bell-shaped curve in their distribution, known as a normal distribution. In a normal distribution, the data clusters around the mean, with symmetrical tails on either side. This pattern indicates that a significant portion of the values are clustered near the center of the distribution, while fewer values deviate towards the extremes.

The presence of normally distributed variables suggests that certain aspects of the dataset
adhere to a familiar statistical pattern. This can facilitate more straightforward statistical
analyses and comparisons.

2. Skewed Variables:

On the other hand, several variables exhibit skewness in their distribution. Skewness occurs
when the data is concentrated towards one tail of the distribution, resulting in an
asymmetrical shape. Positive skewness indicates a longer tail on the right side, while
negative skewness indicates a longer tail on the left side.

Skewed variables may be influenced by factors that lead to concentration of values in a particular direction. Identifying and understanding the reasons behind skewness is crucial for accurate interpretation and decision-making.
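A quick way to separate roughly normal from skewed variables is to compute sample skewness per column. A minimal sketch follows; the ±0.5 threshold is an illustrative rule of thumb, not the criterion used in the report:

```python
# df is the hypothetical DataFrame from the earlier sketches.
skew = df.select_dtypes("number").skew().sort_values()

print(skew[skew.abs() <= 0.5])   # approximately symmetric / normal-looking variables
print(skew[skew > 0.5])          # right-skewed (longer right tail)
print(skew[skew < -0.5])         # left-skewed (longer left tail)
```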

Bivariate Analysis

 During our exploration of the relationship between the “_Operating_Expense_Rate” and “_Cash_Flow_Rate” variables using scatter plots, we observed instances where the relationship didn't exhibit the typical characteristics of being skewed, linear, or normally distributed. Instead, we encountered more intricate patterns that warrant a deeper analysis and consideration.
 Our analysis of the categorical variable “Default” has revealed an interesting distribution that
we have depicted in a pie chart. The pie chart illustrates the relative proportions of different
categories within the variable. In this case, the distribution is characterized by a notable
disparity in the sizes of the sectors:

- “No Default” occupies approximately 89.3% of the entire pie.

- “Default” accounts for around 10.7% of the pie.
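A minimal plotting sketch for the scatter plot and the pie chart described above; the column names follow the report, while the figure styling and the hypothetical df are assumptions:

```python
import matplotlib.pyplot as plt

# Scatter plot of the two rate variables (column names as used in the report).
df.plot.scatter(x="_Operating_Expense_Rate", y="_Cash_Flow_Rate", alpha=0.5)
plt.title("_Operating_Expense_Rate vs _Cash_Flow_Rate")
plt.show()

# Pie chart of the target variable "Default" (slice labels come from the class values).
df["Default"].value_counts().plot.pie(autopct="%.1f%%")
plt.ylabel("")
plt.show()
```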


 A notable pattern emerged from this analysis, where “No Default” exhibited higher values compared to “Default”.
Train-Test split
 The dataset was split into training and testing sets in a 70:30 ratio.
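A minimal sketch of the split with scikit-learn; the stratification and the random seed are assumptions, not stated in the report:

```python
from sklearn.model_selection import train_test_split

# Predictors and binary target from the hypothetical df used in the sketches above.
X = df.drop(columns=["Default"])
y = df["Default"]

# 70:30 train/test split; stratify keeps the default/no-default proportions
# similar in both sets (an assumption, since the report does not say so).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
```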

Logistic regression model

A logistic regression model was fitted on the training data and evaluated on the test set. Let's break down the key metrics from its classification report and their interpretations:

 Precision: Precision is the proportion of true positive predictions among all positive
predictions. It measures the model's ability to correctly identify positive cases.

- For class 0 (no default), the precision is 0.88. This indicates that among the instances the
model predicted as "no default," 88% of them were actually correct.

- For class 1 (default), the precision is 0.00. This suggests that none of the instances the model predicted as “default” were actually correct (and since recall is also zero, the model may not have predicted “default” for any instance at all, in which case precision is reported as 0).

 Recall (Sensitivity): Recall is the proportion of true positive predictions among all actual
positive cases. It measures the model's ability to capture all positive cases.

- For class 0 (no default), the recall is 1.00. This implies that the model identified all instances
of "no default" correctly.

- For class 1 (default), the recall is 0.00. This indicates that the model failed to identify any
instances of "default."

 F1-Score: The F1-score is the harmonic mean of precision and recall. It balances both metrics
and is particularly useful when dealing with imbalanced classes.

- For class 0 (no default), the F1-score is 0.94. This indicates a good balance between
precision and recall.

- For class 1 (default), the F1-score is 0.00. Since recall is zero, the F1-score is also zero for
this class.

 Support: The support is the number of actual occurrences of each class in the test dataset.
 Accuracy: Accuracy is the overall proportion of correct predictions out of all predictions
made by the model. It gives a general view of the model's performance.

- The overall accuracy of the model is 0.88, meaning that 88% of the predictions made by the
model were correct.

 Macro Average and Weighted Average: These are calculated averages of precision, recall, and
F1-score, considering both classes. Macro average gives equal weight to both classes, while
weighted average considers class proportions.

- The macro average F1-score is 0.47, indicating an overall low performance.

- The weighted average F1-score is 0.83, which considers the class imbalance and indicates a
better performance than the macro average.

 Interpretation: The classification report shows that the model performs well in predicting
class 0 (no default), achieving high precision, recall, and F1-score. However, the model
struggles significantly in predicting class 1 (default), with extremely low values for precision,
recall, and F1-score. This suggests that the model is heavily biased towards predicting class 0
and is not effectively identifying instances of class 1. Further analysis, such as balancing
classes, adjusting model parameters, or exploring different algorithms, may be necessary to
improve the model's performance.
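For reference, a minimal sketch of how a logistic regression model and its classification report can be produced with scikit-learn; the solver settings and other hyperparameters are assumptions, not the report's actual configuration:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Fit on the training split from the earlier sketch and evaluate on the test split.
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# Per-class precision, recall, F1-score and support, plus accuracy and averages.
print(classification_report(y_test, log_reg.predict(X_test)))
```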

Random Forest

 Precision: Precision is the proportion of true positive predictions among all positive
predictions. It measures the model's ability to correctly identify positive cases.

- For class 0 (no default), the precision is 0.91. This indicates that among the instances the
model predicted as "no default," 91% of them were actually correct.

- For class 1 (default), the precision is 0.79. This suggests that among the instances the model
predicted as "default," 79% of them were correct.

 Recall (Sensitivity): Recall is the proportion of true positive predictions among all actual
positive cases. It measures the model's ability to capture all positive cases.
- For class 0 (no default), the recall is 0.99. This implies that the model correctly identified
99% of the instances of "no default."

- For class 1 (default), the recall is 0.23. This indicates that the model was able to identify
only 23% of the instances of "default."

 F1-Score: The F1-score is the harmonic mean of precision and recall. It balances both metrics
and is particularly useful when dealing with imbalanced classes.

- For class 0 (no default), the F1-score is 0.95. This indicates a good balance between
precision and recall for this class.

- For class 1 (default), the F1-score is 0.35. While the F1-score for class 1 is lower, it still
indicates some degree of balance between precision and recall.

 Support: The support is the number of actual occurrences of each class in the test dataset.

 Accuracy: Accuracy is the overall proportion of correct predictions out of all predictions
made by the model. It gives a general view of the model's performance.

- The overall accuracy of the model is 0.90, meaning that 90% of the predictions made by the
model were correct.

 Macro Average and Weighted Average: These are calculated averages of precision, recall, and
F1-score, considering both classes. Macro average gives equal weight to both classes, while
weighted average considers class proportions.

- The macro average F1-score is 0.65, indicating an overall moderate performance.

- The weighted average F1-score is 0.88, suggesting a good overall performance, considering
the class imbalance.

 Interpretation: The classification report indicates that the Random Forest model performs
well in predicting class 0 (no default), achieving high precision, recall, and F1-score. However,
for class 1 (default), while precision is reasonable, recall is relatively low. This suggests that
the model is better at identifying instances of class 0, but it struggles to effectively identify
instances of class 1. Further analysis and potential adjustments may be needed to improve
the model's ability to predict the positive class more accurately.
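A corresponding sketch for the Random Forest model; the number of trees and other hyperparameters are illustrative, not the settings used in the report:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

rf = RandomForestClassifier(n_estimators=200, random_state=42)  # illustrative settings
rf.fit(X_train, y_train)
print(classification_report(y_test, rf.predict(X_test)))
```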
LDA

 Precision: Precision is the proportion of true positive predictions among all positive
predictions. It measures the model's ability to correctly identify positive cases.

- For class 0 (no default), the precision is 0.93. This indicates that among the instances the
model predicted as "no default," 93% of them were actually correct.

- For class 1 (default), the precision is 0.67. This suggests that among the instances the model
predicted as "default," 67% of them were correct.

 Recall (Sensitivity): Recall is the proportion of true positive predictions among all actual
positive cases. It measures the model's ability to capture all positive cases.

- For class 0 (no default), the recall is 0.97. This implies that the model correctly identified
97% of the instances of "no default."

- For class 1 (default), the recall is 0.46. This indicates that the model was able to identify
only 46% of the instances of "default."

 F1-Score: The F1-score is the harmonic mean of precision and recall. It balances both metrics
and is particularly useful when dealing with imbalanced classes.

- For class 0 (no default), the F1-score is 0.95. This indicates a good balance between
precision and recall for this class.

- For class 1 (default), the F1-score is 0.54. While the F1-score for class 1 is lower, it still
suggests some degree of balance between precision and recall.

 Support: The support is the number of actual occurrences of each class in the test dataset.

 Accuracy: Accuracy is the overall proportion of correct predictions out of all predictions
made by the model. It gives a general view of the model's performance.

- The overall accuracy of the model is 0.91, meaning that 91% of the predictions made by the
model were correct.
 Macro Average and Weighted Average: These are calculated averages of precision, recall, and
F1-score, considering both classes. Macro average gives equal weight to both classes, while
weighted average considers class proportions.

- The macro average F1-score is 0.75, indicating an overall moderate performance.

- The weighted average F1-score is 0.90, suggesting a good overall performance, considering
the class imbalance.

 Interpretation: The classification report indicates that the LDA model performs well in
predicting class 0 (no default), achieving high precision, recall, and F1-score. For class 1
(default), precision and recall are moderate, indicating the model's ability to predict
instances of this class to some extent. However, there is still room for improvement,
especially in terms of recall for class 1. Further analysis and potential adjustments may be
necessary to enhance the model's performance, particularly in identifying instances of the
positive class more accurately.
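And a corresponding sketch for the LDA model, using scikit-learn's default solver (an assumption):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import classification_report

lda = LinearDiscriminantAnalysis()  # default (svd) solver assumed
lda.fit(X_train, y_train)
print(classification_report(y_test, lda.predict(X_test)))
```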

Compare the performances of the Logistic Regression, Random Forest, and LDA models (including ROC curves)

ROC curves

[ROC curve plots for the Logistic Regression, Random Forest, and LDA models]
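A minimal sketch of overlaying the three ROC curves on one set of axes, assuming the fitted log_reg, rf, and lda estimators from the sketches above:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay

# Overlay the three ROC curves using the fitted models from the earlier sketches.
ax = plt.gca()
for name, model in [("Logistic Regression", log_reg), ("Random Forest", rf), ("LDA", lda)]:
    RocCurveDisplay.from_estimator(model, X_test, y_test, name=name, ax=ax)
ax.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
ax.set_title("ROC curves")
plt.show()
```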

Here's a side-by-side comparison of the models' performances based on the classification reports above:

1. Accuracy: All three models have similar accuracy values (0.88 for Logistic Regression, 0.90 for Random Forest, and 0.91 for LDA), indicating that they perform well in terms of overall correct predictions.

2. Precision and Recall for Default (Class 1): Precision for the default class varies considerably: Logistic Regression never makes a correct default prediction (0.00), while Random Forest (0.79) and LDA (0.67) are correct most of the time when they predict a default. Recall for the default class also varies: Logistic Regression has the lowest recall (0.00), followed by Random Forest (0.23) and LDA (0.46).
3. Precision and Recall for No Default (Class 0): Precision and recall for the no default class (class 0)
are relatively high across all models.

4. F1-Score: F1-scores for the default class differ markedly across the models, from 0.00 (Logistic Regression) to 0.35 (Random Forest) and 0.54 (LDA). For the no default class, F1-scores are comparable, ranging from 0.94 to 0.95.

5. Macro Average and Weighted Average: The macro average F1-score rises from 0.47 (Logistic Regression) to 0.65 (Random Forest) and 0.75 (LDA), and the weighted average F1-score follows the same pattern (0.83, 0.88, and 0.90), reflecting the stronger minority-class performance of Random Forest and LDA.

Conclusion & Recommendations


Based on the analysis of the classification models (Logistic Regression, Random Forest, and Linear
Discriminant Analysis), here are the conclusions and recommendations:

Conclusions:

1. Model Performances: All three models demonstrated reasonable performances in terms of accuracy, precision, and F1-scores. The models were able to predict instances of the no default class (class 0) with high precision and recall.

2. Class Imbalance Impact: The performance for predicting instances of the default class (class 1) was more varied. Random Forest and LDA achieved reasonable precision for class 1, but Logistic Regression failed to identify any defaults, and recall varied significantly across the models. This indicates that the models had difficulty identifying instances of default, largely because class 1 instances were underrepresented in the dataset.

3. Trade-Offs: Depending on your objectives, different models may be more suitable. If correctly identifying default instances is a priority, Random Forest or Linear Discriminant Analysis might be preferred due to their higher recall for class 1. If a balanced trade-off between precision and recall on the default class is important, LDA offers the best F1-score of the three, while Logistic Regression in its current form is not usable for detecting defaults.

Recommendations:

1. Feature Engineering: Consider further exploring and engineering features that might better
differentiate instances of the default class. Feature engineering can provide models with more
discriminative information.

2. Class Balancing: Address the class imbalance issue by employing techniques such as oversampling, undersampling, or using algorithms that handle imbalanced data better (see the sketch after this list).

3. Hyperparameter Tuning: Fine-tune the hyperparameters of each model using techniques like grid search or random search to optimize their performances (also illustrated in the sketch after this list).

4. Ensemble Methods: Experiment with ensemble methods, such as combining multiple models (e.g.,
stacking, boosting), to leverage their strengths and mitigate their weaknesses.

5. Business Context: Consider the practical implications of model decisions. The choice of the best
model depends on the cost of false positives and false negatives in your specific business context.

6. Continuous Improvement: Continue to monitor the model's performance as new data becomes
available. Retrain and update the model periodically to ensure it remains effective.

7. Interpretability: Depending on the industry and regulatory requirements, choose models that
provide interpretable results. Logistic Regression and Linear Discriminant Analysis are generally more
interpretable than Random Forest.

8. Documentation: Document the model-building process, feature selection, and hyperparameters used. This documentation will be invaluable for reproducibility and future improvements.
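As one concrete illustration of recommendations 2 and 3, a sketch combining class weighting with a small grid search is shown below; the parameter grid and the scoring choice are assumptions:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Class weighting counteracts the default/no-default imbalance; the grid is illustrative.
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=42),
    param_grid,
    scoring="f1",  # optimise F1 for the minority "default" class
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```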

In conclusion, while the models show promise in predicting default and no default instances,
addressing class imbalance, fine-tuning models, and considering business context are essential for
achieving better results. Keep in mind that model building is an iterative process, and continuous
refinement will lead to better predictions and insights.
