MLBA Assignment-Anusree Balakrishnan - BD20011 Assignment 1: Data Understanding

The document discusses analyzing customer data from an online retailer using clustering techniques. It performs the following steps: 1. Reads and cleans the customer data, which includes purchase details, invoices, and customer information. 2. Normalizes the data and determines the optimal number of clusters is 4 using the elbow method on K-means clustering. 3. Clusters the customers into 4 segments based on purchase frequency and revenue. 4. Provides recommendations to the retailer on which customer segments to target in order to generate more revenue.

MLBA Assignment-Anusree Balakrishnan_BD20011

Assignment 1
The Consumer Complaint Database contains complaints that the CFPB (Consumer Financial Protection
Bureau) has received about consumer financial products and services. The data used here covers customer
complaints about the financial products and services of a leading North American bank. The goal is to
predict whether the bank disputes the allegations contained in a complaint.

Source of data: https://catalog.data.gov/dataset/consumer-complaint-database

Data Understanding

Our goal is to predict whether the bank disputes the allegation contained in a complaint. To do
this, we use the random forest algorithm. The fields used for prediction are:

· complaint_what_happened
· company_public_response
· company_response

Data Preparation

1) The data is read from the file. The fields included in the data are:

· date_received: The date the CFPB received the complaint.
· product: The type of product the consumer identified in the complaint.
· sub_product: The type of sub-product the consumer identified in the complaint.
· issue: The issue the consumer identified in the complaint.
· sub_issue: The sub-issue the consumer identified in the complaint.
· complaint_what_happened: The consumer complaint narrative, that is, the consumer's own description
of "what happened". Consumers must opt in to share their narrative. The CFPB does not publish it
without the consumer's consent, and consumers can opt out at any time. The CFPB takes
reasonable steps to remove personal information from each complaint that could be used to
identify the complainant.
· company_public_response: The company's optional public-facing response to a complaint.
Companies can choose from a predetermined list of responses, which is published in the
public database.
· company: The company the complaint is about.
· state: The state of the consumer's mailing address.
· zip_code: The ZIP code of the consumer's mailing address.
· consumer_consent_provided: Indicates whether the consumer agreed to have their complaint
narrative published. The narrative is not shared unless the consumer agrees, and consumers can
opt out at any time.
· submitted_via: How the complaint was submitted to the CFPB.
· date_sent_to_company: The date the CFPB sent the complaint to the company.
· company_response: How the company responded to and handled the complaint.
· timely: Whether the company responded in a timely manner.
· consumer_disputed: Whether the consumer disputed the company's response.
· complaint_id: The unique identification number of the complaint.

2) Before analysis, all null values have to be removed. We first remove the rows with null values in
complaint_what_happened, and then check whether any other null values remain in the data.
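The null-removal step could be sketched as follows. This is a dependency-free illustration, not the author's code: the sample rows and the helper names `is_missing`, `drop_missing` and `count_missing` are assumptions made for the example. With pandas, the same step is usually written with `DataFrame.dropna(subset=[...])`.

```python
# Hypothetical sample rows; the real data comes from the CFPB complaint file.
rows = [
    {"complaint_what_happened": "Charged twice for one payment.",
     "company_public_response": "Company disputes the facts",
     "company_response": "Closed with explanation"},
    {"complaint_what_happened": None,
     "company_public_response": "No public response",
     "company_response": "Closed with explanation"},
    {"complaint_what_happened": "Account closed without notice.",
     "company_public_response": None,
     "company_response": "Closed with monetary relief"},
]


def is_missing(value):
    """Treat None and blank strings as null values."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def drop_missing(records, fields):
    """Keep only the records in which every listed field is present."""
    return [r for r in records if not any(is_missing(r.get(f)) for f in fields)]


def count_missing(records):
    """Count the nulls left in each field after the first pass."""
    counts = {}
    for record in records:
        for field, value in record.items():
            counts[field] = counts.get(field, 0) + (1 if is_missing(value) else 0)
    return counts


# First drop rows without a narrative, then check the other fields.
with_narrative = drop_missing(rows, ["complaint_what_happened"])
remaining_nulls = count_missing(with_narrative)
```

Here `remaining_nulls` shows that one row still lacks a `company_public_response`, which is the kind of leftover the second check is meant to catch.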

3) Since the analysis uses company_response, complaint_what_happened, and company_public_response,
we keep only those columns.

4) After this, we map company_public_response to Dispute or Agrees. We have done the
following mapping

5) After that, we looked at the distribution of these company public responses.

6) complaint_what_happened is renamed to complaint, to make the data easier to work with.

7) We clean up the complaint text by removing unwanted characters and email addresses, to make the data
more meaningful.
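
A minimal sketch of this kind of text cleanup, using only the standard `re` module. The exact rules the author applied are not given in the text, so the patterns and the function name `clean_complaint` are assumptions: the sketch lowercases the text, removes email addresses, replaces anything that is not a letter with a space, and collapses whitespace.

```python
import re

EMAIL_RE = re.compile(r"\S+@\S+")        # any token containing "@"
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, symbols
WHITESPACE_RE = re.compile(r"\s+")


def clean_complaint(text):
    """Return a lowercased complaint with emails and non-letters removed."""
    text = text.lower()
    text = EMAIL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


example = "Contact me at jane.doe@example.com!! I was charged $35.00 twice."
cleaned = clean_complaint(example)
# cleaned == "contact me at i was charged twice"
```

Email addresses are removed before the non-letter pass. Otherwise the "@" and "." would be replaced first and fragments of the address would stay in the text.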
8) In order to estimate the accuracy of the model, we split the complaints data into training and
testing sets in the ratio 70:30.
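The 70:30 split can be illustrated with a short, dependency-free sketch. The function name `split_train_test`, the fixed seed and the toy records are assumptions made for the example. In scikit-learn the same step is commonly done with `train_test_split(..., test_size=0.3)`.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Toy (complaint, label) pairs standing in for the cleaned complaints data.
complaints = [(f"complaint text {i}", "Dispute" if i % 2 else "Agrees")
              for i in range(10)]
train, test = split_train_test(complaints)
# 10 records -> 7 for training, 3 for testing
```

Shuffling before cutting matters when the file is ordered, for example by date, because an unshuffled cut would test the model only on the most recent complaints.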

9) We use a random forest classifier here.

Output

After training the random forest classifier, we got the following output. The accuracy of the model
on the test data turns out to be 97%.
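For reference, accuracy is the share of test complaints whose predicted label matches the true label. The sketch below shows the calculation on made-up labels. The label lists are illustrative only and are not the model's actual predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


# Illustrative labels: 4 of the 5 predictions are correct.
y_true = ["Dispute", "Agrees", "Agrees", "Dispute", "Agrees"]
y_pred = ["Dispute", "Agrees", "Dispute", "Dispute", "Agrees"]
score = accuracy(y_true, y_pred)  # 0.8
```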

Code
Assignment 2
The following dataset is from an online retailer that wants to apply data mining techniques for
customer-centric business intelligence. The online retailer considered here is a typical one: a small
business and a relatively new entrant to the online retail sector. It knows the growing importance of
analytics and data mining techniques in today's online businesses, but lacks the technical
awareness and resources. This analysis aims to help the retailer better understand its customers and
therefore conduct customer-centric marketing more effectively. The job is to cluster the customers
by their purchase behaviour using a suitable data mining technique, understand the properties of
each cluster, and provide a set of recommendations that will help the online retailer.

Source of data: https://archive.ics.uci.edu/ml/machine-learning-databases/00502/

Data Understanding

The following dataset is from an online retailer that wants to perform data mining techniques for
customer-centric business intelligence. The data has variables such as invoice date, purchase quantity,
customer details, country of purchase and so on.

Data Preparation

1) First, we read the Excel file and merged both sheets into a single data set.

2) Next, we removed the duplicate entries from the invoice data and converted the invoice
date to a proper datetime format.
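A small sketch of the de-duplication and date conversion. The row layout, the date format string and the helper names are assumptions made for the example, and the actual spreadsheet columns may differ.

```python
from datetime import datetime

# Hypothetical rows: (invoice, stock_code, quantity, invoice_date as text).
raw_rows = [
    ("489434", "85048", 12, "2009-12-01 07:45:00"),
    ("489434", "85048", 12, "2009-12-01 07:45:00"),  # exact duplicate
    ("489435", "22350", 12, "2009-12-01 07:46:00"),
]


def deduplicate(rows):
    """Drop exact duplicate rows while keeping the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def parse_invoice_date(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Convert the textual invoice date into a datetime object."""
    return datetime.strptime(text, fmt)


cleaned = [(invoice, code, qty, parse_invoice_date(date))
           for invoice, code, qty, date in deduplicate(raw_rows)]
```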

3) We dropped the null entries from the data frame.

4) We removed rows with a negative price, rows with a quantity below 0, and rows with
invalid stock codes.
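The filtering rules can be sketched as a single predicate. The text does not say what makes a stock code invalid, so the rule used here (five digits, optionally followed by letters) is an assumption, as are the column names and sample rows.

```python
import re

# Assumed rule: a valid stock code is five digits, optionally followed by
# letters (e.g. "85123A"); anything else is treated as invalid.
VALID_STOCK_CODE = re.compile(r"\d{5}[A-Za-z]*")

rows = [
    {"StockCode": "85048", "Quantity": 12, "Price": 6.95},
    {"StockCode": "85123A", "Quantity": -6, "Price": 2.55},  # negative quantity
    {"StockCode": "22350", "Quantity": 4, "Price": -1.00},   # negative price
    {"StockCode": "POST", "Quantity": 1, "Price": 18.00},    # fails the assumed code rule
]


def is_valid(row):
    """Keep rows with non-negative price and quantity and a valid stock code."""
    return (row["Price"] >= 0
            and row["Quantity"] >= 0
            and VALID_STOCK_CODE.fullmatch(row["StockCode"]) is not None)


valid_rows = [row for row in rows if is_valid(row)]
```

Only the first sample row passes all three checks.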

5) We plotted various graphs to analyze the data, which can be seen in the code.

6) We normalized the data before performing K-means.
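Normalization matters because K-means relies on distances, so a feature with a large scale (revenue) would otherwise dominate one with a small scale (purchase frequency). The sketch below builds the two features per customer and standardizes them. The text does not say which normalization was used, so z-score standardization is an assumption. Defining frequency as the number of distinct invoices and revenue as the sum of quantity times price is also an assumption, and the transactions are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical cleaned transactions: (customer_id, invoice, quantity, unit_price).
transactions = [
    ("C1", "INV1", 2, 5.0),
    ("C1", "INV2", 1, 20.0),
    ("C2", "INV3", 10, 3.0),
    ("C3", "INV4", 1, 2.5),
    ("C3", "INV5", 4, 2.5),
    ("C3", "INV6", 3, 10.0),
]

invoices = defaultdict(set)
revenue = defaultdict(float)
for customer, invoice, quantity, price in transactions:
    invoices[customer].add(invoice)        # distinct invoices per customer
    revenue[customer] += quantity * price  # total spend per customer

customers = sorted(invoices)
frequency = [len(invoices[c]) for c in customers]  # [2, 1, 3]
revenue_values = [revenue[c] for c in customers]   # [30.0, 30.0, 42.5]


def standardize(values):
    """Z-score: subtract the mean and divide by the population std deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# One (frequency, revenue) point per customer, both on a comparable scale.
features = list(zip(standardize(frequency), standardize(revenue_values)))
```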

7) We ran K-means for a range of K values to find the optimum value of K using the elbow method. The optimum value is K = 4.
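
The elbow method runs K-means for several values of K and looks for the point where the within-cluster sum of squares (inertia) stops dropping sharply. The following is a small, dependency-free sketch of that idea, not the author's code. It uses Lloyd's algorithm with a deterministic farthest-point initialization (chosen here for repeatability, an assumption) on made-up points that form four well-separated groups.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centroids(points, k):
    """Deterministic init: start at the first point, then repeatedly add the
    point that is farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    centroids = farthest_first_centroids(points, k)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break  # converged
        centroids = updated
    inertia = sum(squared_distance(p, centroids[label])
                  for p, label in zip(points, labels))
    return labels, centroids, inertia


# Toy data: four tight groups around four centers.
centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
offsets = [(-0.5, 0), (0.5, 0), (0, -0.5), (0, 0.5)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

# Inertia for K = 1..6; the "elbow" is where the curve flattens out.
curve = {k: kmeans(points, k)[2] for k in range(1, 7)}
```

On this toy data the inertia falls from 804.0 at K = 1 to 4.0 at K = 4, which is the pattern the elbow method looks for. On real customer data the bend is usually less clear-cut and is read off a plot of inertia against K.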

Output

We classified different customers based on their frequency of purchase and the revenue generated.

· Class 0: least revenue and less frequent purchases
· Class 1: low revenue and less frequent purchases
· Class 2: high frequency of purchase and high revenue generated
· Class 3: moderate frequency of purchase and moderate revenue generated
· Class 4: high revenue generating

From this analysis, we learned which segments the retailer should focus on to generate more revenue.

Code

I will be adding the code externally and sharing them as a document.
