Data Mining
Data mining is the process of discovering patterns,
trends, and insights from large datasets using
machine learning, statistical analysis, and other
computational techniques. It involves extracting
useful information and knowledge from data that
might be hidden or difficult to discover using
traditional data analysis methods.
Data mining is used in a variety of applications such
as marketing, healthcare, finance, and social media
analysis. In marketing, for example, data mining
techniques are used to identify customer behavior
patterns and preferences, which can then be used to
develop targeted marketing campaigns. In
healthcare, data mining is used to identify risk factors
for diseases and to predict patient outcomes.
Here are some examples of operational
systems that supply data for data mining:
1. Customer Relationship Management (CRM) systems - CRM systems are used to manage interactions with
customers, track sales leads, and monitor customer satisfaction. By analyzing data from a CRM system,
businesses can identify patterns in customer behavior and preferences, and tailor their marketing and
sales strategies accordingly.
2. Enterprise Resource Planning (ERP) systems - ERP systems integrate and manage a variety of business
functions, such as inventory management, order processing, and accounting. By analyzing data from an
ERP system, businesses can identify trends in supply and demand, optimize inventory levels, and
improve production efficiency.
3. Point-of-Sale (POS) systems - POS systems are used to process transactions in retail and hospitality
environments. By analyzing data from a POS system, businesses can identify trends in sales volume and
customer preferences, and adjust their pricing, promotions, and product offerings accordingly.
4. Web analytics systems - Web analytics systems are used to track and analyze website traffic and user
behavior. By analyzing data from a web analytics system, businesses can identify patterns in user
engagement and conversion rates, and optimize their website design and content to improve user
experience.
Data Mining as a step in the process of knowledge
discovery
1. Data Cleaning
2. Data Integration
3. Data Selection
4. Data Transformation
5. Data Mining
6. Pattern Evaluation
7. Knowledge Presentation
KDD stands for Knowledge Discovery in Databases. It is the process
of discovering useful knowledge or information from large amounts
of data stored in databases.
1. Data Cleaning: This step involves removing noise, errors, and
inconsistencies from the data and handling missing values or outliers.
Why? To ensure that the analyses are based on accurate and
reliable data.
Example: Check an orders database for duplicate data, such as orders with identical
customer names, order dates, and product names. Duplicate data
can be removed to avoid skewing subsequent analyses and to
ensure that no data is over-represented.
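The duplicate-order check can be sketched in a few lines of Python. This is a minimal illustration, not a full cleaning pipeline: the records, the field names (customer, order_date, product), and the helper remove_duplicates are all assumptions made up for the example.

```python
# Hypothetical order records; field names are assumptions for illustration.
orders = [
    {"customer": "Ana Cruz", "order_date": "2023-03-01", "product": "Laptop"},
    {"customer": "Ana Cruz", "order_date": "2023-03-01", "product": "Laptop"},  # duplicate
    {"customer": "Ben Reyes", "order_date": "2023-03-02", "product": "Mouse"},
    {"customer": "Ben Reyes", "order_date": "2023-03-05", "product": "Mouse"},
]


def remove_duplicates(records, key_fields):
    """Keep the first record for each distinct combination of key_fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


cleaned_orders = remove_duplicates(orders, ["customer", "order_date", "product"])
print(len(orders), "orders before cleaning,", len(cleaned_orders), "after")
```

Only the exact repeat is dropped. The two Ben Reyes orders differ in date, so both are kept.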
2. Data Integration: This step combines data from multiple sources into a
single coherent data set, which can then be used for further analysis using
data mining techniques. Why? To improve data quality and to avoid redundancy.
Example: In the financial industry, a bank could integrate data from different
sources such as the customer's transaction history, credit scores from multiple
credit bureaus, and information about their employment history and income. By
combining these data sources, the bank can build a more complete view of the
customer's financial situation and behavior, which can help to make more
informed lending decisions.
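As a rough sketch of the bank example, the code below combines three sources into one profile per customer. The customer IDs, bureau names, field names, and the integrate helper are hypothetical; a real integration would also have to reconcile conflicting identifiers and formats.

```python
# Three hypothetical sources, each keyed by an assumed customer ID.
transactions = {"C001": [120.0, 75.5, 300.0], "C002": [40.0]}
credit_scores = {"C001": {"bureau_a": 710, "bureau_b": 695}, "C002": {"bureau_a": 640}}
employment = {"C001": {"employer": "Acme Corp", "annual_income": 52000}}


def integrate(customer_ids):
    """Build one combined profile per customer from the three sources."""
    profiles = {}
    for cid in customer_ids:
        scores = credit_scores.get(cid, {})
        job = employment.get(cid, {})
        spend = transactions.get(cid, [])
        profiles[cid] = {
            "total_spend": sum(spend),
            # Average across bureaus; None when no bureau reported a score.
            "avg_credit_score": sum(scores.values()) / len(scores) if scores else None,
            # None marks a gap in the employment source.
            "annual_income": job.get("annual_income"),
        }
    return profiles


all_ids = sorted(set(transactions) | set(credit_scores) | set(employment))
customer_profiles = integrate(all_ids)
for cid, profile in customer_profiles.items():
    print(cid, profile)
```

Missing values are kept as None instead of being guessed, so that the gap stays visible to later steps.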
3. Data Selection: This step involves selecting relevant data for
analysis. It requires a clear understanding of the research question or
problem to be solved, as well as the data sources available. Data
selection involves deciding which data sources and variables are most
relevant to the research question and eliminating irrelevant data.
For example: How many flowers should a florist order prior
to a major event? Through data mining, the florist can assess past
sales, check what customers are searching for online, gauge their
interests through social media posts, and make projections based on
the success of other recent events during the year.
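For the florist question, selection could mean keeping only the past sales rows for the same kind of event and only the fields that matter. The sketch below assumes a made-up sales log; the field names, the event labels, and the select helper are illustrative only.

```python
# Hypothetical sales log; "cashier" stands in for a field that is irrelevant here.
sales = [
    {"date": "2022-02-14", "event": "Valentine's Day", "flower": "rose", "units": 420, "cashier": "A"},
    {"date": "2022-05-08", "event": "Mother's Day", "flower": "carnation", "units": 310, "cashier": "B"},
    {"date": "2023-02-14", "event": "Valentine's Day", "flower": "rose", "units": 465, "cashier": "A"},
    {"date": "2023-02-14", "event": "Valentine's Day", "flower": "tulip", "units": 150, "cashier": "C"},
]

RELEVANT_FIELDS = ("date", "flower", "units")


def select(records, event, fields):
    """Keep rows for one event and drop every field not listed in fields."""
    return [{f: r[f] for f in fields} for r in records if r["event"] == event]


selected = select(sales, "Valentine's Day", RELEVANT_FIELDS)
for row in selected:
    print(row)
```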
4. Data Transformation: This step involves converting the
preprocessed data into a format that can be used by the
algorithms. This may include converting categorical
variables into numerical ones or using dimensionality
reduction techniques.
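Example (analogy): converting an MS Word document to PDF keeps the
content but changes the format. In the same way, a categorical
variable can be re-expressed as numbers without changing its meaning.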
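One common way to convert a categorical variable into numerical ones is one-hot encoding, sketched below. The "payment" variable and its values are assumptions for illustration, and one-hot encoding is only one of several possible encodings.

```python
# Hypothetical categorical values, one per record.
payments = ["cash", "card", "cash", "voucher"]


def one_hot(values):
    """Return the sorted category list and one 0/1 vector per input value."""
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, encoded


categories, payment_encoded = one_hot(payments)
print(categories)
for original, vector in zip(payments, payment_encoded):
    print(original, vector)
```

Each record becomes a vector with exactly one 1, in the position of its category.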
5. Data Mining: This step involves applying machine learning and
statistical algorithms to the transformed data to discover patterns and
relationships.
Example: health care industry
Suppose a hospital wants to identify patients who are at risk of
readmission after being discharged. The hospital could use data mining
techniques to analyze patient data from electronic health records (EHRs), such as patient
demographics, medical history, medication use, and lab results.
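A very small illustration of the readmission example: the sketch below computes the readmission rate for each value of a patient attribute, which is a simple statistical summary and not a full machine learning model. The patient records, the field names, and the "two or more prior admissions" cutoff are invented for the example.

```python
from collections import defaultdict

# Hypothetical, already-transformed patient records.
patients = [
    {"age_group": "65+", "prior_admissions": 3, "readmitted": True},
    {"age_group": "65+", "prior_admissions": 0, "readmitted": False},
    {"age_group": "65+", "prior_admissions": 2, "readmitted": True},
    {"age_group": "40-64", "prior_admissions": 1, "readmitted": False},
    {"age_group": "40-64", "prior_admissions": 2, "readmitted": True},
    {"age_group": "40-64", "prior_admissions": 0, "readmitted": False},
    {"age_group": "18-39", "prior_admissions": 0, "readmitted": False},
    {"age_group": "18-39", "prior_admissions": 0, "readmitted": False},
]


def readmission_rate_by(records, feature):
    """Return the share of readmitted patients for each value of feature."""
    counts = defaultdict(lambda: [0, 0])  # value -> [readmitted, total]
    for record in records:
        counts[record[feature]][1] += 1
        if record["readmitted"]:
            counts[record[feature]][0] += 1
    return {value: readmitted / total for value, (readmitted, total) in counts.items()}


# Derived yes/no attribute; the cutoff of 2 is an arbitrary assumption.
for patient in patients:
    patient["frequent_admissions"] = patient["prior_admissions"] >= 2

rate_by_age = readmission_rate_by(patients, "age_group")
rate_by_frequency = readmission_rate_by(patients, "frequent_admissions")
print(rate_by_age)
print(rate_by_frequency)
```

In this toy data, frequent prior admissions separate readmitted patients better than age group does. Whether such a pattern holds up is a question for the pattern evaluation step.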
6. Pattern Evaluation: This step involves assessing the patterns
and relationships discovered during data mining to determine
their significance and usefulness.
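Example: fraud detection in credit card transactions
Suppose a credit card company has used data mining techniques
such as clustering and classification to identify patterns of
fraudulent transactions based on factors such as transaction
amount, location, and time of day. Pattern evaluation then checks
how reliably those patterns separate fraudulent from legitimate
transactions before they are put to use.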
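One way to assess fraud patterns is to compare what they flag against transactions whose true status is known, using precision and recall. The labelled pairs below are invented, and these two measures are only one possible choice of evaluation criteria.

```python
# Hypothetical outcomes: (flagged_as_fraud, actually_fraud) per transaction.
outcomes = [
    (True, True), (True, False), (False, False), (True, True),
    (False, True), (False, False), (False, False), (True, True),
]


def evaluate(pairs):
    """Compute precision and recall for the 'fraud' class."""
    true_pos = sum(1 for flagged, actual in pairs if flagged and actual)
    false_pos = sum(1 for flagged, actual in pairs if flagged and not actual)
    false_neg = sum(1 for flagged, actual in pairs if not flagged and actual)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return {"precision": precision, "recall": recall}


metrics = evaluate(outcomes)
print(metrics)
```

Precision is the share of flagged transactions that were really fraudulent. Recall is the share of fraudulent transactions that were flagged.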
7. Knowledge Presentation: This step involves representing the knowledge gained from
the previous steps in a form that can be easily understood and used by humans, such as
a visualization or a report.
Example: predictive maintenance for manufacturing equipment
Suppose a manufacturing company has used data mining techniques such as
clustering and predictive modeling to identify patterns in equipment usage and
predict when equipment is likely to fail.
To present this knowledge to stakeholders, the manufacturing company could create
a dashboard or report that summarizes the key findings from the analysis, such as the
equipment that is most at risk of failure and the predicted time until failure. The
report could also include visualizations such as graphs or heat maps to help
stakeholders understand the data more easily.
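A plain-text version of such a report could look like the sketch below, which ranks equipment by predicted risk and draws a simple bar for each item. The equipment names, risk values, and days-to-failure figures are made up, and build_report is a hypothetical helper.

```python
# Hypothetical model output for three pieces of equipment.
predictions = [
    {"equipment": "Press A", "failure_risk": 0.82, "days_to_failure": 12},
    {"equipment": "Lathe B", "failure_risk": 0.35, "days_to_failure": 90},
    {"equipment": "Conveyor C", "failure_risk": 0.61, "days_to_failure": 30},
]


def build_report(rows, bar_width=20):
    """Return a text table sorted from highest to lowest failure risk."""
    ranked = sorted(rows, key=lambda r: r["failure_risk"], reverse=True)
    lines = [f"{'Equipment':<12}{'Risk':>6}{'Days':>6}  Risk bar"]
    for r in ranked:
        bar = "#" * round(r["failure_risk"] * bar_width)
        lines.append(
            f"{r['equipment']:<12}{r['failure_risk']:>6.0%}{r['days_to_failure']:>6}  {bar}"
        )
    return "\n".join(lines)


report = build_report(predictions)
print(report)
```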
Examples:
Market basket analysis - This technique involves analyzing transactional data to
identify which products are frequently purchased together.
ex. Amazon.com and Shopee
Customer segmentation - Companies use customer segmentation to better
understand their customer base and tailor their marketing strategies to specific
customer groups.
ex. Victoria's Secret
Customer lifetime value - Data mining can be used to estimate the lifetime value of a customer,
which is the total amount of revenue a customer is expected to generate over their
lifetime.
ex. Starbucks (discounts)
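To make market basket analysis concrete, the sketch below counts how often each pair of products appears in the same basket and reports it as support (the share of all baskets that contain the pair). The baskets are invented, and real systems usually use dedicated algorithms such as Apriori instead of counting every pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, one set of products per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "coffee"},
    {"bread", "butter", "coffee"},
]


def pair_support(transactions):
    """Return the fraction of transactions that contain each product pair."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives every pair one canonical order, e.g. ("bread", "butter").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}


support = pair_support(baskets)
top_pair = max(support, key=support.get)
print(top_pair, support[top_pair])
```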
Database data
A database system, also called a database management system (DBMS),
consists of a collection of interrelated data, known as a database, and a
set of software programs to manage and access the data.
A database system is a software application that manages the storage,
retrieval, and manipulation of data in a database. It provides a structured
and organized way to store and manage data so that it can be accessed
and used efficiently by authorized users.
Some examples of database systems include Oracle, MySQL, Microsoft
SQL Server, and PostgreSQL.
Relational Database
A relational database is a type of database that
stores and organizes data into tables, which are
related to each other through a common key. In
a relational database, data is stored in the form
of rows and columns, with each row representing
a single record and each column representing a
data element or field.
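The idea of tables related through a common key can be tried out with Python's built-in sqlite3 module. The table names, columns, and rows below are assumptions chosen for the example; customer_id is the common key that links the two tables.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        product     TEXT NOT NULL,
        price       REAL NOT NULL
    );
    """
)

# Each tuple is one row (record); each position is one column (field).
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(10, 1, "Laptop", 900.0), (11, 1, "Mouse", 25.0), (12, 2, "Keyboard", 45.0)],
)

# Join the two tables on the shared key and summarize per customer.
rows = conn.execute(
    """
    SELECT c.name, COUNT(*), SUM(o.price)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.name
    """
).fetchall()
conn.close()

for name, order_count, total_spent in rows:
    print(name, order_count, total_spent)
```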
Data warehouse
Data warehouses are specifically designed to store large volumes of
data from various sources and to support complex
queries for analysis and decision-making purposes.
They usually store historical data
over a period of time and are optimized for
querying and analysis.
Here are some examples of operational
systems in data warehousing
Data mart = filtered by line of business, organization,
and subject area. Decentralized (four data
mart subject areas).
Data warehouse = not filtered. Centralized and
built mostly on relational databases.
(operational systems, CRM system, ERP
system, billing, and supply chain)
Transactional data
Transactional data refers to data that is generated
by transactions or interactions between entities,
such as customers and businesses. This data
typically includes details such as transaction date
and time, the identity of the entities involved in the
transaction, the products or services purchased,
and the price paid.
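A single transaction record with the details listed above might be modeled as in the sketch below. The class names, field names, and sample values are hypothetical; real systems define their own schemas.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LineItem:
    product: str
    quantity: int
    unit_price: float


@dataclass(frozen=True)
class Transaction:
    transaction_id: str
    timestamp: datetime   # transaction date and time
    customer_id: str      # the buying entity
    merchant_id: str      # the selling entity
    items: tuple          # products or services purchased

    def total(self):
        """Price paid: sum of quantity times unit price over all items."""
        return sum(item.quantity * item.unit_price for item in self.items)


txn = Transaction(
    transaction_id="T-1001",
    timestamp=datetime(2023, 3, 1, 14, 30),
    customer_id="C001",
    merchant_id="STORE-7",
    items=(LineItem("Coffee", 2, 3.5), LineItem("Muffin", 1, 2.25)),
)
print(txn.transaction_id, txn.timestamp.isoformat(), txn.total())
```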
Here are some examples of operational
systems in data warehousing:
1. Sales and Customer Relationship Management (CRM) systems - These
systems capture customer information, sales orders, invoices, and other
transactional data related to sales and customer interactions.
2. Enterprise Resource Planning (ERP) systems - These systems manage the
organization's resources, including inventory, procurement, financial
transactions, and human resources.
3. Marketing Automation systems - These systems capture and track
customer behavior and engagement data, including website visits, social media
interactions, email opens and clicks, and lead generation activities.
4. Supply Chain Management (SCM) systems - These systems manage the
organization's supply chain activities, including procurement, inventory
management, order fulfillment, and logistics.
5. Online Transaction Processing (OLTP) systems - These are general-purpose
systems that support a wide range of operational processes, such as order
entry, account management, and transaction processing.