Data warehousing and data mining in IT ind
OVERVIEW OF DATA MINING
MA. ARA LOVELA G. PAHUTAN
Data Mining
Data mining is the process of discovering patterns,
trends, and insights from large datasets using
machine learning, statistical analysis, and other
computational techniques. It involves extracting
useful information and knowledge from data that
might be hidden or difficult to discover using
traditional data analysis methods.
Data mining is used in a variety of applications such
as marketing, healthcare, finance, and social media
analysis. In marketing, for example, data mining
techniques are used to identify customer behavior
patterns and preferences, which can then be used to
develop targeted marketing campaigns. In
healthcare, data mining is used to identify risk factors
for diseases and to predict patient outcomes.
Here are some examples of operational systems whose data can be mined:
1. Customer Relationship Management (CRM) systems - CRM systems are used to manage interactions with
customers, track sales leads, and monitor customer satisfaction. By analyzing data from a CRM system,
businesses can identify patterns in customer behavior and preferences, and tailor their marketing and
sales strategies accordingly.
2. Enterprise Resource Planning (ERP) systems - ERP systems integrate and manage a variety of business
functions, such as inventory management, order processing, and accounting. By analyzing data from an
ERP system, businesses can identify trends in supply and demand, optimize inventory levels, and
improve production efficiency.
3. Point-of-Sale (POS) systems - POS systems are used to process transactions in retail and hospitality
environments. By analyzing data from a POS system, businesses can identify trends in sales volume and
customer preferences, and adjust their pricing, promotions, and product offerings accordingly.
4. Web analytics systems - Web analytics systems are used to track and analyze website traffic and user
behavior. By analyzing data from a web analytics system, businesses can identify patterns in user
engagement and conversion rates, and optimize their website design and content to improve user
experience.
Data Mining as a step in the process of knowledge
discovery
1. Data Cleaning
2. Data Integration
3. Data Selection
4. Data Transformation
5. Data Mining
6. Pattern Evaluation
7. Knowledge Presentation
KDD stands for Knowledge Discovery in Databases. It is the process
of discovering useful knowledge or information from large amounts
of data stored in databases.
1. Data Cleaning: This step involves removing noise, errors, and
inconsistencies from the data, and handling missing values and outliers.
Why? To ensure that the analyses are based on accurate and
reliable data.
Example: checking an orders database for duplicate data, such as orders
with identical customer names, order dates, and product names. Duplicate
data can be removed to avoid skewing subsequent analyses and to
ensure that no record is over-represented.
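The duplicate-order example can be sketched in a few lines of Python. This is a minimal illustration, not a full cleaning pipeline: the records and the field names (customer, order_date, product) are made up, and dropping incomplete records is only one possible way to handle missing values.

```python
# Hypothetical order records; the field names are assumptions for illustration.
orders = [
    {"customer": "Ana Cruz", "order_date": "2023-03-01", "product": "Laptop"},
    {"customer": "Ana Cruz", "order_date": "2023-03-01", "product": "Laptop"},  # duplicate
    {"customer": "Ben Reyes", "order_date": "2023-03-02", "product": "Mouse"},
    {"customer": "Ben Reyes", "order_date": None, "product": "Mouse"},  # missing value
]


def clean_orders(records):
    """Drop records with missing values, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue  # skip incomplete records
        key = (rec["customer"], rec["order_date"], rec["product"])
        if key in seen:
            continue  # same customer, date, and product already kept
        seen.add(key)
        cleaned.append(rec)
    return cleaned


cleaned_orders = clean_orders(orders)
print(len(cleaned_orders), "of", len(orders), "records kept")
```

In practice, missing values are often filled in rather than dropped; which choice is appropriate depends on the analysis.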
2. Data Integration: This step combines data from multiple sources into a
single coherent data set, which can then be analyzed using data mining
techniques. Why? To improve data quality and to avoid redundancy.
Example (financial industry): A bank could integrate data from different
sources such as the customer's transaction history, credit scores from multiple
credit bureaus, and information about their employment history and income. By
combining these data sources, the bank can build a more complete view of the
customer's financial situation and behavior, which can help it make more
informed lending decisions.
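The bank example can be sketched as follows. This assumes every source shares a common customer ID; the three dictionaries, their keys, and the derived fields are hypothetical stand-ins for real source systems.

```python
# Hypothetical data sources, each keyed by the same customer ID (an assumption).
transactions = {"C001": [120.0, 80.5], "C002": [40.0]}
credit_scores = {"C001": {"bureau_a": 710, "bureau_b": 690}, "C002": {"bureau_a": 640}}
employment = {"C001": {"employer": "Acme", "income": 52000}}


def integrate(customer_ids):
    """Combine the three sources into one profile per customer."""
    profiles = {}
    for cid in customer_ids:
        scores = credit_scores.get(cid, {})
        profiles[cid] = {
            "total_spent": sum(transactions.get(cid, [])),
            # Average across bureaus so each customer has a single score.
            "avg_credit_score": sum(scores.values()) / len(scores) if scores else None,
            # None marks information that a source did not provide.
            "income": employment.get(cid, {}).get("income"),
        }
    return profiles


profiles = integrate(["C001", "C002"])
print(profiles["C001"])
```

Customer C002 has no employment record, so the integrated profile makes that gap visible instead of hiding it.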
3. Data Selection: This step involves selecting relevant data for
analysis. It requires a clear understanding of the research question or
problem to be solved, as well as the data sources available. Data
selection involves deciding which data sources and variables are most
relevant to the research question and eliminating irrelevant data.
Example: How many flowers should a florist order prior
to a major event? The florist can select past sales, what customers
are searching for online, and the interests they express in social media
posts, and then make projections based on the success of other recent
events during the year.
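For the florist question, selection might look like the sketch below. The sales records, the field names, and the choice of which fields count as relevant are all assumptions made for illustration.

```python
# Hypothetical sales records; "cashier" is an example of an irrelevant variable.
sales = [
    {"date": "2023-02-13", "product": "roses", "units": 120, "cashier": "A", "event": "Valentine's Day"},
    {"date": "2023-02-14", "product": "roses", "units": 200, "cashier": "B", "event": "Valentine's Day"},
    {"date": "2023-03-10", "product": "vases", "units": 5, "cashier": "A", "event": None},
    {"date": "2023-05-13", "product": "tulips", "units": 90, "cashier": "C", "event": "Mother's Day"},
]

RELEVANT_FIELDS = ("date", "product", "units", "event")
FLOWERS = {"roses", "tulips"}


def select_relevant(records):
    """Keep flower sales tied to an event, and only the relevant variables."""
    return [
        {field: rec[field] for field in RELEVANT_FIELDS}
        for rec in records
        if rec["product"] in FLOWERS and rec["event"] is not None
    ]


selected = select_relevant(sales)
print(len(selected), "records selected")
```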
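4. Data Transformation: This step involves converting the
preprocessed data into a format that can be used by the
algorithms. This may include converting categorical
variables into numerical ones or using dimensionality
reduction techniques.
Example: encoding a categorical field such as payment method as
numeric indicator columns. A loose everyday analogy is converting an
MS Word document to PDF: the content stays the same, but the format
changes to suit the tool that consumes it.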
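As a minimal sketch of converting categorical variables into numerical ones, the Python below one-hot encodes a hypothetical payment-method column. It also rescales a numeric column to the 0 to 1 range, a common companion step that the slide does not mention. The column names and values are made up.

```python
def one_hot(values):
    """Turn a list of category labels into 0/1 indicator rows."""
    categories = sorted(set(values))
    rows = [[1 if value == cat else 0 for cat in categories] for value in values]
    return rows, categories


def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid division by zero on constant columns
    return [(value - lo) / (hi - lo) for value in values]


payment_methods = ["cash", "card", "cash", "voucher"]  # hypothetical column
amounts = [10.0, 30.0, 20.0, 50.0]                     # hypothetical column

encoded, categories = one_hot(payment_methods)
scaled = min_max_scale(amounts)
print(categories)  # ['card', 'cash', 'voucher']
print(encoded[0])  # [0, 1, 0]
print(scaled)      # [0.0, 0.5, 0.25, 1.0]
```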
5. Data Mining: This step involves applying machine learning and
statistical algorithms to the transformed data to discover patterns and
relationships.
Example (healthcare industry):
Suppose a hospital wants to identify patients who are at risk of
readmission after being discharged. The hospital could use data mining
techniques to analyze patient data from EHRs, such as patient
demographics, medical history, medication use, and lab results.
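To make the mining step concrete, here is a deliberately simple one-rule classifier (a "decision stump") applied to invented patient data. It is a stand-in for whichever technique a hospital would actually use, and the single feature, the number of prior admissions, is an assumption chosen for illustration.

```python
# Hypothetical records: (number of prior admissions, was the patient readmitted?)
patients = [
    (0, False), (1, False), (1, False), (2, True),
    (3, True), (4, True), (0, False), (3, False),
]


def best_threshold(records):
    """Find the cut-off for the rule 'predict readmission when value >= cut-off'.

    Returns the cut-off with the highest accuracy on the given records,
    together with that accuracy.
    """
    best = (None, 0.0)
    for threshold in sorted({value for value, _ in records}):
        correct = sum((value >= threshold) == label for value, label in records)
        accuracy = correct / len(records)
        if accuracy > best[1]:
            best = (threshold, accuracy)
    return best


threshold, accuracy = best_threshold(patients)
print(threshold, accuracy)  # 2 0.875
```

The discovered pattern here is "patients with two or more prior admissions tend to be readmitted". Its accuracy is measured on the same records used to find it, so it would still need to be checked on unseen data.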
6. Pattern Evaluation: This step involves assessing the patterns
and relationships discovered during data mining to determine
their significance and usefulness.
Example: fraud detection in credit card transactions
Suppose a credit card company has used data mining techniques
such as clustering and classification to identify patterns of
fraudulent transactions based on factors such as transaction
amount, location, and time of day. Pattern evaluation then checks
whether those patterns hold up, for example by measuring how many
flagged transactions were actually fraudulent.
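One common way to assess a fraud pattern is to compare its predictions with known outcomes using precision and recall. The sketch below assumes labeled transactions are available; both label lists are invented.

```python
def precision_recall(actual, predicted):
    """Compute precision and recall for boolean fraud labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)        # fraud correctly flagged
    fp = sum((not a) and p for a, p in pairs)  # legitimate but flagged
    fn = sum(a and (not p) for a, p in pairs)  # fraud that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical ground truth and model output for eight transactions.
actual = [True, True, False, False, True, False, False, False]
predicted = [True, False, False, True, True, False, False, False]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of the transactions flagged, how many were fraud?", while recall answers "of the fraud cases, how many were flagged?". A pattern that scores poorly on either may not be useful enough to act on.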
7. Knowledge Presentation: This step involves representing the knowledge gained from
the previous steps in a form that can be easily understood and used by humans, such as
visualization or a report.
Example: predictive maintenance for manufacturing equipment
• Suppose a manufacturing company has used data mining techniques such as
clustering and predictive modeling to identify patterns in equipment usage and
predict when equipment is likely to fail.
• To present this knowledge to stakeholders, the manufacturing company could create
a dashboard or report that summarizes the key findings from the analysis, such as the
equipment that is most at risk of failure and the predicted time until failure. The
report could also include visualizations such as graphs or heat maps to help
stakeholders understand the data more easily.
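A report of this kind can be as simple as a ranked text chart. The sketch below assumes a model has already produced a failure risk between 0 and 1 for each machine; the machine names and risk values are invented.

```python
# Hypothetical model output: estimated failure risk per machine.
predictions = {"Press A": 0.82, "Lathe B": 0.35, "Conveyor C": 0.60}


def risk_report(risk_by_machine, width=20):
    """Return one text line per machine, highest risk first, with a bar."""
    ranked = sorted(risk_by_machine.items(), key=lambda item: item[1], reverse=True)
    lines = []
    for name, risk in ranked:
        bar = "#" * round(risk * width)
        lines.append(f"{name:<12}{risk:>5.0%} {bar}")
    return lines


report = risk_report(predictions)
print("\n".join(report))
```

A real dashboard would more likely use a charting tool, but the principle is the same: rank the findings and show them in a form stakeholders can read at a glance.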
Data Mining Application
Data mining can be applied to several kinds of data:
• Database data
• Data warehouse data
• Transactional data
Examples:
• Market basket analysis: This technique involves analyzing transactional data to
identify which products are frequently purchased together.
Ex.: Amazon.com and Shopee
• Customer segmentation: Companies use customer segmentation to better
understand their customer base and tailor their marketing strategies to specific
customer groups.
Ex.: Victoria's Secret
• Customer lifetime value: Data mining can be used to estimate the lifetime value of
a customer, which is the total amount of revenue a customer is expected to generate
over their lifetime.
Ex.: Starbucks (discounts)
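Market basket analysis can be sketched by counting how often each pair of products appears in the same transaction. The baskets below are invented, and the "support" measure (the share of baskets containing the pair) is one standard way to rank pairs, not necessarily what any of the named companies use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, each a set of purchased products.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def pair_support(transactions):
    """Return the fraction of transactions that contain each product pair."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(transactions) for pair, n in counts.items()}


support = pair_support(baskets)
top_pair = max(support, key=support.get)
print(top_pair, support[top_pair])  # ('bread', 'butter') 0.75
```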
Database data
• A database system, also called a database management system (DBMS),
consists of a collection of interrelated data, known as a database, and a
set of software programs to manage and access the data.
• The software manages the storage, retrieval, and manipulation of data
in the database. It provides a structured and organized way to store and
manage data so that it can be accessed and used efficiently by authorized
users.
• Some examples of database systems include Oracle, MySQL, Microsoft
SQL Server, and PostgreSQL.
Relational Database
 A relational database is a type of database that
stores and organizes data into tables, which are
related to each other through a common key. In
a relational database, data is stored in the form
of rows and columns, with each row representing
a single record and each column representing a
data element or field.
Example: a one-to-many relationship, in which the primary key of a
parent table is referenced by many rows in a child table.
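The one-to-many relationship can be tried out with Python's built-in sqlite3 module. In this sketch the parent table is customers and the child table is orders; the table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Parent table: one row per customer, identified by its primary key.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# Child table: many orders may reference the same customer.
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
           product TEXT NOT NULL)"""
)

conn.execute("INSERT INTO customers VALUES (1, 'Ana Cruz')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "Laptop"), (11, 1, "Mouse")],
)

# An order that points at a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (12, 99, 'Keyboard')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

rows = conn.execute(
    """SELECT c.name, o.product
       FROM customers AS c
       JOIN orders AS o ON o.customer_id = c.customer_id
       ORDER BY o.order_id"""
).fetchall()
print(rows)
```

The join returns one row per order, each carrying the name of the single customer it belongs to.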
Data warehouse
Data warehouses are specifically designed to store large volumes of
data from various sources and to support complex
queries for analysis and decision-making purposes.
They usually store historical data accumulated
over a period of time, in a structure optimized for
querying and analysis.
Example: data mart versus data warehouse
• Data mart: filtered by line of business, organization,
and subject area. Decentralized (four data
mart subject areas).
• Data warehouse: not filtered. Centralized and
based mostly on relational databases. Its sources
include operational systems, CRM systems, ERP
systems, billing, and supply chain.
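A data mart can be pictured as a filtered slice of the warehouse. In the sketch below the warehouse is a plain list of rows loaded from several source systems; the row layout and the business_line field are assumptions made for illustration.

```python
# Hypothetical warehouse rows, loaded from several operational sources.
warehouse = [
    {"source": "CRM", "business_line": "sales", "year": 2022, "amount": 100.0},
    {"source": "ERP", "business_line": "finance", "year": 2022, "amount": 250.0},
    {"source": "billing", "business_line": "finance", "year": 2023, "amount": 300.0},
    {"source": "CRM", "business_line": "sales", "year": 2023, "amount": 150.0},
]


def build_data_mart(rows, business_line):
    """Filter the centralized warehouse down to one line of business."""
    return [row for row in rows if row["business_line"] == business_line]


def total_by_year(rows):
    """Answer a simple analytical question over historical data."""
    totals = {}
    for row in rows:
        totals[row["year"]] = totals.get(row["year"], 0.0) + row["amount"]
    return totals


finance_mart = build_data_mart(warehouse, "finance")
finance_totals = total_by_year(finance_mart)
print(finance_totals)  # {2022: 250.0, 2023: 300.0}
```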
Transactional data
Transactional data refers to data that is generated
by transactions or interactions between entities,
such as customers and businesses. This data
typically includes details such as transaction date
and time, the identity of the entities involved in the
transaction, the products or services purchased,
and the price paid.
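The details listed above map naturally onto a record type. The sketch below uses a Python dataclass; the class name, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    """One transaction: when it happened, who was involved, what was bought, and the price."""
    timestamp: datetime
    customer_id: str
    merchant_id: str
    items: tuple
    total_price: float


sale = Transaction(
    timestamp=datetime(2023, 3, 1, 14, 30),
    customer_id="C001",
    merchant_id="M042",
    items=("roses", "vase"),
    total_price=35.5,
)
print(sale.customer_id, sale.items, sale.total_price)
```

Marking the record as frozen reflects the fact that a completed transaction is a historical fact and is normally not edited afterwards.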
Here are some examples of operational
systems in data warehousing:
1. Sales and Customer Relationship Management (CRM) systems - These
systems capture customer information, sales orders, invoices, and other
transactional data related to sales and customer interactions.
2. Enterprise Resource Planning (ERP) systems - These systems manage the
organization's resources, including inventory, procurement, financial
transactions, and human resources.
3. Marketing Automation systems - These systems capture and track
customer behavior and engagement data, including website visits, social media
interactions, email opens and clicks, and lead generation activities.
4. Supply Chain Management (SCM) systems - These systems manage the
organization's supply chain activities, including procurement, inventory
management, order fulfillment, and logistics.
5. Online Transaction Processing (OLTP) systems - These are general-purpose
systems that support a wide range of operational processes, such as order
entry, account management, and transaction processing.