Data mining slide for data mining process
Data Mining
• Data mining refers to extracting or mining knowledge from large amounts
of data.
• A more appropriate name would have been knowledge mining, which
emphasizes mining knowledge from large amounts of data.
• It is the computational process of discovering patterns in large data sets
involving methods at the intersection of artificial intelligence, machine
learning, statistics, and database systems.
• The overall goal of the data mining process is to extract information
from a data set and transform it into an understandable structure for further
use.
• The key properties of data mining are
a) Automatic discovery of patterns
b) Prediction of likely outcomes
c) Creation of actionable information
d) Focus on large datasets and databases
Data Mining Functionalities
• Data mining functionalities are used to specify the kind of patterns to be
found in data mining tasks.
• In general, data mining tasks can be classified into two categories:
descriptive and predictive.
a) Descriptive mining tasks characterize the general properties of the data
in the database.
b) Predictive mining tasks perform inference on the current data in order to
make predictions.
• A data mining system should be able to mine multiple kinds of patterns to
accommodate different user expectations or applications.
• Data mining systems should be able to discover patterns at various
granularities (i.e., different levels of abstraction).
• Data mining systems should also allow users to specify hints to guide or
focus the search for interesting patterns.
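The difference between descriptive and predictive tasks can be illustrated with a minimal sketch. The toy records, the function names, and the midpoint-threshold rule are all assumptions made for illustration; a real system would use far richer methods.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records: (customer age, bought the product?)
RECORDS = [(22, False), (25, False), (31, True), (38, True), (45, True), (29, False)]

def describe(rows):
    """Descriptive task: summarize general properties of the data."""
    ages = [age for age, _ in rows]
    return {
        "count": len(rows),
        "mean_age": mean(ages),
        "buyers": Counter(bought for _, bought in rows)[True],
    }

def fit_threshold(rows):
    """Predictive task, step 1: infer an age threshold from the current data.

    The threshold is the midpoint between the mean age of buyers and the
    mean age of non-buyers (an illustrative rule, not a standard algorithm).
    """
    buyers = [age for age, bought in rows if bought]
    others = [age for age, bought in rows if not bought]
    return (mean(buyers) + mean(others)) / 2

def predict(threshold, age):
    """Predictive task, step 2: predict the outcome for an unseen customer."""
    return age >= threshold
```

Here describe(RECORDS) only reports what is already in the data, while fit_threshold followed by predict makes a statement about a customer who is not in the data.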
Major Issues In Data Mining
Performance Issues
• Efficiency and scalability of data mining algorithms − To extract
information effectively from the huge amounts of data in databases,
data mining algorithms must be efficient and scalable.
• Parallel, distributed, and incremental mining algorithms − Factors
such as the huge size of databases, the wide distribution of data, and
the complexity of data mining methods motivate the development of
parallel and distributed data mining algorithms. These algorithms
divide the data into partitions, which are then processed in parallel.
The results from the partitions are then merged. Incremental
algorithms incorporate database updates without mining the data
again from scratch.
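The partition, process, merge, and incremental-update steps can be sketched as follows. This is a minimal illustration that assumes the mining task is counting item occurrences in transactions; the function names are hypothetical, and a thread pool is used only to keep the example self-contained (in CPython, a process pool or a cluster would be needed for true CPU parallelism).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(transactions, n_parts):
    """Divide the data into n_parts roughly equal partitions."""
    return [transactions[i::n_parts] for i in range(n_parts)]

def count_items(part):
    """Mine one partition: count the transactions that contain each item."""
    counts = Counter()
    for transaction in part:
        counts.update(set(transaction))
    return counts

def mine_parallel(transactions, n_parts=2):
    """Process the partitions concurrently, then merge the partial results."""
    parts = partition(transactions, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial_results = list(pool.map(count_items, parts))
    merged = Counter()
    for counts in partial_results:
        merged.update(counts)
    return merged

def update_incremental(counts, new_transactions):
    """Fold newly arrived data into existing counts without a full rescan."""
    updated = Counter(counts)
    updated.update(count_items(new_transactions))
    return updated
```

Merging is simple here because counts are additive across partitions. Many mining results (for example, frequent itemsets under a support threshold) need a more careful merge step.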
Diverse Data Types Issues
• Handling of relational and complex types of data − The database may
contain complex data objects, multimedia data objects, spatial data,
temporal data, etc. It is unrealistic to expect one system to mine all
these kinds of data.
• Mining information from heterogeneous databases and global
information systems − The data is available from different data sources on
a LAN or WAN. These data sources may be structured, semi-structured, or
unstructured. Mining knowledge from them therefore adds challenges
to data mining.
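One part of this challenge, bringing structured and semi-structured sources into a common form before mining, can be sketched as below. The field names (name, city, address) and the function names are assumptions chosen for illustration, and unstructured sources such as free text are left out because they need separate extraction techniques.

```python
import csv
import io
import json

def from_csv(text):
    """Structured source: every row follows the same fixed header."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"name": row["name"], "city": row["city"]} for row in reader]

def from_json(text):
    """Semi-structured source: fields may be nested or missing."""
    return [
        {"name": obj.get("name"), "city": obj.get("address", {}).get("city")}
        for obj in json.loads(text)
    ]

def integrate(*sources):
    """Combine records from all sources into one list with a shared schema."""
    return [record for source in sources for record in source]
```

Missing values surface as None after integration, so a mining step that follows has to decide how to handle them.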
