Data mining slide for data mining process

Data Mining
• Data mining refers to extracting or mining knowledge from large amounts
of data.
• Data mining would have been more appropriately named knowledge
mining, which emphasizes mining knowledge from large amounts of data.
• It is the computational process of discovering patterns in large data sets
involving methods at the intersection of artificial intelligence, machine
learning, statistics, and database systems.
• The overall goal of the data mining process is to extract information
from a data set and transform it into an understandable structure for further
use.
• The key properties of data mining are:
a) Automatic discovery of patterns
b) Prediction of likely outcomes
c) Creation of actionable information
d) Focus on large datasets and databases

Data Mining Functionalities
• Data mining functionalities are used to specify the kind of patterns to be
found in data mining tasks.
• In general, data mining tasks can be classified into two categories:
descriptive and predictive.
a) Descriptive mining tasks characterize the general properties of the data
in the database.
b) Predictive mining tasks perform inference on the current data in order to
make predictions.
• A data mining system should be able to mine multiple kinds of patterns to
accommodate different user expectations or applications.
• Data mining systems should be able to discover patterns at various
granularities (i.e., different levels of abstraction).
• Data mining systems should also allow users to specify hints to guide or
focus the search for interesting patterns.
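The difference between descriptive and predictive tasks can be illustrated with a minimal Python sketch. The toy data set (hours studied against exam score) and the function names are assumptions made for illustration only; a least-squares line stands in for whatever predictive model a real system would use.

```python
from statistics import mean

# Hypothetical toy data: (hours_studied, exam_score) pairs.
records = [(1, 52), (2, 58), (3, 65), (4, 71), (5, 78)]


def describe(rows):
    """Descriptive task: summarize general properties of the data."""
    scores = [y for _, y in rows]
    return {
        "count": len(rows),
        "mean_score": mean(scores),
        "min_score": min(scores),
        "max_score": max(scores),
    }


def fit_line(rows):
    """Predictive task: fit score = slope * hours + intercept by least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept


def predict(model, hours):
    """Use the fitted line to infer a score for an unseen input."""
    slope, intercept = model
    return slope * hours + intercept


summary = describe(records)   # describes the data that already exists
model = fit_line(records)     # learns from the current data
forecast = predict(model, 6)  # predicts a value not present in the data
```

Here `describe` only characterizes the stored records, while `fit_line` and `predict` perform inference on them to estimate an outcome for a new case.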

Performance Issues
• Efficiency and scalability of data mining algorithms − To extract
information effectively from the huge amounts of data in
databases, data mining algorithms must be efficient and scalable.
• Parallel, distributed, and incremental mining algorithms − Factors
such as the huge size of databases, the wide distribution of data, and
the complexity of data mining methods motivate the development of
parallel and distributed data mining algorithms. These algorithms
divide the data into partitions, which are processed in parallel, and
the results from the partitions are then merged. Incremental
algorithms incorporate database updates without mining the data
again from scratch.
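The partition, process, and merge pattern, together with an incremental update, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the transaction data and all function names are hypothetical, item counting stands in for a real mining step, and a thread pool is used only to keep the sketch short (a process pool or a cluster would be the usual choice for CPU-bound work).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(transactions, n_parts):
    """Divide the data into n_parts roughly equal partitions."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def count_items(part):
    """Mine one partition: count how many transactions contain each item."""
    counts = Counter()
    for transaction in part:
        counts.update(set(transaction))
    return counts


def parallel_item_counts(transactions, n_parts=2):
    """Process the partitions concurrently, then merge the partial results."""
    parts = partition(transactions, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_items, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def incremental_update(counts, new_transactions):
    """Fold new transactions into existing counts without a full re-scan."""
    updated = Counter(counts)
    updated.update(count_items(new_transactions))
    return updated


# Hypothetical market-basket data.
transactions = [
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "bread", "butter"],
    ["milk"],
]
counts = parallel_item_counts(transactions)
updated = incremental_update(counts, [["bread", "jam"]])
```

Only the new transaction is scanned in `incremental_update`; the counts from the original four transactions are reused as they are.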

Diverse Data Types Issues
• Handling of relational and complex types of data − The database may
contain complex data objects, multimedia data objects, spatial data,
temporal data, etc. It is not possible for one system to mine all these
kinds of data.
• Mining information from heterogeneous databases and global
information systems − The data is available from different data sources on
a LAN or WAN. These data sources may be structured, semi-structured, or
unstructured, so mining knowledge from them adds challenges
to data mining.
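One source of that difficulty is that each kind of source needs its own extraction logic before any mining can start. The Python sketch below makes this concrete under loud assumptions: the three in-memory sources, their field names, and the sentence pattern are all hypothetical, and a real system would also have to reconcile schemas, units, and identifiers.

```python
import csv
import io
import json
import re

# Hypothetical sources holding the same kind of fact in three shapes.
csv_source = "id,city\n1,Pune\n2,Delhi\n"                        # structured
json_source = '[{"id": 3, "location": {"city": "Mumbai"}}]'      # semi-structured
text_source = "Customer 4 lives in Chennai."                     # unstructured


def from_csv(text):
    """Structured source: columns map directly onto fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(row["id"]), "city": row["city"]} for row in reader]


def from_json(text):
    """Semi-structured source: fields must be pulled out of nested objects."""
    return [
        {"id": item["id"], "city": item["location"]["city"]}
        for item in json.loads(text)
    ]


def from_text(text):
    """Unstructured source: fields are recovered with a brittle pattern match."""
    pattern = re.compile(r"Customer (\d+) lives in (\w+)")
    return [
        {"id": int(m.group(1)), "city": m.group(2)}
        for m in pattern.finditer(text)
    ]


# Only after normalization do the records share one schema that can be mined.
unified = from_csv(csv_source) + from_json(json_source) + from_text(text_source)
```

Each source requires a different parser, and the unstructured case depends on a pattern that breaks as soon as the wording changes.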
