
Mining Concepts Apriori Frequent Pattern

The document discusses various data mining techniques, focusing on Apriori, Frequent Pattern Mining, and Pattern Growth concepts. It outlines the steps and applications of the Apriori algorithm, frequent pattern mining methods, and the Pattern Growth approach, including the FP-Growth algorithm. Additionally, it covers frequent subgraph mining, the gSpan algorithm, and link mining, emphasizing their applications in fields like bioinformatics and social network analysis.

Apriori, Frequent Pattern Mining and Pattern Growth Concepts

**Apriori-Based Approach in Graph Mining**

The Apriori algorithm is a classic algorithm for mining frequent itemsets and association rules in transactional datasets.

Its basic idea is to use prior knowledge of frequent itemset properties to limit the search space: every subset of a frequent itemset must itself be frequent, so any itemset with an infrequent subset can be discarded without counting it. It is primarily used in frequent itemset mining and association rule learning.

Key Steps in Apriori:

1. Generate Candidate Itemsets: Starting from individual items, generate larger itemsets by combining frequent itemsets.

2. Prune Unnecessary Itemsets: If an itemset has any infrequent subset, it is pruned from further consideration.

3. Measure Frequency: Calculate the frequency (support) of each candidate itemset and keep those that meet the minimum support threshold.

4. Repeat: Repeat the process for larger itemsets until no more frequent itemsets can be found.
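
The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a tuned implementation; the function name `apriori`, the absolute `min_support` count, and the toy `baskets` data are assumptions made for this example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Step 3: measure the support of each candidate and keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Start from individual items.
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = count(items)
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Step 1: join frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Step 2: prune candidates that have any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1  # Step 4: repeat with larger itemsets.
    return result

# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=3)
for itemset, support in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With a minimum support of 3, the loop stops after the pair level: the only triple that survives pruning, {bread, milk, diapers}, appears in just two baskets.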

Applications:

- Market Basket Analysis: Identifying products frequently bought together.

- Web Mining: Identifying frequent patterns in web browsing data.

**Frequent Pattern Mining**

Frequent pattern mining is the process of discovering recurring patterns, associations, or correlations within a dataset.

It is most commonly applied to datasets where items or events occur repeatedly, such as in market basket analysis or in biological data analysis.

Key Steps in Frequent Pattern Mining:

1. Identify frequent patterns by finding sets of items or events that occur frequently in the dataset.

2. Generate candidate patterns by combining smaller frequent patterns into larger ones.

3. Calculate the frequency (support) of the patterns and keep those that meet a chosen minimum support threshold.
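
As a small illustration of the frequency calculation, the sketch below counts the relative support of every item pair by brute force, with no pruning. The helper name `pattern_support`, the 0.75 threshold, and the event data are hypothetical choices for this example.

```python
from collections import Counter
from itertools import combinations

def pattern_support(transactions, size):
    """Relative support of every item combination of the given size."""
    counts = Counter()
    for t in transactions:
        # Sorting gives each combination one canonical tuple form.
        counts.update(combinations(sorted(set(t)), size))
    n = len(transactions)
    return {pattern: c / n for pattern, c in counts.items()}

# Hypothetical event logs, one list per session.
events = [
    ["login", "search", "purchase"],
    ["login", "search"],
    ["search", "purchase"],
    ["login", "purchase", "search"],
]
support = pattern_support(events, size=2)
# Keep only the pairs that appear in at least 75% of the sessions.
frequent_pairs = {p: s for p, s in support.items() if s >= 0.75}
print(frequent_pairs)
```

Enumerating every combination is only practical for small pattern sizes, which is why Apriori and FP-Growth restrict the search.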

Techniques for Frequent Pattern Mining:

- **Apriori Algorithm**: Uses a breadth-first search approach to identify frequent patterns.

- **FP-Growth Algorithm**: A more efficient algorithm for frequent pattern mining that compresses the dataset into a compact tree structure (FP-tree) to avoid candidate generation.

Applications:

- Market Basket Analysis: Discovering which items are often purchased together.

- Biological Sequence Analysis: Finding common subsequences in DNA, RNA, or protein sequences.

**Pattern Growth Approach**

The Pattern Growth approach is a method used to mine frequent patterns in large datasets. Unlike the Apriori algorithm, which generates candidate itemsets, Pattern Growth algorithms directly mine the frequent patterns by growing them step by step without the need to generate and test candidate patterns.

Key Concepts:

1. **Frequent Pattern Growth**: The basic idea is to start from frequent single items and grow them into larger patterns by adding items that occur frequently together with the current pattern.

2. **Frequent Pattern Tree (FP-Tree)**: The data is represented as a compact prefix-tree structure known as an FP-Tree, which helps efficiently mine frequent patterns by avoiding the generation of candidate patterns.

Algorithms:

- **FP-Growth Algorithm**: This algorithm builds a compact FP-tree structure to store the data and then uses it to mine frequent patterns. It is highly efficient because it avoids generating a candidate pattern set and instead mines frequent patterns directly by recursively dividing the dataset.
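
A compact sketch of this idea follows, assuming transactions are passed as `(items, weight)` pairs so that the same code handles both the original data and the conditional pattern bases. The names `Node`, `build_tree`, and `fp_growth` are illustrative, and the version favors brevity over the optimizations found in production implementations.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(weighted_transactions, min_support):
    # First pass: count the support of each item.
    support = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            support[item] += weight
    frequent = {i: s for i, s in support.items() if s >= min_support}
    # Second pass: insert each transaction, items in descending support order,
    # so that transactions sharing a prefix share tree nodes.
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, weight in weighted_transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += weight
    return header, frequent

def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Return {frozenset(pattern): support} without generating candidates."""
    header, frequent = build_tree(weighted_transactions, min_support)
    patterns = {}
    for item, support in frequent.items():
        pattern = suffix | {item}
        patterns[pattern] = support
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recurse on the smaller, projected dataset.
        patterns.update(fp_growth(base, min_support, pattern))
    return patterns

# Hypothetical market-basket data; every transaction has weight 1.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
patterns = fp_growth([(sorted(b), 1) for b in baskets], min_support=3)
print(len(patterns), patterns[frozenset({"diapers", "beer"})])
```

The recursion is the "dividing" step: each call mines only the transactions that contain the current suffix, so the working dataset shrinks as patterns grow.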

Applications:

- Market Basket Analysis: Efficiently finding frequent itemsets without candidate generation.

- Data Compression: Finding patterns in datasets to help compress data by representing it with frequent patterns.

**Frequent Subgraph Mining**

- Frequent subgraph mining involves the extraction of subgraphs that occur frequently in a graph dataset.

- This is especially important in the analysis of molecular structures, network data, or social networks, where subgraphs represent meaningful structures, such as motifs or patterns in the graph.
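
To make "occur frequently in a graph dataset" concrete, the sketch below counts the support of the smallest possible subgraphs, single labeled edges, across a collection of graphs. The graph encoding (a label dictionary plus an edge list), the function name `frequent_edges`, and the toy molecules are assumptions for illustration; real miners extend such frequent edges into larger subgraphs.

```python
from collections import Counter

def frequent_edges(graphs, min_support):
    """Count, per labeled edge pattern, how many graphs contain it."""
    counts = Counter()
    for labels, edges in graphs:
        # A pattern is an unordered pair of vertex labels. Using a set means
        # each graph contributes at most once per pattern.
        patterns = {tuple(sorted((labels[u], labels[v]))) for u, v in edges}
        counts.update(patterns)
    return {p: c for p, c in counts.items() if c >= min_support}

# Hypothetical toy "molecules": vertex id -> atom label, plus undirected bonds.
graphs = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
    ({0: "C", 1: "O", 2: "H"}, [(0, 1), (1, 2)]),
    ({0: "C", 1: "C", 2: "N"}, [(0, 1), (1, 2)]),
]
result = frequent_edges(graphs, min_support=2)
print(result)
```

Support here is counted per graph, not per occurrence: a pattern that appears five times inside one graph still counts once.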

Applications:

- Bioinformatics: Identifying subgraphs that represent recurring molecular structures or protein-protein interactions.

- Social Network Analysis: Detecting communities or motifs in social networks.

**gSpan Algorithm for Frequent Subgraph Mining**

The **gSpan algorithm** (graph-based Substructure pattern mining) is one of the most efficient algorithms for frequent subgraph mining. The algorithm is based on depth-first search (DFS) and tries to mine frequent subgraphs in a graph database without generating candidate subgraphs.

Key Features of gSpan:

1. **DFS-based Search**: The algorithm performs a DFS traversal to find frequent subgraphs.

2. **Canonical Forms**: gSpan uses a canonical labeling technique to uniquely represent each graph, making it easier to identify duplicates and avoid redundant searches.

3. **Efficient**: By leveraging DFS and canonical labeling, gSpan avoids costly computations and reduces the search space for frequent subgraph mining.
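
The role of a canonical form can be illustrated with a brute-force stand-in. gSpan itself uses minimum DFS codes; the sketch below instead tries every vertex ordering and keeps the smallest encoding, which gives the same guarantee (isomorphic graphs map to one representation) but takes factorial time and only suits tiny graphs. The function name `canonical_form` and the example graphs are hypothetical.

```python
from itertools import permutations

def canonical_form(labels, edges):
    """Smallest (labels, edges) encoding over all vertex orderings."""
    n = len(labels)
    best = None
    for perm in permutations(range(n)):
        # perm[old] is the new position assigned to original vertex `old`.
        new_labels = [None] * n
        for old, new in enumerate(perm):
            new_labels[new] = labels[old]
        new_edges = sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)
        code = (tuple(new_labels), tuple(new_edges))
        if best is None or code < best:
            best = code
    return best

# g1 and g2 are the same C-O-C path with different vertex numbering;
# g3 is a C-C-O path, which is a different graph.
g1 = (["C", "O", "C"], [(0, 1), (1, 2)])
g2 = (["O", "C", "C"], [(1, 0), (0, 2)])
g3 = (["C", "C", "O"], [(0, 1), (1, 2)])

print(canonical_form(*g1) == canonical_form(*g2))
print(canonical_form(*g1) == canonical_form(*g3))
```

During mining, a subgraph whose encoding has already been seen can be skipped, which is how duplicate work is avoided.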


Applications:

- Bioinformatics: Mining molecular structures and interactions.

- Social Network Analysis: Detecting subgraphs or motifs representing certain behaviors or communities.

**Link Mining**

Link Mining is a type of data mining that focuses on discovering relationships or associations between entities in a graph or network. In link mining, the "links" or "edges" in the graph represent the relationships or interactions between entities.

This field of mining can be applied to a wide variety of networks, such as social networks, communication networks, citation networks, biological networks, and the World Wide Web.

Key Concepts in Link Mining:

- **Graph Representation**: Entities are represented as nodes (vertices), and their relationships or interactions are represented as edges (links). For example, in a social network, people are nodes, and friendships or interactions are edges.

- **Link Prediction**: Link prediction is a task in link mining where the goal is to predict missing links or future links between entities in a network.

- **Link Analysis**: Link analysis involves studying the structure of the links to understand the relationships between entities.

- **Graph Data**: Link mining is done on graph data, where entities are connected by links or edges, and this data can be directed or undirected.
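
The graph representation and link prediction concepts above can be sketched together. The example stores an undirected graph as an adjacency dictionary and ranks unlinked pairs by the Jaccard similarity of their neighbor sets, one common heuristic among many; the source does not prescribe a scoring method. The function names and the friendship data are hypothetical.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected graph as {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def predict_links(edges):
    """Rank unlinked node pairs by Jaccard similarity of their neighbor sets."""
    adj = build_adjacency(edges)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        shared = adj[u] & adj[v]
        union = adj[u] | adj[v]
        scores[(u, v)] = len(shared) / len(union)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical friendship network.
friendships = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
               ("bob", "dan"), ("cat", "dan"), ("dan", "eve")]
ranking = predict_links(friendships)
for pair, score in ranking:
    print(pair, round(score, 2))
```

The pair ann and dan ranks first because they share two of their three combined neighbors, which suggests a likely missing or future link.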
