HITS Algorithm in Data and Web MIning.pdf
HITS Algorithm
Hyperlink-Induced Topic Search (HITS) is a link-analysis algorithm that discovers and ranks
the webpages relevant to a particular search. It is based on the idea that an ideal website
links to other relevant sites and is itself linked to by other important sites.
HITS uses hubs and authorities to define a recursive relationship between webpages.
• Authority: A node is a good authority if many good hubs link to it
• Hub: A node is a good hub if it links to many good authorities
Algorithm Steps
• Initialize the hub and authority scores of every node to 1
• In each iteration, update the hub and authority scores of every node in the graph:
• The new authority score of a node is the sum of the hub scores of its parents (the nodes that link to it)
• The new hub score of a node is the sum of the authority scores of its children (the nodes it links to)
• Normalize the new authority and hub scores
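The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names hits and normalize, the dictionary-based graph format, and the fixed iteration count are hypothetical. The text does not say which norm to use, so Euclidean (L2) normalization is assumed here. It is also an assumption that both scores are updated from the previous iteration's values.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (assumed norm; the text names none)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {n: (v / norm if norm else 0.0) for n, v in scores.items()}


def hits(outlinks, iterations=20):
    """Return (authority, hub) dicts for a graph given as {node: [children]}.

    Every node, including nodes without outlinks, must appear as a key.
    """
    nodes = list(outlinks)
    auth = {n: 1.0 for n in nodes}  # initialize authority to 1
    hub = {n: 1.0 for n in nodes}   # initialize hub to 1
    for _ in range(iterations):
        # New authority: sum of the hub scores of the node's parents.
        new_auth = {n: 0.0 for n in nodes}
        for parent, children in outlinks.items():
            for child in children:
                new_auth[child] += hub[parent]
        # New hub: sum of the authority scores of the node's children
        # (assumption: taken from the previous iteration's authority).
        new_hub = {n: sum(auth[c] for c in outlinks[n]) for n in nodes}
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


# Usage with a five-page graph (A-E) given as outlink lists.
outlinks = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["D"],
    "D": ["C", "E"],
    "E": ["B", "C", "D"],
}
authority, hub = hits(outlinks)
for page in outlinks:
    print(page, round(authority[page], 3), round(hub[page], 3))
```

With normalization the scores stay bounded, so only their relative sizes matter when ranking pages.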
Problem:
Use the HITS algorithm to find the hubs and authorities among the following web pages:
Page A (outlinks to B, C, D)
Page B (outlinks to A, C, D)
Page C (outlinks to D)
Page D (outlinks to C, E)
Page E (outlinks to B, C, D)
Solution:
Connection Matrix (a 1 in row i, column j means page i links to page j):
     A  B  C  D  E
A    0  1  1  1  0
B    1  0  1  1  0
C    0  0  0  1  0
D    0  0  1  0  1
E    0  1  1  1  0
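The connection matrix can be derived mechanically from the outlink lists in the problem statement. The sketch below assumes the convention that rows are linking pages and columns are linked pages; the variable names are illustrative only.

```python
pages = ["A", "B", "C", "D", "E"]
outlinks = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["D"],
    "D": ["C", "E"],
    "E": ["B", "C", "D"],
}

# Map each page name to its row/column position.
index = {page: i for i, page in enumerate(pages)}

# matrix[i][j] is 1 when page i links to page j, otherwise 0.
matrix = [[0] * len(pages) for _ in pages]
for source, targets in outlinks.items():
    for target in targets:
        matrix[index[source]][index[target]] = 1

print(" ", *pages)
for page, row in zip(pages, matrix):
    print(page, *row)
```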
Inlinks and Outlinks of web pages:
Web Page   Inlinks       Outlinks
A          B             B, C, D
B          A, E          A, C, D
C          A, B, D, E    D
D          A, B, C, E    C, E
E          D             B, C, D
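The inlink column follows directly from the outlink lists: a page's inlinks are the pages whose outlinks contain it. A short sketch of that inversion, with illustrative variable names:

```python
outlinks = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["D"],
    "D": ["C", "E"],
    "E": ["B", "C", "D"],
}

# Invert the outlink lists: each source page becomes an inlink of its targets.
inlinks = {page: [] for page in outlinks}
for source in sorted(outlinks):
    for target in outlinks[source]:
        inlinks[target].append(source)

for page in sorted(outlinks):
    print(page, "| in:", ", ".join(inlinks[page]), "| out:", ", ".join(outlinks[page]))
```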
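Calculation of Hubs and Authorities without normalization:
Here X is the authority score and Y is the hub score; the subscript is the iteration number.
X1 and Y1 are computed from the initial values X0 and Y0, X2 from Y1, and Y2 from the updated X2.
Page   X0   Y0   X1   Y1   X2   Y2
A       1    1    1    3    3   27
B       1    1    2    3    6   24
C       1    1    4    1   11   10
D       1    1    4    2   10   13
E       1    1    1    3    2   27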
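The unnormalized values in the table can be reproduced with the sketch below. It reads X as the authority score and Y as the hub score, which is inferred from the numbers, not stated in the text. It also follows the update order the table's values imply: X1 and Y1 from the initial scores, X2 from Y1, and Y2 from the already updated X2. The helper names authority_update and hub_update are hypothetical.

```python
pages = ["A", "B", "C", "D", "E"]
outlinks = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["D"],
    "D": ["C", "E"],
    "E": ["B", "C", "D"],
}


def authority_update(hub):
    """Authority of p = sum of the hub scores of the pages that link to p."""
    return {p: sum(hub[q] for q in pages if p in outlinks[q]) for p in pages}


def hub_update(auth):
    """Hub of p = sum of the authority scores of the pages p links to."""
    return {p: sum(auth[c] for c in outlinks[p]) for p in pages}


x0 = {p: 1 for p in pages}  # initial authority
y0 = {p: 1 for p in pages}  # initial hub

x1 = authority_update(y0)   # from initial hub scores
y1 = hub_update(x0)         # from initial authority scores
x2 = authority_update(y1)   # from first-iteration hub scores
y2 = hub_update(x2)         # from the updated second-iteration authority scores

print("Page X0 Y0 X1 Y1 X2 Y2")
for p in pages:
    print(p, x0[p], y0[p], x1[p], y1[p], x2[p], y2[p])
```

Without normalization the scores grow quickly from one iteration to the next, which is why the algorithm's final step rescales them.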
