Gaurav web mining | PPTX
WEB MINING
Presented by:
Gaurav Uniyal
161340101008
C.S.E.(Final Year)
Introduction
• Web mining is the application of data mining techniques to extract and uncover knowledge from web documents and services.
• Its goal is to use these techniques to make the web more useful and more profitable, and to increase the efficiency of our interaction with the web.
Web Mining Services
Web mining has enabled e-commerce sites to do personalized marketing, which in turn results in higher trade volumes.
Describing the WWW
• Web: a huge, widely distributed, highly heterogeneous, semi-structured, hypertext/hypermedia, interconnected information repository.
• The web is a huge collection of documents, plus:
– hyperlink information;
– access and usage information.
Tasks to Conduct
• Resource Finding.
• Information selection & Pre-processing.
• Generalization.
• Analysis.
Web Mining Classification
Web Mining
– Web Content Mining
   – Web Page Content Mining
   – Search Result Mining
– Web Structure Mining
– Web Usage Mining
   – General Access Pattern Tracking
   – Customized Usage Tracking
Web Content Mining
• Discovery of useful information from web content, data, and documents.
• It can be approached from two views:
– Information retrieval view.
– Database view.
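Under the information retrieval view, each page is treated as a bag of words whose terms are weighted for retrieval. The sketch below is a minimal illustration, assuming whitespace tokenization and classic tf-idf weighting (one common scheme, not necessarily the one the slide has in mind); the sample pages and the `tf_idf` name are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Weight each term of each document by term frequency x inverse document frequency."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical page texts.
pages = [
    "web mining extracts knowledge from web documents",
    "data mining finds patterns in data",
    "web usage mining analyses server logs",
]
weights = tf_idf(pages)
```

A term that occurs in every page (here "mining") gets weight 0, because it does not help distinguish one page from another.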
Web Structure Mining
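• Researchers have proposed methods that use citations among journal articles to evaluate the quality of research papers.
• Customer behavior: evaluate the quality of a product based on the opinions of other customers (instead of the product’s description or advertisement).
• Web structure mining applies the same idea to hyperlinks: links pointing to a page are treated like citations of it.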
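The citation analogy can be made concrete by ranking pages by how many other pages link to them, just as papers can be ranked by citation count. This is a deliberately naive sketch; the link graph, the page names and `in_link_counts` are hypothetical, and HITS (covered later in these slides) refines the idea.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "a.html": ["c.html", "d.html"],
    "b.html": ["c.html"],
    "c.html": ["d.html"],
    "d.html": ["c.html"],
}


def in_link_counts(graph: dict[str, list[str]]) -> Counter:
    """Count how many distinct pages link to each page (its 'citation count')."""
    counts = Counter()
    for source, targets in graph.items():
        for target in set(targets):   # a page citing the same target twice counts once
            if target != source:      # self-links are not citations
                counts[target] += 1
    return counts


ranking = in_link_counts(links).most_common()
```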
Web Usage Mining
• Also known as web log mining.
• Definition: discovery of meaningful patterns from data generated by client-server transactions or from web server logs.
• Typical sources of data:
– automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies;
– user profiles;
– metadata: page attributes, content attributes, usage data.
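Server access logs are the most common of these sources. As a sketch, assuming the Apache/NCSA "combined" log format (access-log fields followed by the referrer and user agent that older servers kept in separate referrer and agent logs), one line can be turned into a record as follows; the sample line and `parse_line` are hypothetical, and cookies and user profiles are not handled.

```python
import re
from datetime import datetime
from typing import Optional

# host ident user [time] "method url protocol" status size "referrer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'  # present only in combined format
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one log line into a record dict, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # "-" means no body sent
    return rec


sample = ('192.0.2.7 - - [10/Oct/2023:13:55:36 +0000] '
          '"GET /products/index.html HTTP/1.1" 200 2326 '
          '"https://example.com/" "Mozilla/5.0"')
record = parse_line(sample)
```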
• Generate simple statistical reports:
– A summary report of hits and bytes transferred.
– A list of top requested URLs.
– A list of top referrers.
– A list of the most common browsers used.
– Hits per hour/day/week/month reports.
– Hits per domain report.
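Once log lines are parsed, each of these reports is essentially a count or a sum over one field. A minimal sketch, assuming records have already been parsed into the hypothetical `Hit` tuple below; the sample data is made up and the "domain" key (last label of the host name) is a crude simplification.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class Hit(NamedTuple):
    """One already-parsed access-log record (hypothetical field set)."""
    host: str
    time: datetime
    url: str
    size: int
    referrer: str
    agent: str


def report(hits: list[Hit], top: int = 10) -> dict:
    """Build simple hit, byte, URL, referrer, browser, time and domain reports."""
    return {
        "summary": {"hits": len(hits), "bytes": sum(h.size for h in hits)},
        "top_urls": Counter(h.url for h in hits).most_common(top),
        # "-" means the request carried no referrer.
        "top_referrers": Counter(h.referrer for h in hits if h.referrer != "-").most_common(top),
        "top_browsers": Counter(h.agent for h in hits).most_common(top),
        # Use "%Y-%m-%d", "%Y-W%W" or "%Y-%m" instead for day/week/month reports.
        "hits_per_hour": Counter(h.time.strftime("%Y-%m-%d %H:00") for h in hits),
        # Crude domain key: the last label of the client host name.
        "hits_per_domain": Counter(h.host.rsplit(".", 1)[-1] for h in hits),
    }


hits = [
    Hit("host1.example.org", datetime(2023, 10, 10, 13, 5), "/index.html", 2326, "-", "Firefox"),
    Hit("host2.example.com", datetime(2023, 10, 10, 13, 40), "/products.html", 5120, "/index.html", "Chrome"),
    Hit("host1.example.org", datetime(2023, 10, 10, 14, 2), "/products.html", 5120, "/index.html", "Firefox"),
    Hit("host3.example.net", datetime(2023, 10, 10, 14, 30), "/index.html", 2326, "-", "Chrome"),
]
reports = report(hits)
```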
• Learn:
– Who is visiting your site.
– The path visitors take through your pages.
– How much time visitors spend on each page.
– The most common starting page.
– What content your visitors are viewing.
– Where visitors are leaving your site.
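Most of these questions need the log split into sessions first. A minimal sketch, assuming the client IP identifies the visitor and that a 30-minute gap starts a new session (both are common heuristics, not rules; proxies and shared addresses break the first one); the request list and names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Assumed inactivity threshold; 30 minutes is a common heuristic, not a standard.
TIMEOUT = timedelta(minutes=30)

# Hypothetical parsed requests: (visitor id, timestamp, page).
requests = [
    ("192.0.2.1", datetime(2023, 10, 10, 9, 0), "/index.html"),
    ("192.0.2.1", datetime(2023, 10, 10, 9, 2), "/products.html"),
    ("192.0.2.1", datetime(2023, 10, 10, 9, 7), "/cart.html"),
    ("192.0.2.2", datetime(2023, 10, 10, 9, 1), "/products.html"),
    ("192.0.2.1", datetime(2023, 10, 10, 11, 0), "/index.html"),
]


def sessionize(reqs):
    """Split each visitor's time-ordered requests wherever the gap exceeds TIMEOUT."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in sorted(reqs):  # sorts by visitor, then time
        by_visitor[visitor].append((ts, page))
    sessions = []
    for visits in by_visitor.values():
        current = [visits[0]]
        for prev, cur in zip(visits, visits[1:]):
            if cur[0] - prev[0] > TIMEOUT:
                sessions.append(current)
                current = []
            current.append(cur)
        sessions.append(current)
    return sessions


visitors = Counter(v for v, _, _ in requests)          # who is visiting
sessions = sessionize(requests)
paths = [[page for _, page in s] for s in sessions]    # path through the site
entry_pages = Counter(path[0] for path in paths)       # most common starting page
exit_pages = Counter(path[-1] for path in paths)       # where visitors leave

# Time on a page = gap until the visitor's next request in the same session.
# The last page of a session has no next request, so its time is unknown.
time_on_page = defaultdict(list)
for s in sessions:
    for (t1, page), (t2, _) in zip(s, s[1:]):
        time_on_page[page].append(t2 - t1)
```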
Design of Web Log Miner
• The web log is filtered to generate a relational database.
• A data cube is generated from the database.
• OLAP is used to drill down and roll up in the cube.
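The three design steps can be imitated on a small scale with Python's built-in `sqlite3`. SQLite has no `CUBE` or `ROLLUP` operator, so this sketch computes individual cuboids with `GROUP BY` instead; the table layout, sample rows and `cuboid` helper are hypothetical.

```python
import sqlite3

# Hypothetical records left after filtering the raw web log
# (e.g. dropping image requests and failed hits): (day, hour, url, bytes).
rows = [
    ("2023-10-10", 13, "/index.html", 2326),
    ("2023-10-10", 13, "/products.html", 5120),
    ("2023-10-10", 14, "/index.html", 2326),
    ("2023-10-11", 9, "/index.html", 2326),
]

# Step 1: load the filtered log into a relational table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hits (day TEXT, hour INTEGER, url TEXT, bytes INTEGER)")
con.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)

ALLOWED = {"day", "hour", "url"}


def cuboid(dims):
    """Step 2: one cuboid of the data cube (hit count and bytes grouped by dims)."""
    # Column names are spliced into the SQL text, so only whitelisted names pass.
    if not dims or not set(dims) <= ALLOWED:
        raise ValueError(f"unknown dimensions: {dims}")
    cols = ", ".join(dims)
    sql = f"SELECT {cols}, COUNT(*), SUM(bytes) FROM hits GROUP BY {cols} ORDER BY {cols}"
    return con.execute(sql).fetchall()


# Step 3: OLAP-style navigation between cuboids.
detail = cuboid(["day", "hour", "url"])  # finest level
by_day_url = cuboid(["day", "url"])       # roll up: the hour dimension is aggregated away
by_day = cuboid(["day"])                  # roll up further; going back down is drill-down
```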
Structures
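• Hubs.
• Authorities.
• Mutually reinforcing relationship: a good hub points to many good authorities, and a good authority is pointed to by many good hubs.
• Hyperlinks can be used to infer the notion of authority.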
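The mutually reinforcing relationship is usually written as a pair of update rules, as in Kleinberg's HITS formulation. Here a(p) and h(p) are the authority and hub weights of page p, "q → p" means page q links to page p, and A is the adjacency matrix of the page set (A_ij = 1 if page i links to page j); this notation is ours, not the slide's.

```latex
a(p) = \sum_{q \,:\, q \to p} h(q), \qquad
h(p) = \sum_{q \,:\, p \to q} a(q)

\mathbf{a} = A^{\mathsf{T}} \mathbf{h}, \qquad \mathbf{h} = A\,\mathbf{a}
\quad\Longrightarrow\quad
\mathbf{a} \propto A^{\mathsf{T}} A\,\mathbf{a}, \qquad
\mathbf{h} \propto A A^{\mathsf{T}}\,\mathbf{h}
```

With normalization after each step, repeated application converges (under mild conditions) to the principal eigenvectors of AᵀA and AAᵀ.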
HITS
• HITS stands for Hyperlink-Induced Topic Search.
• It explores interactions between hubs and authoritative pages.
• Expand the root set (the pages returned by a text search for the query) into a base set.
• Apply weight propagation.
• Systems based on the HITS algorithm include IBM’s Clever; Google instead ranks pages with PageRank, a related link-analysis algorithm.
• Difficulties from ignoring textual context:
– Drifting: when hubs contain multiple topics.
– Topic hijacking: when many pages from a single website point to the same single popular site.
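The two HITS steps (expand the root set into a base set, then propagate weights) can be sketched as below. This is a minimal illustration, not a production implementation; the link graph standing in for the web, the root set and the function names are hypothetical, and a real system would obtain the root set from a text search engine.

```python
import math

# Hypothetical link graph (page -> pages it links to) standing in for the web.
WEB = {
    "p1": ["p3", "p4"],
    "p2": ["p3", "p4"],
    "p3": ["p4"],
    "p4": [],
    "p5": ["p1"],
    "p6": ["p7"],
    "p7": [],
}


def base_set(root, graph):
    """Expand the root set with the pages it links to and the pages linking into it."""
    base = set(root)
    for page in root:
        base.update(graph.get(page, []))
    for page, targets in graph.items():
        if any(t in root for t in targets):
            base.add(page)
    return base


def hits(pages, graph, iterations=50):
    """Iterate the mutually reinforcing hub/authority updates on the base set."""
    edges = [(p, q) for p in pages for q in graph.get(p, []) if q in pages]
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority weight: sum of hub weights of the pages pointing to it.
        auth = {p: 0.0 for p in pages}
        for p, q in edges:
            auth[q] += hub[p]
        # Hub weight: sum of authority weights of the pages it points to.
        hub = {p: 0.0 for p in pages}
        for p, q in edges:
            hub[p] += auth[q]
        # Normalize so the weights do not grow without bound.
        for weights in (auth, hub):
            norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
            for p in weights:
                weights[p] /= norm
    return hub, auth


root = {"p1", "p2"}           # e.g. the top results of a text search
pages = base_set(root, WEB)   # adds p3, p4 (linked from root) and p5 (links to root)
hub, auth = hits(pages, WEB)
best_authority = max(auth, key=auth.get)
best_hub = max(hub, key=hub.get)
```

Note that p5 enters the base set only because it links to p1, whatever its topic; since the propagation never looks at page text, this is how drifting and topic hijacking creep in.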
Application of Web Mining
• Improve web server system performance.
• Improve site design.
• Intrusion detection.
• Predict users’ actions.
• Enhance the quality and delivery of Internet information services to the end user.
• Facilitate adaptive sites and personalization.
Thank You!
