FINAL YEAR PROJECT SYNOPSIS
WEB CRAWLER
SUBMITTED BY:
AKARSH GUPTA (160681303)
MONU RANA (1606813029)
SAURABH TEOTIA (1606813041)
SHIVAM TYAGI (1606813047)
Contents
1 DEFINITION: WEB CRAWLER
2 USE CASES OF WEB CRAWLER
3 NEED OF THE WEB CRAWLER
4 HOW DO SEARCH ENGINES WORK
5 SCOPE OF WORK
6 FEASIBILITY STUDY
7 OPERATING ENVIRONMENT
8 FUTURE SCOPE OF THE SYSTEM
9 REFERENCES
QUESTION ARISES
What is a Web Crawler?
Definition:
A web crawler (also known as a web
spider or web robot) is a program or
automated script which browses the
World Wide Web in a methodical,
automated manner. This process is
called Web crawling or spidering.
USE CASES OF WEB CRAWLER
1. SEARCH ENGINES
2. COPYRIGHT VIOLATION
3. KEYWORD-BASED FINDINGS
4. WEB ANALYTICS
& many more...
NEEDS OF WEB CRAWLER
• To maintain mirror sites for popular websites.
• To test web pages and links for valid syntax and
structure.
• To monitor sites to see when their structure or
content changes.
• To search for copyright infringements.
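One of the needs above, noticing when a page's content changes, can be sketched by storing a fingerprint of each page and comparing it on the next visit. This is only an illustration, assuming Java 8 or later; the class name ChangeMonitor and its methods are hypothetical and are not part of the project described here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: remembers one fingerprint per URL and reports changes.
final class ChangeMonitor {
    // URL -> hex SHA-256 digest of the content seen on the previous visit
    private final Map<String, String> lastSeen = new HashMap<>();

    // Returns true only if the URL was seen before and its content now differs.
    boolean hasChanged(String url, String content) {
        String digest = sha256Hex(content);
        String previous = lastSeen.put(url, digest);
        return previous != null && !previous.equals(digest);
    }

    // Hashes the text with SHA-256 and returns the digest as lowercase hex.
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ChangeMonitor monitor = new ChangeMonitor();
        // The URL and page bodies below are made-up sample data.
        System.out.println(monitor.hasChanged("http://example.com/", "<p>old</p>"));
        System.out.println(monitor.hasChanged("http://example.com/", "<p>old</p>"));
        System.out.println(monitor.hasChanged("http://example.com/", "<p>new</p>"));
    }
}
```

A real monitor would fetch the page over HTTP before hashing it; the sketch takes the content as a string so that it stays self-contained.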
How do search engines work?
• The job of the crawlers is to discover new content. They do this
by following links.
• Crawling is a massive process: search engines crawl billions of
pages every day, finding new content and recrawling old content
to check whether it has changed.
• Search engine crawlers aren't smart. They are simple pieces of
software programmed to single-mindedly collect data and
send it back to the search engine's data centres.
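The idea of discovering content by following links can be sketched as a breadth-first traversal. The example below is a minimal illustration, assuming Java 8 or later; an in-memory map of page to outgoing links stands in for real fetching and HTML parsing, and the names LinkFollowingSketch, crawl, linksOf and maxPages are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

final class LinkFollowingSketch {
    // Visits pages breadth-first from the seed, following links, and returns
    // the pages in visit order. linksOf is an assumed stand-in for
    // "download the page and extract its links".
    static List<String> crawl(String seed, Map<String, List<String>> linksOf, int maxPages) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();          // URLs already queued
        Queue<String> frontier = new ArrayDeque<>(); // URLs waiting to be visited
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.remove();
            visited.add(url);
            List<String> links = linksOf.getOrDefault(url, Collections.<String>emptyList());
            for (String link : links) {
                // Set.add returns false for a URL queued earlier, so no page is visited twice
                if (seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Made-up site structure used only for the demonstration.
        Map<String, List<String>> web = new HashMap<>();
        web.put("/home", Arrays.asList("/about", "/blog"));
        web.put("/about", Arrays.asList("/home"));
        web.put("/blog", Arrays.asList("/about", "/post-1"));
        System.out.println(crawl("/home", web, 10));
    }
}
```

The seen set is what keeps the crawler from looping forever when pages link back to each other, and maxPages is a simple cap on how much work a single run does.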
SCOPE OF WORK :
There are basically three steps involved in the web
crawling procedure:
1. The search bot starts by crawling the pages of your site.
2. Then it continues by indexing the words and contents of the
site.
3. Finally, it visits the links that are found on your site.
• When the spider can no longer find a page, that page is
eventually deleted from the index.
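The indexing step, and the removal of pages that can no longer be found, can be illustrated with a small inverted index that maps each word to the pages containing it. This is a rough sketch, assuming Java 8 or later; TinyIndex, addPage, removePage and search are hypothetical names, and real search engines use far more elaborate structures.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TinyIndex {
    // word -> sorted set of page URLs whose text contains that word
    private final Map<String, Set<String>> pagesByWord = new HashMap<>();

    // Splits the page text into lowercase words and records the page under each one.
    void addPage(String url, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                pagesByWord.computeIfAbsent(word, w -> new TreeSet<>()).add(url);
            }
        }
    }

    // Drops a page that the crawler could no longer find from every word entry.
    void removePage(String url) {
        Iterator<Set<String>> it = pagesByWord.values().iterator();
        while (it.hasNext()) {
            Set<String> pages = it.next();
            pages.remove(url);
            if (pages.isEmpty()) {
                it.remove(); // no page left for this word, so forget the word
            }
        }
    }

    // Returns the pages containing the word, or an empty set if there are none.
    Set<String> search(String word) {
        Set<String> pages = pagesByWord.get(word.toLowerCase(Locale.ROOT));
        return pages == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(pages);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        // Made-up pages used only for the demonstration.
        index.addPage("/a.html", "Web crawler basics");
        index.addPage("/b.html", "Crawler and index");
        System.out.println(index.search("crawler"));
        index.removePage("/a.html");
        System.out.println(index.search("crawler"));
    }
}
```

Keeping the index keyed by word is what makes lookups fast, at the cost of a full scan of the entries when a page is removed.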
FEASIBILITY STUDY
A feasibility study is an evaluation or analysis of the potential
impact of a proposed project or program. There are three aspects
of the feasibility study:
1. TECHNICAL FEASIBILITY:
2. FINANCIAL FEASIBILITY:
3. OPERATIONAL FEASIBILITY:
OPERATING ENVIRONMENT
Software requirements at the time of development:
Front End - AWT, Swing
Back End - Java
Technology - JSP, Servlet
Software - JDK (1.5 or above)
Hardware requirements:
Hard Disk - at least 20 GB
RAM - 1 GB
Platform - Java
FUTURE SCOPE OF THE SYSTEM
This application can easily be implemented in various
situations:
We can add new features when required. Components of the
application can be reused as and when required. There is
flexibility in all modules.
With further modifications, it can become a more
powerful search engine.
References
GOOGLE
Thanks for listening