Keyword-Based Search Engine For Text Documents

This document outlines a project to develop a keyword-based search engine for text documents, featuring functionalities like keyword search, spell checking, and user access control, using Python's Flask framework and SQLite. It provides a detailed roadmap for learning essential skills such as Python, HTML, CSS, JavaScript, and SQL, along with a structured development plan for the project. The project emphasizes hands-on learning and aims to enhance practical experience in information retrieval and web development.

Uploaded by

saeedwilco500
Keyword-based Search Engine for Text Documents
CS619 Project
Introduction:
 This project involves developing a web-based search engine specifically for
plain text documents. The application allows users to upload, index, and
search documents by keywords or phrases, displaying the most relevant
documents first. It includes features like:
• Keyword and Phrase Search: Users can perform searches, with results
ranked by relevance.
• Spell Checker: Automatically detects misspelled words in search queries,
providing suggestions and corrections.
• User Access Control: Account registration is required for document
uploads, while search and viewing are open to all users.
• Custom Dictionaries: Users can add frequently used terms to a personal
dictionary for improved search accuracy.
 The project aims to offer a streamlined, user-friendly document search
experience, leveraging Python’s Flask framework for the backend and SQLite
for data storage. It emphasizes efficient data handling and retrieval, making
it a practical and hands-on learning opportunity in information retrieval.
Why Learn Python?
 Top Choice for AI, Data Science & Automation
• Python is widely used in high-demand fields like artificial intelligence,
machine learning, data science, and automation.
 Simple, Yet Powerful
• Python’s clean, readable syntax makes it perfect for beginners while
offering robust libraries and frameworks for advanced projects.
 High Demand in Industry
• Python skills are highly valued by employers across tech roles, from
web development to data analytics.
 Hands-On Learning with a Real Project
• By building a search engine in Python, you’ll gain practical experience
that’s both valuable and relevant to modern tech careers.
 This project is your chance to master Python and develop skills that
open doors to exciting fields in technology.
Roadmap to learn (part 1):
 Step 1: Learn Python
• Focus on the basics: variables, data types, control structures,
functions, and basic libraries. This will build a strong programming
foundation.
• Estimated Time: 6 weeks @ 1 hour daily
• Option 1 (Shradha Khapra – YouTube – Urdu):
https://www.youtube.com/playlist?list=PLGjplNEQ1it8-0CmoljS5yeV-GlKSUEt0
• Option 2 (CodeWithHarry – YouTube – Urdu):
https://www.youtube.com/playlist?list=PLu0W_9lII9agwh1XjRt242xIpHhPT2llg
• Option 3 (Coursera – English):
https://www.coursera.org/learn/python-crash-course
 Recommended Book:
Think Python, 2nd edition by Allen B. Downey
https://greenteapress.com/thinkpython2/thinkpython2.pdf
Roadmap to learn (part 2):
 Step 2: Learn HTML
• Focus solely on HTML to understand structuring content on a
webpage, covering elements like headers, paragraphs, links, forms,
and lists.
• Estimated Time: 4 weeks @ 1 hour daily
• Option 1 (Complete Coding with Prashant Sir – YouTube –
Urdu):
https://www.youtube.com/watch?v=rklidcZ-aLU
• Option 2 (CodeWithHarry – YouTube – Urdu):
https://www.youtube.com/watch?v=BsDoLVMnmZs
• Additional Support:
https://www.w3schools.com/html/
Roadmap to learn (part 3):
 Step 3: Learn CSS
• Focus on styling HTML content. Cover basics like colors, fonts, layout
(box model, flexbox), and responsive design.
• Estimated Time: 4 weeks @ 1 hour daily
• Option 1 (Complete Coding with Prashant Sir – YouTube –
Urdu):
https://www.youtube.com/watch?v=OpWjt_wbV4E
• Option 2 (CodeWithHarry – YouTube – Urdu):
https://www.youtube.com/watch?v=Edsxf_NBFrw&t=501s
• Additional Support:
https://www.w3schools.com/css/
Roadmap to learn (part 4):
 Step 4: Learn JavaScript
• Learn JavaScript fundamentals: variables, functions, and basic DOM
manipulation to make web pages interactive.
• Estimated Time: 5 weeks @ 1 hour daily
• Option 1 (Complete Coding with Prashant Sir – YouTube –
Urdu):
https://www.youtube.com/watch?v=cpoXLj24BDY
• Option 2 (CodeWithHarry – YouTube – Urdu):
https://www.youtube.com/playlist?list=PLu0W_9lII9ahR1blWXxgSlL4y9iQBnLpR
• Additional Support:
https://www.w3schools.com/js/default.asp
Roadmap to learn (part 5):
 Step 5: Learn Flask
• Start with setting up a simple Flask web server. Learn how to create routes,
templates, and basic form handling to connect Python with HTML.
• Estimated Time: 4 weeks @ 1 hour daily
• Option 1 (CodeWithHarry – YouTube – Urdu):
https://www.youtube.com/watch?v=oA8brF3w5XQ
• Option 2 (Tech With Tim – YouTube – English):
https://www.youtube.com/watch?v=GQcM8wdduLI&list=PLzMcBGfZo4-nK0Pyubp7yIG0RdXp6zklu
• Option 3 (Tech With Tim – YouTube – English):
https://www.youtube.com/watch?v=mqhxxeeTbu0&list=PLzMcBGfZo4-n4vJJybUVV3Un_NFS5EOgX&index=1
Roadmap to learn (part 6):
 Step 6: Learn SQL and SQLite:
• Learn SQL basics and how to perform CRUD (Create, Read, Update,
Delete) operations. Practice integrating SQLite with Flask to store
data.
• Estimated Time: 4 weeks @ 1 hour daily
 SQL (Apna College – YouTube - Urdu):
https://www.youtube.com/watch?v=hlGoQC332VM&t=103s
 SQLite (Kite – YouTube – English)
https://www.youtube.com/watch?v=girsuXz0yA8
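As a starting point for practising CRUD, here is a minimal sketch using Python's built-in sqlite3 module. The notes table and its columns are made up for illustration and are not part of the project specification.

```python
import sqlite3

# An in-memory database keeps the example self-contained (no file is created).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT)")

# Create: pass values through ? placeholders rather than string formatting.
cur = conn.execute("INSERT INTO notes (title, body) VALUES (?, ?)", ("Week 1", "Learn SELECT"))
note_id = cur.lastrowid

# Read
row = conn.execute("SELECT title, body FROM notes WHERE id = ?", (note_id,)).fetchone()

# Update
conn.execute("UPDATE notes SET body = ? WHERE id = ?", ("Learn SELECT and JOIN", note_id))
updated_body = conn.execute("SELECT body FROM notes WHERE id = ?", (note_id,)).fetchone()[0]

# Delete
conn.execute("DELETE FROM notes WHERE id = ?", (note_id,))
remaining = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]

conn.commit()
conn.close()
```

Replacing ":memory:" with a file name such as "app.db" (a hypothetical name) would keep the data between runs.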
Roadmap to learn (part 7):
 Step 7: Learn Whoosh
• Focus on search indexing and querying techniques. Begin with small
examples to create and query a search index, then integrate this
knowledge with Flask.
• Estimated Time: 4 weeks @ 1 hour daily
• Search YouTube for “Whoosh Python” tutorials
 Total Estimated Time: 7-8 months
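Before starting Whoosh in Step 7, it may help to see the idea behind a search index in plain Python. The sketch below is not Whoosh and does not use its API. It only illustrates an inverted index (word to documents) of the kind a search library maintains, and the sample documents are invented.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def search(index, query):
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Invented sample documents, for illustration only.
docs = {
    "a.txt": "Flask is a web framework for Python",
    "b.txt": "SQLite stores data in a single file",
    "c.txt": "Python can talk to SQLite",
}
index = build_index(docs)
```

With this index, search(index, "Python SQLite") looks up both words and keeps only the documents that appear under each of them.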
Roadmap to develop (part 1):
 Step 1: Reviewing and Updating Packages:
• Make sure all necessary packages and libraries (Flask, Whoosh, SQLite, etc.)
are installed and up-to-date.
• Organize project folders and files, setting up folders for templates, static files,
and configuration files.
• Set up a basic Flask app and run a "Hello, World!" web page to confirm the
environment is working.
• Estimated Time: 1 day
 Step 2: Build the Core Backend for Document Storage and Retrieval
• Document Upload: Set up an interface and backend code to allow users to
upload text files. Store the file metadata (like filename and upload date) in
SQLite.
• Text Indexing: Use Whoosh to index uploaded text files. This is where you
will create a search index to make content searchable by keywords.
• Estimated Time: 2 weeks
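The metadata part of Step 2 could be sketched as follows, assuming a documents table with filename and uploaded_at columns (names chosen here for illustration). The upload form itself and the Whoosh indexing are left out.

```python
import sqlite3
from datetime import datetime, timezone

def init_db(conn):
    """Create the (hypothetical) documents table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY,"
        " filename TEXT NOT NULL,"
        " uploaded_at TEXT NOT NULL)"
    )

def record_upload(conn, filename):
    """Store metadata for one uploaded text file and return its row id."""
    if not filename.lower().endswith(".txt"):
        raise ValueError("only plain text (.txt) files are accepted")
    uploaded_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO documents (filename, uploaded_at) VALUES (?, ?)",
        (filename, uploaded_at),
    )
    conn.commit()
    return cur.lastrowid

conn = sqlite3.connect(":memory:")
init_db(conn)
doc_id = record_upload(conn, "lecture1.txt")
```

In a Flask app the file name comes from the user, so it should be sanitized first. werkzeug.utils.secure_filename is commonly used for that.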
Roadmap to develop (part 2):
 Step 3: Develop Search Functionality (Keyword and Phrase
Search)
• Implement the keyword and phrase search functionality. Have
Whoosh handle queries and return results based on relevance.
• Order search results by relevance and prepare to display the top
matches first.
• Estimated Time: 2 weeks
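To make "ranked by relevance" concrete, here is a deliberately simple sketch that scores each document by how often the query words occur in it and checks phrases word by word. It is a stand-in for illustration only. Whoosh applies its own scoring (BM25F by default), and the sample documents below are invented.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_score(text, query):
    """Score = total number of times the query words occur in the text."""
    counts = Counter(tokenize(text))
    return sum(counts[word] for word in set(tokenize(query)))

def contains_phrase(text, phrase):
    """True if the words of the phrase appear consecutively in the text."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))

def rank(docs, query):
    """Return (name, score) pairs for matching documents, best first."""
    scored = [(name, keyword_score(text, query)) for name, text in docs.items()]
    matches = [(name, score) for name, score in scored if score > 0]
    # Sort by score (descending), then by name so ties have a stable order.
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))

# Invented sample documents, for illustration only.
docs = {
    "a.txt": "search engine basics: how a search engine ranks pages",
    "b.txt": "an engine for cars",
    "c.txt": "cooking recipes",
}
results = rank(docs, "search engine")
```

Here a.txt comes first because the query words occur in it four times, b.txt follows with one occurrence, and c.txt is left out because it matches nothing.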
 Step 4: Add Spell Checker and Custom Dictionary Features
• Integrate a spell-checking tool (e.g., TextBlob) to underline misspelled
words as users type.
• Implement a right-click menu to display suggested spellings and
provide the option to add custom words to the user’s dictionary.
• Estimated Time: 2 weeks
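The server-side half of Step 4 could look roughly like the sketch below. It uses difflib from the standard library as a stand-in for TextBlob, and the base vocabulary is a made-up sample. Underlining words and the right-click menu would be handled in the browser with JavaScript and are not shown.

```python
import difflib

# Hypothetical base vocabulary; a real app would load a full word list.
BASE_WORDS = {"search", "engine", "document", "keyword", "python"}

def check_query(query, custom_words=frozenset()):
    """Return {unknown_word: [suggestions]} for words not in the vocabulary."""
    vocabulary = BASE_WORDS | set(custom_words)
    report = {}
    for word in query.lower().split():
        if word not in vocabulary:
            # Suggest up to three known words that are close to the unknown one.
            report[word] = difflib.get_close_matches(
                word, sorted(vocabulary), n=3, cutoff=0.7
            )
    return report

typo_report = check_query("serch engine")
unknown_report = check_query("flask engine")
custom_report = check_query("flask engine", custom_words={"flask"})
```

Once "flask" is in the user's custom dictionary, it is no longer reported as misspelled. That is the behaviour the custom dictionary feature is meant to provide.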
Roadmap to develop (part 3):
 Step 5: Develop User Interface (UI)
• Design the main web interface using HTML, CSS, and JavaScript. Create
sections for:
• Document upload
• Search input and results
• Spell-check feedback and suggestions
• Ensure the UI is intuitive, with a clean layout and accessible elements.
• Estimated Time: 2 weeks
 Step 6: Implement Access Control and User Management
• Create a user registration and login system (for document upload access).
• Develop an admin interface to manage users and uploaded documents.
• Ensure metadata, including usernames, is stored alongside uploaded
documents.
• Estimated Time: 2 weeks
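For the registration and login part of Step 6, passwords should be stored as salted hashes, never as plain text. A minimal sketch with the standard library follows. The in-memory users dictionary and the iteration count are assumptions for illustration, and a real app would keep the salt and digest in SQLite.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; adjust for your hardware

def hash_password(password, salt=None):
    """Return (salt, digest) computed with PBKDF2-HMAC-SHA256."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

users = {}  # hypothetical in-memory store: username -> (salt, digest)

def register(username, password):
    """Add a new user, rejecting duplicate usernames."""
    if username in users:
        raise ValueError("username already taken")
    users[username] = hash_password(password)

def login(username, password):
    """Return True only for a known user with the correct password."""
    if username not in users:
        return False
    salt, digest = users[username]
    return verify_password(password, salt, digest)
```

Flask projects often use werkzeug.security.generate_password_hash and check_password_hash for the same purpose.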
Roadmap to develop (part 4):
 Step 7: Add Data Persistence and Session Management
• Ensure user sessions persist across actions (e.g., staying logged in
across different pages).
• Finalize data management techniques to maintain access to uploaded
documents and user-specific data.
• Estimated Time: 1 week
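Staying logged in across pages usually relies on a value that the browser sends back with each request and that the server can trust. Flask's session object handles this with a signed cookie. The sketch below only illustrates the signing idea with the standard library. SECRET_KEY and the token format are assumptions for this example, not Flask's actual format.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical; load a real secret from configuration

def _signature(value):
    """HMAC-SHA256 of the value, as a hex string."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def sign(value):
    """Return 'value.signature' so that later tampering can be detected."""
    return f"{value}.{_signature(value)}"

def unsign(token):
    """Return the original value if the signature is valid, otherwise None."""
    value, _, sig = token.rpartition(".")
    return value if hmac.compare_digest(sig, _signature(value)) else None

token = sign("sara")
forged = "admin." + token.split(".")[1]
```

Calling unsign(token) returns "sara", while the forged token is rejected because its signature was computed for a different value.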
 Step 8: Testing and Debugging
• Test the app’s functionalities, focusing on search accuracy, spell-
check performance, and UI responsiveness.
• Fix any bugs and optimize the code to improve search efficiency and
UI speed.
• Estimated Time: 1 week
Roadmap to develop (part 5):
 Step 9: Project Finalization and Deployment
• Add final touches, such as documentation and usage instructions.
• Estimated Time: 1 week
 Total Development Time: 13 weeks (approx. 3 months)
Learn from ChatGPT

 https://chatgpt.com/
 A great learning tool
 You can ask questions related to anything (including code)
 Ask it to write code, explain code, edit code etc.
 Ask it to provide helping material related to anything
 Ask it to teach you anything…
(for example: how to connect a Python application to a database)
 Ask it to guide you on anything
Things to Do:
 Learn daily – Consistency is the key to success
 Stay connected with your Supervisor
 Check your VU email daily
Things to avoid:
 Don’t hire someone to develop the project or assignments
 You won’t be able to pass the viva if you don’t learn
 Don’t waste time – Each wasted day is a step towards
failure
 Learning takes considerable time; you can’t learn the entire
project in 2-3 days.
Structure of the Project:
 Four Assignments:
 SRS (15 marks)
 Design Document (25 marks)
 Prototype (includes viva) (10 marks)
 Final Deliverable (includes viva) (50 marks)
 You won’t be able to submit the final deliverable if you obtain
less than 50% marks in the first three assignments
Any Questions?
Thank You!
Happy Learning
