Government Polytechnic Thane
Institute Code : 0116
Vision
To create competent technical manpower to cater industrial and societal needs.
Mission
We are committed to –
M1 : Provide an environment that values and encourages knowledge acquisition
with effective curriculum implementation.
M2 : Maintain the well-equipped laboratories to develop industrial competencies
among the students.
M3 : Empower and motivate faculties towards building their domain expertise in
technology and management.
M4 : Groom the all-round personality of students towards leadership, self-employability, and lifelong learning.
M5 : Promote Industry Institute Interaction through training and placement
services, continuing education programs, consultancy & Technical services,
etc. for socio-economic development.
Our Core Values are –
Ethics, Equity, Women Empowerment, Safety and Eco-friendly Practices
Computer Engineering Department
Vision
To be a trend-setting department in technical education, providing highly competent and efficient manpower to meet the demands of ever-changing technology.
Mission
We are committed to –
M1 : To provide an atmosphere for students and faculty to enhance problem-solving skills, leadership qualities, team spirit & ethical responsibilities.
M2 : To develop technical & professional skills to face evolving challenges and social needs through an innovative learning process.
M3 : To establish an industry-institute interaction program to enhance entrepreneurship skills.
M4 : To enable the students to excel in their professions and careers through lifelong learning, keeping pace with the rapid growth of emerging technology.
ASSESSMENT MANUAL
OF
PYTHON
Micro-Project
CO6I Diploma Program
MAHARASHTRA STATE BOARD OF
TECHNICAL EDUCATION, MUMBAI
(Autonomous) (ISO 9001:2008) (ISO/IEC 27001:2013)
(2022-2023)
Government Polytechnic Thane
CERTIFICATE
This is to certify that the following Third Year Computer Engineering students
have successfully and satisfactorily completed their micro-project work entitled
“Web Scraping using Python” in partial fulfilment of the requirement for the Diploma
in Computer Engineering, Academic Year 2022-2023.
Student Name Roll No.
Manish Mhatre 03
Siddhesh Wagh 14
Tejas Gunjal 21
Sujal Desale 27
Sujit Lendave 28
Smt. Poonam Chaudhary (Subject Teacher)
Smt. Prajakta Mahajan (H.O.D.)
Dr. D. R. Mahajan (Principal)
GOVERNMENT POLYTECHNIC,
THANE
Micro-Project
WEB SCRAPING USING PYTHON
SUBJECT :- PYTHON
COMPUTER ENGINEERING
GOVERNMENT POLYTECHNIC,
THANE – 400612
2022-2023
Annexure II
A Micro-Project Report
On
WEB SCRAPING USING PYTHON
Submitted in partial fulfillment of
the Diploma in
Computer Engineering
UNDER GUIDANCE OF
Ms. Poonam Chaudhary
Department Of
COMPUTER ENGINEERING
Submitted By
Roll No. Name Enrollment No.
03 Manish Mhatre 2001160165
14 Siddhesh Wagh 2001160176
21 Tejas Gunjal 2001160183
27 Sujal Desale 2001160189
28 Sujit Lendave 2001160191
Table of Contents
Sr. No. Topics
1. Rationale
2. Aim/Benefits of the Micro-Project
3. Course Outcomes Achieved
4. Literature Review
5. Actual Procedure Followed
6. Actual Resources Used
7. Outputs of the Micro-Project
8. Skills Developed/Learning Outcomes of the Micro-Project
9. Applications of this Micro-Project
1.0 Rationale :-
Web scraping is an automated method of extracting large amounts of data from websites, and the
Python community has developed several powerful web scraping tools. The Internet is perhaps the
greatest source of information on the planet, and many disciplines, such as data science, business
intelligence, and investigative reporting, can benefit enormously from collecting and analyzing data
from websites. Python is a powerful programming language with efficient high-level data structures,
such as lists and dictionaries, that are useful for web scraping.
2.0 Aim/Benefits of the Micro-Project :-
• The main aim of the project is to acquire non-tabular or poorly structured data from
websites and convert it into a usable, structured format, such as a .csv file or spreadsheet.
• To learn the uses of web scraping and the techniques for performing it, which can
improve our data collection and analysis process.
• To learn different Python libraries and how to use these libraries in our project.
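As a minimal sketch of the conversion named in the aim, assuming the scraped values have already been collected as a list of dictionaries, Python's standard `csv` module can turn them into .csv text. The helper name `records_to_csv` and the field names are illustrative; the two sample records are copied from the output table later in this report.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialize a list of dictionaries to CSV text, with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Two sample records copied from the project's output table
movies = [
    {"Name": "Creed III", "Rating": 73.0, "Genre": "Drama, Action"},
    {"Name": "Sisu", "Rating": 71.0, "Genre": "War, Action"},
]

csv_text = records_to_csv(movies, ["Name", "Rating", "Genre"])
print(csv_text)

# To save a file instead, open it with newline="" as the csv module recommends:
# with open("movies.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```

Values that contain commas, such as the genre lists, are quoted automatically by `csv.DictWriter`, so each record still opens as a single row in a spreadsheet.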
3.0 Course Outcomes Achieved :-
• Perform operations on data structures in Python.
• Develop functions for a given problem and handle exceptions.
• Scrape specific information from a website.
• Design classes for particular problems.
4.0 Literature Review :-
Web scraping is used to extract data from various websites, and for this we required the
Beautiful Soup library. With over 10,626,990 downloads a week and 1.8K stars, Beautiful Soup
is one of the most helpful Python web scraping libraries; it parses HTML and XML documents
into a tree structure from which data can be identified and extracted. Beautiful Soup offers a
Pythonic interface and automatic encoding conversion, making it easier to work with website
data. We also used the Requests library. With over 52,881,567 weekly downloads, Requests is
another popular Python library that makes it easier to send HTTP requests. This is extremely
helpful for web scraping, since the first step in any web scraping process is to send HTTP
requests to the website's server to retrieve the data displayed on the desired web page.
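A minimal sketch of the two steps described above, assuming the `requests` and `beautifulsoup4` packages are installed: `requests.get` fetches a page, and Beautiful Soup parses the HTML into a tree that can be searched with `find_all`. The helper names, the User-Agent string, and the HTML snippet are placeholders for illustration, not the project's code; a real site needs its own tags and selectors.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Send an HTTP GET request and return the response body as text."""
    headers = {"User-Agent": "Mozilla/5.0 (micro-project example)"}  # placeholder
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx status codes
    return response.text


def extract_links(html):
    """Parse HTML into a tree and return (text, href) for every <a> tag."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a.get("href")) for a in soup.find_all("a")]


# Placeholder HTML so the parsing step can run without network access
sample_html = """
<ul>
  <li><a href="/movie/76600">Avatar: The Way of Water</a></li>
  <li><a href="/movie/603692">John Wick: Chapter 4</a></li>
</ul>
"""

links = extract_links(sample_html)
print(links)

# Live usage (needs network access); the URL is a placeholder:
# try:
#     html = fetch_html("https://example.com")
# except requests.RequestException as exc:
#     print("Request failed:", exc)
```

Keeping the network call and the parsing in separate functions means the parsing logic can be tested on saved HTML, and network errors are handled in one place through `requests.RequestException`.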
5.0 Actual Procedure Followed :-
To accomplish web scraping for our project, we used essential Python concepts such as lists
and their methods, dictionaries, arrays, and loop statements. We created a .py file and imported
the necessary libraries: requests, to fetch web pages, and bs4 (Beautiful Soup), which simplifies
extracting data from web pages by providing Pythonic idioms for parsing, searching, and modifying
the HTML/XML tree. Additionally, we used the pandas library, which provides data structures and
operations for efficiently handling large and complex data sets, including numerical data and time
series. Finally, we exported the extracted website data to an Excel sheet. This was our approach
to completing the project.
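A minimal sketch of the pandas and export steps, assuming the scraped records are a list of dictionaries. The three sample rows are copied from the output table; the helper `export_to_excel` and the file name `movies.xlsx` are hypothetical, and `DataFrame.to_excel` needs an Excel engine such as `openpyxl` installed.

```python
import pandas as pd

# Three sample rows copied from the project's output table
records = [
    {"Name": "Creed III", "Rating": 73.0, "Release Date": "03/03/2023", "Runtime": "1h 56m"},
    {"Name": "Plane", "Rating": 69.0, "Release Date": "01/13/2023", "Runtime": "1h 47m"},
    {"Name": "John Wick: Chapter 4", "Rating": 79.0, "Release Date": "03/24/2023", "Runtime": "2h 50m"},
]

# A list of dicts becomes a DataFrame with one column per key
df = pd.DataFrame(records)

# Convert MM/DD/YYYY strings to datetime values so dates sort and filter correctly
df["Release Date"] = pd.to_datetime(df["Release Date"], format="%m/%d/%Y")

# Highest-rated first, then number the rows like the "Sr. No." column of the output
df = df.sort_values("Rating", ascending=False).reset_index(drop=True)
df.insert(0, "Sr. No.", range(1, len(df) + 1))


def export_to_excel(frame, path="movies.xlsx"):
    """Write the table to an .xlsx file; pandas needs the openpyxl engine for this."""
    frame.to_excel(path, index=False, sheet_name="Movies")


print(df)
# export_to_excel(df)  # creates movies.xlsx in the working directory
```

Converting the dates is optional: the strings would export as well, but the spreadsheet would then treat them as text rather than dates.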
6.0 Actual Resources Used :-
Sr. No. | Name of Resources Required | Specifications | Qty | Remarks
1 | Computer System | 8 GB RAM and i5 processor | 1 |
2 | MS Word | Latest | 1 |
3 | IDE | VS Code | 1 |
4 | Browser | Chrome | 1 |
7.0 Output of the Micro-Project :-
Source Code :-
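The following is a hypothetical sketch, not the project's actual source code, of how one movie's details could be turned into a record with the same columns as the output below. The HTML fragment and every CSS class in it (`title`, `user-score`, `genres`, `release`, `runtime`, `director`) are invented for illustration and are not TMDB's real markup. A field missing from the page is stored as `None`, the same value that appears in the Runtime column of row 19.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.themoviedb.org"


def text_or_none(soup, selector):
    """Return the stripped text of the first match for a CSS selector, or None."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_movie(html, relative_url):
    """Build one output row from a movie page (hypothetical selectors)."""
    soup = BeautifulSoup(html, "html.parser")
    score = text_or_none(soup, ".user-score")
    return {
        "Name": text_or_none(soup, ".title"),
        "Rating": float(score) if score else None,
        "Genre": ", ".join(a.get_text(strip=True) for a in soup.select(".genres a")),
        "Release Date": text_or_none(soup, ".release"),
        "Runtime": text_or_none(soup, ".runtime"),
        "Director": text_or_none(soup, ".director"),
        "URL": urljoin(BASE_URL, relative_url),  # make the link absolute
    }


# Invented page fragment that has no runtime element
sample_page = """
<div>
  <h2 class="title">Gangs of Lagos</h2>
  <span class="user-score">56</span>
  <span class="genres"><a href="#">Crime</a></span>
  <span class="release">04/07/2023</span>
  <p class="director">Jadesola Osiberu</p>
</div>
"""

row = parse_movie(sample_page, "/movie/1104040")
print(row)
```

Routing every lookup through one helper that returns `None` on a miss keeps a single absent element from raising an `AttributeError` and stopping the whole scrape.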
Output :-
Sr. No. | Name | Rating | Genre | Release Date | Runtime | Director | URL
1 | Ant-Man and the Wasp: Quantumania | 65.0 | Action, Adventure, Science Fiction | 02/17/2023 | 2h 5m | Peyton Reed | https://www.themoviedb.org/movie/640146
2 | The Super Mario Bros. Movie | 75.0 | Animation, Adventure, Family, Fantasy, Comedy | 04/07/2023 | 1h 32m | Aaron Horvath | https://www.themoviedb.org/movie/502356
3 | Shazam! Fury of the Gods | 69.0 | Action, Comedy, Fantasy | 03/17/2023 | 2h 10m | David F. Sandberg | https://www.themoviedb.org/movie/594767
4 | Avatar: The Way of Water | 77.0 | Science Fiction, Adventure, Action | 12/16/2022 | 3h 12m | James Cameron | https://www.themoviedb.org/movie/76600
5 | The Last Kingdom: Seven Kings Must Die | 73.0 | Action, Adventure, History, Drama, War | 04/14/2023 | 1h 51m | Edward Bazalgette | https://www.themoviedb.org/movie/948713
6 | Creed III | 73.0 | Drama, Action | 03/03/2023 | 1h 56m | Michael B. Jordan | https://www.themoviedb.org/movie/677179
7 | Murder Mystery 2 | 66.0 | Comedy, Mystery, Action | 03/28/2023 | 1h 31m | Jeremy Garelick | https://www.themoviedb.org/movie/638974
8 | Evil Dead Rise | 70.0 | Horror, Thriller | 04/21/2023 | 1h 36m | Lee Cronin | https://www.themoviedb.org/movie/713704
9 | Ghosted | 73.0 | Romance, Action, Comedy | 04/18/2023 | 2h | Dexter Fletcher | https://www.themoviedb.org/movie/868759
10 | Scream VI | 74.0 | Horror, Mystery, Thriller | 03/10/2023 | 2h 3m | Tyler Gillett | https://www.themoviedb.org/movie/934433
11 | Puss in Boots: The Last Wish | 83.0 | Animation, Family, Fantasy, Adventure, Comedy, Drama | 01/20/2023 | 1h 43m | Joel Crawford | https://www.themoviedb.org/movie/315162
12 | Adrenaline | 55.0 | Action | 12/15/2022 | 1h 15m | Massimiliano Cerchi | https://www.themoviedb.org/movie/1048300
13 | 65 | 63.0 | Science Fiction, Adventure, Thriller, Action | 03/10/2023 | 1h 33m | Bryan Woods | https://www.themoviedb.org/movie/700391
14 | The Pope's Exorcist | 66.0 | Horror, Thriller | 04/07/2023 | 1h 43m | Julius Avery | https://www.themoviedb.org/movie/758323
15 | John Wick: Chapter 4 | 79.0 | Action, Thriller, Crime | 03/24/2023 | 2h 50m | Chad Stahelski | https://www.themoviedb.org/movie/603692
16 | The Communion Girl | 64.0 | Horror | 02/10/2023 | 1h 43m | Víctor García | https://www.themoviedb.org/movie/1008005
17 | Pirates Down the Street II: The Ninjas from Across | 62.0 | Family, Action, Adventure | 04/20/2022 | 1h 29m | Pim van Hoeve | https://www.themoviedb.org/movie/946310
18 | Cocaine Bear | 64.0 | Thriller, Comedy, Crime | 02/24/2023 | 1h 36m | Elizabeth Banks | https://www.themoviedb.org/movie/804150
19 | Gangs of Lagos | 56.0 | Crime | 04/07/2023 | None | Jadesola Osiberu | https://www.themoviedb.org/movie/1104040
20 | Kill Boksoon | 69.0 | Action, Crime, Thriller | 03/31/2023 | 2h 17m | Byun Sung-hyun | https://www.themoviedb.org/movie/849869
21 | Chupa | 65.0 | Adventure, Fantasy, Family | 04/07/2023 | 1h 38m | Jonás Cuarón | https://www.themoviedb.org/movie/736790
22 | Attack on Titan | 60.0 | Action, Science Fiction | 09/30/2022 | 1h 33m | Noah Luke | https://www.themoviedb.org/movie/1033219
23 | Supercell | 63.0 | Action | 03/17/2023 | 1h 40m | Herbert James Winterstern | https://www.themoviedb.org/movie/842945
24 | Mummies | 71.0 | Animation, Comedy, Family, Adventure, Fantasy | 02/24/2023 | 1h 28m | Juan Jesús García Galocha | https://www.themoviedb.org/movie/816904
25 | Ripper's Revenge | 44.0 | Horror | 04/03/2023 | 1h 25m | Steven Lawson | https://www.themoviedb.org/movie/1105014
26 | Marcel the Shell with Shoes On | 78.0 | Animation, Comedy, Family, Drama, Adventure | 06/24/2022 | 1h 30m | Dean Fleischer Camp | https://www.themoviedb.org/movie/869626
27 | Prizefighter: The Life of Jem Belcher | 62.0 | Drama, History | 07/22/2022 | 1h 47m | Daniel Graham | https://www.themoviedb.org/movie/943822
28 | Winnie the Pooh: Blood and Honey | 57.0 | Horror, Thriller | 02/15/2023 | 1h 24m | Rhys Frake-Waterfield | https://www.themoviedb.org/movie/980078
29 | Black Panther: Wakanda Forever | 73.0 | Action, Adventure, Science Fiction | 11/11/2022 | 2h 42m | Ryan Coogler | https://www.themoviedb.org/movie/505642
30 | The Park | 59.0 | Action, Drama, Horror, Science Fiction, Thriller | 03/02/2023 | 1h 20m | Shal Ngo | https://www.themoviedb.org/movie/1084225
31 | Black Adam | 71.0 | Action, Adventure, Science Fiction, Fantasy | 10/21/2022 | 2h 5m | Jaume Collet-Serra | https://www.themoviedb.org/movie/436270
32 | M3GAN | 74.0 | Science Fiction, Horror, Comedy | 01/13/2023 | 1h 42m | Gerard Johnstone | https://www.themoviedb.org/movie/536554
33 | Black Warrant | 54.0 | Action, Thriller | 01/09/2023 | 1h 34m | Tibor Takács | https://www.themoviedb.org/movie/983768
34 | Sisu | 71.0 | War, Action | 01/27/2023 | 1h 31m | Jalmari Helander | https://www.themoviedb.org/movie/840326
35 | The Devil Conspiracy | 63.0 | Horror, Thriller | 01/13/2023 | 1h 51m | Nathan Frankowski | https://www.themoviedb.org/movie/296271
36 | The Exorcist | 57.0 | Horror | 11/02/2022 | 1h 41m | Adrián García Bogliano | https://www.themoviedb.org/movie/1023313
37 | Plane | 69.0 | Action, Adventure, Thriller | 01/13/2023 | 1h 47m | Jean-François Richet | https://www.themoviedb.org/movie/646389
38 | The Amazing Maurice | 70.0 | Animation, Adventure, Comedy, Family, Fantasy | 02/09/2023 | 1h 33m | Toby Genkel | https://www.themoviedb.org/movie/676710
39 | A Tourist's Guide to Love | 65.0 | Romance, Comedy, Adventure | 04/21/2023 | 1h 36m | Steven K. Tsuchida | https://www.themoviedb.org/movie/813726
40 | Knock at the Cabin | 64.0 | Horror, Mystery, Thriller | 02/03/2023 | 1h 40m | M. Night Shyamalan | https://www.themoviedb.org/movie/631842
41 | That Time I Got Reincarnated as a Slime the Movie: Scarlet Bond | 76.0 | Animation, Fantasy, Adventure | 11/25/2022 | 1h 48m | Yasuhito Kikuchi | https://www.themoviedb.org/movie/876792
42 | The Mummy Resurrection | 58.0 | Horror | 01/02/2023 | 1h 25m | Steven Lawson | https://www.themoviedb.org/movie/984105
43 | Knights of the Zodiac | 67.0 | Fantasy, Action, Adventure | 04/28/2023 | 1h 52m | Tomek Baginski | https://www.themoviedb.org/movie/455476
44 | Unhappily Ever After | 69.0 | Comedy, Romance | 01/26/2023 | 1h 30m | Noé Santillán-López | https://www.themoviedb.org/movie/676841
45 | Fast X | 0.0 | Action, Crime, Thriller | 05/19/2023 | 2h 21m | Louis Leterrier | https://www.themoviedb.org/movie/385687
46 | Disquiet | 67.0 | Thriller, Horror, Mystery | 02/10/2023 | 1h 25m | Michael Winnick | https://www.themoviedb.org/movie/1072074
47 | Shotgun Wedding | 63.0 | Action, Comedy, Romance | 01/18/2023 | 1h 41m | Jason Moore | https://www.themoviedb.org/movie/758009
48 | Phenomena | 59.0 | Comedy, Horror, Thriller | 04/14/2023 | 1h 36m | Carlos Therón | https://www.themoviedb.org/movie/1073413
49 | Sayen | 62.0 | Thriller, Action | 03/03/2023 | 1h 34m | Alexander Witt | https://www.themoviedb.org/movie/850871
50 | Shark Side of the Moon | 53.0 | Action, Science Fiction, Thriller, Horror | 08/12/2022 | 1h 28m | Tammy Klein | https://www.themoviedb.org/movie/1011679
51 | Queens on the Run | 71.0 | Comedy, Action | 04/14/2023 | 1h 36m | Jorge Macaya | https://www.themoviedb.org/movie/1101799
52 | Gold Run | 66.0 | War, Action, Adventure, Thriller | 12/15/2022 | 1h 57m | Hallvard Bræin | https://www.themoviedb.org/movie/964426
53 | One More Time | 63.0 | Comedy, Drama | 04/21/2023 | 1h 25m | Jonatan Etzler | https://www.themoviedb.org/movie/1100962
54 | Batman: The Doom That Came to Gotham | 65.0 | Animation, Fantasy, Horror, Action, Mystery | 03/28/2023 | 1h 26m | Christopher Berkeley | https://www.themoviedb.org/movie/1003579
55 | Prey for the Devil | 71.0 | Horror, Thriller | 10/28/2022 | 1h 33m | Daniel Stamm | https://www.themoviedb.org/movie/676547
56 | Die Hart | 61.0 | Action, Comedy, Thriller | 02/24/2023 | 1h 25m | Eric Appel | https://www.themoviedb.org/movie/1077280
57 | Fall | 73.0 | Thriller | 08/12/2022 | 1h 47m | Scott Mann | https://www.themoviedb.org/movie/985939
58 | The Magician's Elephant | 72.0 | Animation, Adventure, Family, Fantasy | 03/17/2023 | 1h 39m | Wendy Rogers | https://www.themoviedb.org/movie/776835
59 | H.P. Lovecraft's Witch House | 58.0 | Horror | 06/29/2022 | 1h 22m | Bobby Easley | https://www.themoviedb.org/movie/988165
8.0 Skills Developed/Learning Outcomes of the Micro-Project :-
▪ Through this study we improved our web scraping process, and we found that most web
scrapers are quite similar and general in nature, designed to carry out generic, simple
tasks.
▪ We learned to extract website data to an Excel sheet.
▪ We learned how to apply Python basics in our project.
9.0 Applications of this Micro-Project :-
▪ Web scraping is widely utilized for a variety of purposes, including comparing prices online,
observing changes in weather data, website change detection, research, integrating data from
multiple sources, extracting offers and discounts, scraping job postings information from job
portals, brand monitoring, and market analysis.
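Of the applications listed, website change detection is simple to sketch: store a hash of the relevant page text and compare it on the next visit. This minimal sketch uses SHA-256 from the standard library on text that has already been extracted; how the text is fetched and where the previous hash is stored are assumptions left open, and the helper names and price strings are illustrative.

```python
import hashlib


def content_fingerprint(text):
    """Return a SHA-256 hex digest of the text, ignoring differences in whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_hash, current_text):
    """Return True if current_text differs from the text that produced previous_hash."""
    return content_fingerprint(current_text) != previous_hash


# Hash saved on an earlier visit (sample text, not real scraped data)
saved_hash = content_fingerprint("Price: Rs. 499")

print(has_changed(saved_hash, "Price:   Rs. 499"))  # False: only whitespace differs
print(has_changed(saved_hash, "Price: Rs. 449"))    # True: the price changed
```

Hashing only the extracted text, rather than the raw HTML, avoids false alarms from markup that changes on every load, such as ads or session tokens.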