24/11/2023, 15:38 Getting started with web Scrapping | by Dennis | Getting started with Web Scrapping | Medium
Getting started with web Scrapping
Dennis
Published in Getting started with Web Scrapping
5 min read, Apr 14, 2020
We all know that in this day and age data is everything. The internet as we know it
has become a one-stop shop for any kind of data you need, ranging from text to sound
to video. But the question still remains: how do we get this data?
A big percentage of this data resides on web pages, and this is where the idea of web
scraping comes in, which in a nutshell is the ability to extract data from these web
pages. Well, I am going to show you how easy it is to achieve this using Python.
The first step is identifying which web page or website you want to extract
data from and what kind of data you want. In this example we are going to
extract image and text data from an e-commerce site.
We are going to use Python as the programming language and beautifulsoup4,
a library that makes it easier to scrape information from web pages. I will
assume that you have a basic understanding of Python so that you can follow
along and get the most out of this.
LET'S GET OUR HANDS DIRTY
To start with, we will install beautifulsoup4 using pip, which is a Python
package manager. To do this, run the following command.
$ pip install beautifulsoup4
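To confirm the install worked before writing any scraping code, a small check like the one below can help. It is only a sketch: `is_installed` is a hypothetical helper name, and the one fact it relies on is that the beautifulsoup4 package is imported under the name `bs4`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located by the import system."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # beautifulsoup4 is imported under the name "bs4"
    print("bs4 available:", is_installed("bs4"))
```

If this prints False, the package was probably installed for a different Python interpreter than the one running the script.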
Next, create a Python file and name it as you wish. For this example we will call it
tbcscrapper.py.
$ touch tbcscrapper.py
Open it with your favorite text editor and let's get to the fun part, shall we?
Import the required libraries
import os
import urllib.request
from bs4 import BeautifulSoup
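As a rough illustration of what urllib.request does in this script, the sketch below downloads a page as text. The explicit User-Agent string, the 10 second timeout and the helper names `build_request` and `fetch_html` are assumptions added for illustration; the article's own code calls `urllib.request.urlopen` directly.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0 (tutorial-scraper)"):
    """Build a urllib Request carrying an explicit User-Agent header."""
    # The User-Agent value here is a made-up example string
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download a page and return its body as text (performs a network call)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html("https://textbookcentre.com/")` would return the page source as a string, which can then be handed to BeautifulSoup.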
Next, create the scraping logic. There are a number of approaches you could take,
but the first step is always to inspect the HTML of the page you want to scrape data
from. This way you will understand the DOM and know exactly which HTML elements
to target. In this case we are scraping the https://textbookcentre.com/ site for
the images.
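Besides the browser's inspector, one way to get a quick overview of a page's structure is to count which tags carry which class attributes. The sketch below does this with the standard library's HTMLParser on a made-up HTML snippet; the markup and the `ClassSurvey` name are assumptions for illustration, not the real page.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassSurvey(HTMLParser):
    """Count how often each (tag, class attribute) pair appears in a document."""

    def __init__(self):
        super().__init__()
        self.seen = Counter()

    def handle_starttag(self, tag, attrs):
        class_value = dict(attrs).get("class")
        if class_value:
            self.seen[(tag, class_value)] += 1


# Made-up markup standing in for a downloaded page
sample_html = """
<div class="prod-list-view">
  <ol class="product-list row">
    <li class="col-xs-6 col-md-3"><a href="/catalogue/book-one/">Book one</a></li>
    <li class="col-xs-6 col-md-3"><a href="/catalogue/book-two/">Book two</a></li>
  </ol>
</div>
"""

survey = ClassSurvey()
survey.feed(sample_html)
for (tag, class_value), count in survey.seen.most_common():
    print(tag, repr(class_value), count)
```

Classes that repeat many times, like the list items here, are usually the per-item containers worth targeting.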
On inspecting the website, we find that the book images are under a div with the
class named prod-list-view.
[Screenshot: browser inspector on the Primary School category page, showing markup such as <div class="col-xs-12 col-sm-3">]
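To make the idea of targeting a div by its class concrete, here is a minimal sketch that runs BeautifulSoup against a made-up HTML string instead of the live site. The sample markup is an assumption; the class names are taken from this article, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Made-up markup using the class names mentioned in this article
sample_html = """
<div class="prod-list-view">
  <section>
    <ol class="product-list row">
      <li class="col-xs-6 col-md-3">
        <div class="product"><a href="/catalogue/book-one/">Book one</a></div>
      </li>
      <li class="col-xs-6 col-md-3">
        <div class="product"><a href="/catalogue/book-two/">Book two</a></div>
      </li>
    </ol>
  </section>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
# Narrow down step by step: outer div, then the list, then each list item
product_list = soup.find("div", class_="prod-list-view")
book_list = product_list.find("ol", class_="product-list row")
book_data = book_list.find_all("li", class_="col-xs-6 col-md-3")

# Collect the link of every book in the list
links = [li.find("a").get("href") for li in book_data]
print(links)
```

Note that `find` returns None when nothing matches, so on a real page it is worth checking each result before calling methods on it.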
Let's also create the method responsible for creating a directory to store the
scraped data.
def createdirectory(self, path):
    # Create directory to store the images
    try:
        os.mkdir(path)
    except Exception as e:
        print("Creation of the directory failed", e)
    else:
        print("Successfully created the directory %s " % path)
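One thing to be aware of: os.mkdir raises an error when the folder already exists, so createdirectory prints a failure message on every run after the first. A possible alternative, shown here only as a sketch and not as the author's method, is os.makedirs with exist_ok=True. The name `ensure_directory` is hypothetical.

```python
import os
import tempfile


def ensure_directory(path):
    """Create `path` if needed; do nothing when it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    # Use a throwaway base folder so the demo leaves nothing behind
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "books")
        ensure_directory(target)
        ensure_directory(target)  # second call is a no-op, not an error
        print(os.path.isdir(target))
```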
To run the scraper, we need to instantiate it, call the initializeScrapping
method and pass the URL we want to scrape. At the bottom of the script, add this code:
def main():
    url = "https://textbookcentre.com/catalogue/category/text-books/primary-school/"
    scrapper = Scrapper()
    scrapper.initializeScrapping(url)

if __name__ == "__main__":
    main()
Run the code from your terminal by calling python3 followed by the filename:
$ python3 tbcscrapper.py
There you have it: your first web scraper. The full code of the script
looks like this:
import os
import urllib.request
from bs4 import BeautifulSoup


class Scrapper:

    def initializeScrapping(self, url):
        # Set the url of the page you want to scrap for data
        urlpage = url
        # Using urllib open the page
        page = urllib.request.urlopen(urlpage)
        # Parse the webpage
        soup = BeautifulSoup(page, 'html.parser')
        # Get the page data from the div with a class of product list view
        product_list = soup.find('div', class_="prod-list-view")
        # Traverse the DOM
        items = product_list.find('section')
        book_list = items.find('ol', class_="product-list row")
        book_data = book_list.find_all('li', class_="col-xs-6 col-md-3")
        number_of_books = len(book_data)
        self.startScrapping(number_of_books, book_data)

    def startScrapping(self, items, book_data):
        # Get the current working directory
        current_directory = os.getcwd()
        # Create a folder named books to store the scrapped images
        path = os.path.join(current_directory, r"books")
        self.createdirectory(path)
        counter = 1
        # Loop through the product list
        for book in book_data:
            try:
                product = book.find('div', class_="product")
                url = product.find('a')
                full_url = url.get('href')
                page = urllib.request.urlopen('https://textbookcentre.com' + full_url)
                soup = BeautifulSoup(page, 'html.parser')
                data = soup.find('article', class_="product_page")
                image = data.find('div', id="product-images")
                image = image.find('a')
                image_url = image.get('href')
                # Get the title of the book so as to save each book with its title
                title_data = data.find('div', class_="col-sm-6 product_main")
                title = title_data.find('h1')
                fullpath = os.path.join(path, title.text)
                # Save the book
                urllib.request.urlretrieve('https://textbookcentre.com' + image_url, "{}/{}.jpg".format(path, title.text))
                if counter == items:
                    return counter
                else:
                    print('INFO: saved {} {}'.format(title.text, counter))
                    counter += 1
            except Exception as e:
                print("ERROR:", e)

    def createdirectory(self, path):
        # Create directory to store the images
        try:
            os.mkdir(path)
        except Exception as e:
            print("Creation of the directory failed", e)
        else:
            print("Successfully created the directory %s " % path)


def main():
    url = "https://textbookcentre.com/catalogue/category/text-books/primary-school/"
    scrapper = Scrapper()
    scrapper.initializeScrapping(url)

if __name__ == "__main__":
    main()
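The script above builds page and image addresses by concatenating the site root with each href, and it uses the raw book title as a filename. Both work when the hrefs are site-relative and the titles are simple, which is an assumption about the site rather than a guarantee. The sketch below shows two small helpers that could make those steps more forgiving; `absolute_url`, `safe_filename` and `BASE_URL` are hypothetical names, not part of the original script.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://textbookcentre.com/"


def absolute_url(href, base=BASE_URL):
    """Resolve a relative or absolute href against the site's base URL."""
    return urljoin(base, href)


def safe_filename(title, extension=".jpg"):
    """Turn a book title into a filename without path separators or odd symbols."""
    # Replace every run of characters other than letters, digits,
    # underscores, hyphens and spaces with a single underscore
    cleaned = re.sub(r"[^\w\- ]+", "_", title.strip())
    return cleaned + extension


print(absolute_url("/catalogue/book-one/"))
print(safe_filename("Maths: Book 1/2"))
```

With helpers like these, a title containing a slash would no longer be interpreted as a subdirectory when the image is saved, and an href that is already absolute would be left unchanged.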
Final considerations
I hope this introduction to web scraping gives you solid ground to stand on
while starting out in data gathering. Thank you!
Written by Dennis