Bapuji Educational Association (Regd.)
BAPUJI INSTITUTE OF ENGINEERING AND TECHNOLOGY
Post Box No. 325, Shamanur Road, Davanagere – 577 004, Karnataka, India.
(An Autonomous Institute, Affiliated to VTU, Belagavi, Approved by AICTE, New Delhi
Accredited by NAAC with ‘A’ Grade and NBA (UG Programmes), Recognized by UGC, New Delhi Under 2(F) & 12(B))
Data Analytics using Python
Module-4
Web Scraping And Numerical Analysis
Prepared by
BINDU B V
Assistant Professor Dept of MCA, BIET
Topics to be studied
• Data Acquisition by Scraping Web Applications
• Submitting a Form
• Fetching Web Pages
• Downloading Web Pages through Form Submission
• CSS Selectors
• NumPy Essentials: the NumPy array, universal functions, and broadcasting
Need for Web Scraping
• Suppose you want to get some information from a website, say an article from a news site. What will you do?
• The first thing that may come to mind is to copy and paste the information into your local media.
• But what if you want a large amount of data on a daily basis, and as quickly as possible?
• In such situations, copy and paste will not work, and that's where you'll need web scraping.
Web Scraping
• Web scraping is a technique used to extract data from websites. It
involves fetching and parsing HTML content to gather information.
• The main purpose of web scraping is to collect and analyze
data from websites for various applications, such as research,
business intelligence, or creating datasets.
• Developers use tools and libraries like BeautifulSoup (for Python),
Scrapy, or Puppeteer to automate the process of fetching and
parsing web data.
Python Libraries
• requests
• Beautiful Soup
• Selenium
Requests
• The requests module allows you to send HTTP requests using Python.
• The HTTP request returns a Response object with all the response data (content, encoding, status, etc.).
• Install requests with: pip install requests
Python script to make a simple HTTP GET request
import requests

# Specify the URL you want to make a GET request to
url = "https://www.w3schools.com"

# Make the GET request
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Print the content of the response
    print("Response content:")
    print(response.text)
else:
    # Print an error message if the request was not successful
    print(f"Error: {response.status_code}")
import requests

# Specify the base URL
base_url = "https://jsonplaceholder.typicode.com"

# GET request
get_response = requests.get(f"{base_url}/posts/1")
print(f"GET Response:\n{get_response.json()}\n")

# POST request
new_post_data = {
    'title': 'New Post',
    'body': 'This is the body of the new post.',
    'userId': 1
}
post_response = requests.post(f"{base_url}/posts", json=new_post_data)
print(f"POST Response:\n{post_response.json()}\n")

# PUT request (update the post with ID 1)
updated_post_data = {
    'title': 'Updated Post',
    'body': 'This is the updated body of the post.',
    'userId': 1
}
put_response = requests.put(f"{base_url}/posts/1", json=updated_post_data)
print(f"PUT Response:\n{put_response.json()}\n")

# DELETE request (delete the post with ID 1)
delete_response = requests.delete(f"{base_url}/posts/1")
print(f"DELETE Response:\nStatus Code: {delete_response.status_code}")
Implementing Web Scraping in Python with BeautifulSoup
There are mainly two ways to extract data from a website:
• Use the API of the website (if it exists). Ex. Facebook Graph API
• Access the HTML of the webpage and extract useful information/data
from it.
Ex. Web scraping
Steps involved in web scraping
• Send an HTTP request to URL
• Parse the data which is accessed
• Navigate and search the parse tree that we created
BeautifulSoup
• It is an incredible tool for pulling information out of a webpage.
• It can be used to extract tables, lists, and paragraphs, and you can also apply filters to extract specific information from web pages.
• BeautifulSoup does not fetch the web page for us, so we use it together with the requests module.
• Install with: pip install beautifulsoup4
BeautifulSoup
from bs4 import BeautifulSoup

# Parsing the document
soup = BeautifulSoup('''<h1>Knowx Innovations Pvt Ltd</h1>''', "html.parser")
print(type(soup))
Tag Object
• Tag object corresponds to an XML or HTML tag in the original
document.
• This object is usually used to extract a tag from the whole HTML
document.
• Beautiful Soup is not an HTTP client, which means that to scrape online websites you first have to download them using the requests module and then serve them to Beautiful Soup for scraping.
• This object returns the first found tag if your document has multiple tags with the
same name.
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
<html>
<b>RNSIT</b>
<b>Knowx Innovations</b>
</html>
''', "html.parser")

# Get the tag (the first <b> found)
tag = soup.b
print(tag)

# Print the type of the object
print(type(tag))
• A tag contains many methods and attributes. Two important features of a tag are its name and its attributes.
• Name: the name of the tag can be accessed through the .name attribute.
• Attributes: anything that is not the tag name; attributes are accessed like dictionary keys, e.g. tag["class"].
# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
<html>
<b>Knowx Innovations</b>
</html>
''', "html.parser")

# Get the tag
tag = soup.b

# Print the tag name
print(tag.name)

# Changing the tag name
tag.name = "Strong"
print(tag)
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
<html>
<b class="RNSIT" name="knowx">Knowx Innovations</b>
</html>
''', "html.parser")

# Get the tag
tag = soup.b
print(tag["class"])

# Modifying the class attribute
tag["class"] = "ekant"
print(tag)

# Deleting the class attribute
del tag["class"]
print(tag)
• A document may contain multi-valued attributes, which can be accessed as key-value pairs.
# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page (multi-valued class attribute)
soup = BeautifulSoup('''
<html>
<b class="rnsit knowx">Knowx Innovations</b>
</html>
''', "html.parser")

# Get the tag
tag = soup.b
print(tag["class"])
• NavigableString Object: A string corresponds to a bit of text within a tag.
Beautiful Soup uses the NavigableString class to contain these bits of text
from bs4 import BeautifulSoup

soup = BeautifulSoup('''
<html>
<b>Knowx Innovations</b>
</html>
''', "html.parser")

tag = soup.b

# Get the string inside the tag
string = tag.string
print(string)

# Print the type of the object
print(type(string))
Find the Siblings of the tag
• previous_sibling is used to find the previous element of the given
element
• next_sibling is used to find the next element of the given element
• previous_siblings is used to find all previous elements of the given element
• next_siblings is used to find all following elements of the given element (see the sketch below)
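A minimal sketch of sibling navigation (the markup and tag contents here are illustrative, not from the slides):

from bs4 import BeautifulSoup

# Three sibling <b> tags inside <html>
soup = BeautifulSoup('<html><b>one</b><b>two</b><b>three</b></html>', "html.parser")
second = soup.find_all('b')[1]

print(second.previous_sibling)          # <b>one</b>
print(second.next_sibling)              # <b>three</b>
print(list(second.previous_siblings))   # all earlier siblings, nearest first
print(list(second.next_siblings))       # all later siblings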
descendants generator
• descendants generator is provided by Beautiful Soup
• The .contents and .children attributes only consider a tag's direct children
• The descendants generator is used to iterate over all of the tag’s
children, recursively.
Example for descendants generator
from bs4 import BeautifulSoup

# Create the document
doc = "<body><b> <p>Hello world<i>innermost</i></p> </b><p> Outer text</p></body>"

# Initialize the object with the document
soup = BeautifulSoup(doc, "html.parser")

# Get the body tag
tag = soup.body

# Direct children only
for content in tag.contents:
    print(content)

for child in tag.children:
    print(child)

# All of the tag's children, recursively
for descendant in tag.descendants:
    print(descendant)
Searching for and Extracting Specific Tags with Beautiful Soup
• Python BeautifulSoup – find all class
# Import modules
from bs4 import BeautifulSoup
import requests

# Website URL
URL = 'https://www.python.org/'

# Set to collect class names
class_list = set()

# Page content from the website URL
page = requests.get(URL)

# Parse the HTML content
soup = BeautifulSoup(page.content, 'html.parser')

# Get all tag names
tags = {tag.name for tag in soup.find_all()}

# Iterate over all tags
for tag in tags:
    # Find all elements of this tag
    for i in soup.find_all(tag):
        # If the tag has a class attribute
        if i.has_attr("class"):
            if len(i['class']) != 0:
                class_list.add(" ".join(i['class']))

print(class_list)
Find a particular class
# Sample HTML document
html_doc = """<html><head><title>Welcome to geeksforgeeks</title></head>
<body>
<p class="title"><b>Geeks</b></p>
<p class="body">This is an example to find a particular class</p>
</body>
</html>"""

# Import module
from bs4 import BeautifulSoup

# Parse the HTML content
soup = BeautifulSoup(html_doc, 'html.parser')

# Finding by class name
c = soup.find(class_="body")
print(c)
Search by text inside a tag
Steps involved for searching the text inside the tag:
• Import module
• Pass the URL
• Request page
• Specify the tag to be searched
• To search by text inside a tag, we check a condition with the help of the string attribute.
• The string attribute returns the text inside a tag.
• As we navigate the tags, we check whether this text matches the given text.
• Return text
from bs4 import BeautifulSoup
import requests

# Sample web page
sample_web_page = 'https://www.python.org'

# Call the get method to request that page
page = requests.get(sample_web_page)

# With the help of BeautifulSoup and the HTML parser, create the soup
soup = BeautifulSoup(page.content, "html.parser")

child_soup = soup.find_all('strong')
# print(child_soup)

text = """Notice:"""

# Search for the tag whose text is the same as the given text
for i in child_soup:
    if i.string == text:
        print(i)
IMPORTANT POINTS
• BeautifulSoup provides several methods for searching for tags based on their
contents, such as find(), find_all(), and select().
• The find_all() method returns a list of all tags that match a given filter, while the
find() method returns the first tag that matches the filter.
• You can use the text keyword argument to search for tags that contain specific
text.
Select method
• The select method in BeautifulSoup (bs4) is used to find all elements
in a parsed HTML or XML document that match a specific CSS
selector.
•The select method allows you to apply these selectors to
navigate and extract data from the parsed document easily.
CSS Selector
• Id selector (#)
• Class selector (.)
• Universal Selector(*)
• Element Selector (tag)
• Grouping Selector(,)
CSS Selector
• Id selector (#) :The ID selector targets a specific HTML element based on
its unique identifier attribute (id). An ID is intended to be unique within a
webpage, so using the ID selector allows you to style or apply CSS rules to a
particular element with a specific ID.
#header {
  color: blue;
  font-size: 16px;
}
• Class selector (.) : The class selector is used to select and style HTML elements
based on their class attribute. Unlike IDs, multiple elements can share the same
class, enabling you to apply the same styles to multiple elements throughout the
document.
.highlight {
  background-color: yellow;
  font-weight: bold;
}
CSS Selector
• Universal Selector (*) :The universal selector selects all HTML elements on the
webpage. It can be used to apply styles or rules globally, affecting every element.
However, it is important to use the universal selector judiciously to avoid
unintended consequences.
* {
  margin: 0;
  padding: 0;
}
• Element Selector (tag) : The element selector targets all instances of a specific
HTML element on the page. It allows you to apply styles universally to elements
of the same type, regardless of their class or ID.
p {
  color: green;
  font-size: 14px;
}
• Grouping Selector(,) : The grouping selector allows you to apply the same
styles to multiple selectors at once. Selectors are separated by commas, and the
styles specified will be applied to all the listed selectors.
h1, h2, h3 {
  font-family: 'Arial', sans-serif;
  color: #333;
}
•These selectors are fundamental to CSS and provide a
powerful way to target and style different elements on a
webpage.
Creating a basic HTML page

<!DOCTYPE html>
<html>
<head>
<title>Sample Page</title>
</head>
<body>
<div id="content">
<h1>Heading 1</h1>
<p class="paragraph">This is a sample paragraph.</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
</ul>
<a href="https://example.com">Visit Example</a>
</div>
</body>
</html>
Scraping example using CSS selectors
from bs4 import BeautifulSoup

# Read the HTML page created above (assumed saved locally as web.html)
with open("web.html") as f:
    html = f.read()
soup = BeautifulSoup(html, 'html.parser')

# 1. Select by tag name
heading = soup.select('h1')
print("1. Heading:", heading[0].text)

# 2. Select by class
paragraph = soup.select('.paragraph')
print("2. Paragraph:", paragraph[0].text)

# 3. Select by ID
div_content = soup.select('#content')
print("3. Div Content:", div_content[0].text)

# 4. Select by attribute
link = soup.select('a[href="https://example.com"]')
print("4. Link:", link[0]['href'])

# 5. Select all list items
list_items = soup.select('ul li')
print("5. List Items:")
for item in list_items:
    print("-", item.text)
Selenium
• Selenium is an open-source browser automation and testing tool, which means it can be downloaded from the internet without spending anything.
• Install with: pip install selenium
Steps in form filling
• Import the webdriver from selenium
• Create a driver instance by specifying the browser
• Find the element
• Send the values to the elements
• Use the click function to submit
Webdriver
• WebDriver is a powerful tool for automating web browsers.
• It provides a programming interface for interacting with web
browsers and performing various operations, such as clicking
buttons, filling forms, navigating between pages, and more.
• WebDriver supports multiple programming languages
from selenium import webdriver
Creating Webdriver instance
• You can create an instance of WebDriver by using the webdriver class and specifying the browser you want to use.
• Ex: driver = webdriver.Chrome()
• Browsers:
– webdriver.Chrome()
– webdriver.Firefox()
– webdriver.Edge()
– webdriver.Safari()
– webdriver.Opera()
– webdriver.Ie()
Find the element
• First you need to load the page using the get() function
• To find the element you can use find_element() by specifying any of the following locators:
— XPATH
— CSS Selector
from selenium import webdriver
import time
from selenium.webdriver.common.by import By

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()
driver.maximize_window()
time.sleep(3)

# Navigate to the form page
driver.get('https://www.confirmtkt.com/pnr-status')

# Locate form elements
pnr_field = driver.find_element("name", "pnr")
submit_button = driver.find_element(By.CSS_SELECTOR, '.col-xs-4')

# Fill in form fields
pnr_field.send_keys('4358851774')

# Submit the form
submit_button.click()
Downloading web pages through form submission
from selenium import webdriver
import time
from selenium.webdriver.common.by import By

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()
driver.maximize_window()
time.sleep(3)

# Navigate to the form page
driver.get('https://www.confirmtkt.com/pnr-status')

# Locate form elements
pnr_field = driver.find_element("name", "pnr")
submit_button = driver.find_element(By.CSS_SELECTOR, '.col-xs-4')

# Fill in form fields
pnr_field.send_keys('4358851774')

# Submit the form
submit_button.click()

# Locate the result card on the response page
welcome_message = driver.find_element(By.CSS_SELECTOR, ".pnr-card")

# Print or use the scraped values
print(type(welcome_message))
html_content = welcome_message.get_attribute('outerHTML')

# Print the HTML content
print("HTML Content:", html_content)

# Close the browser
driver.quit()
A Python Integer Is More Than Just an Integer
Every Python object is simply a cleverly disguised C structure, which contains not only its value, but other information as well.
X = 10000
X is not just a "raw" integer. It's actually a pointer to a compound C structure, which contains several values.
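A quick way to observe this overhead (a minimal sketch; the exact byte count depends on the Python version):

import sys

X = 10000
# A C int typically occupies 4 bytes; a Python int also carries object
# information (reference count, type pointer, size), so it is much larger.
print(sys.getsizeof(X))   # e.g. 28 bytes on CPython 3.x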
Difference between C and Python Variables
A Python List Is More Than Just a List
Because of Python’s dynamic typing, we can even create
heterogeneous lists:
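For example (a minimal sketch), each item in such a list keeps its own type information:

L = [True, "2", 3.0, 4]
print([type(item) for item in L])   # [bool, str, float, int]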
In the special case that all variables are of the same type, much of this information is redundant: it can be much more efficient to store data in a fixed-type array. The difference between a dynamic-type list and a fixed-type (NumPy-style) array is illustrated in the figure.
Fixed-Type Arrays in Python
• Python offers several different options for storing data in efficient, fixed-
type data buffers. The built-in array module (available since Python 3.3) can
be used to create dense arrays of a uniform type:
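For example (a minimal sketch using the built-in array module):

import array

L = list(range(10))
A = array.array('i', L)   # 'i' is a type code indicating integer contents
print(A)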
While Python’s array object provides efficient storage of array-based data,
NumPy adds to this efficient operations on that data.
Creating Arrays from Python Lists
import numpy as np
NumPy is constrained to arrays that all contain the same type. If types do not match, NumPy will
upcast if possible
If we want to explicitly set the data type of the resulting array, we can use the dtype
keyword:
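For example (a minimal sketch; the values are illustrative):

import numpy as np

# Integer list becomes an integer array
print(np.array([1, 4, 2, 5, 3]))

# Mixed integers and floats are upcast to floating point
print(np.array([3.14, 4, 2, 3]))

# Explicitly set the data type with the dtype keyword
print(np.array([1, 2, 3, 4], dtype='float32'))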
Creating Arrays from Python Lists
• NumPy arrays can explicitly be multidimensional; here’s one way of
initializing a multidimensional array using a list of lists:
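For example (a minimal sketch; the inner lists are illustrative):

import numpy as np

# Nested lists result in a multidimensional array
print(np.array([range(i, i + 3) for i in [2, 4, 6]]))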
Creating Arrays from Scratch
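For larger arrays, it is more efficient to create arrays from scratch using routines built into NumPy. A few representative examples (a minimal sketch):

import numpy as np

print(np.zeros(10, dtype=int))       # length-10 integer array of zeros
print(np.ones((3, 5), dtype=float))  # 3x5 floating-point array of ones
print(np.full((3, 3), 3.14))         # 3x3 array filled with 3.14
print(np.arange(0, 20, 2))           # values from 0 to 20, stepping by 2
print(np.linspace(0, 1, 5))          # five values evenly spaced between 0 and 1
print(np.random.random((3, 3)))      # 3x3 array of uniform random values
print(np.eye(3))                     # 3x3 identity matrix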
NumPy Standard Data Types
• While constructing an array, you can specify the data type using a string:
• Or using the associated NumPy
object:
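For example (a minimal sketch):

import numpy as np

# Specify the data type using a string
print(np.zeros(10, dtype='int16'))

# Or using the associated NumPy object
print(np.zeros(10, dtype=np.int16))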
The Basics of NumPy Arrays
We’ll cover a few categories of basic array manipulations here:
• Attributes of arrays
Determining the size, shape, memory consumption, and data types of
arrays
• Indexing of arrays
Getting and setting the value of individual array elements
• Slicing of arrays
Getting and setting smaller subarrays within a larger array
• Reshaping of arrays
Changing the shape of a given array
• Joining and splitting of arrays
Combining multiple arrays into one, and splitting one array into many
NumPy Array Attributes
• Each array has attributes
ndim (the number of dimensions)
shape (the size of each dimension)
size (the total size of the array)
Exercise: write a Python program that creates an m x n integer array and prints its attributes using NumPy.
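A minimal sketch of such a program (the array shape and values are arbitrary):

import numpy as np

# Create a 3x4 (m x n) integer array with random values
arr = np.random.randint(0, 10, size=(3, 4))

print("ndim: ", arr.ndim)    # number of dimensions
print("shape:", arr.shape)   # size of each dimension
print("size: ", arr.size)    # total number of elements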
Array Indexing: Accessing Single Elements
In a multidimensional array, you access items using a comma-separated tuple of indices:
You can also modify values using any of the above index notations.
NumPy arrays have a fixed type. This means, for example, that if you attempt to insert a floating-point value into an integer array, the value will be silently truncated.
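For example (a minimal sketch; the array contents are illustrative):

import numpy as np

x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

print(x2[0, 0])      # access with a comma-separated tuple of indices
print(x2[2, -1])     # negative indices count from the end

x2[0, 0] = 12        # modify a value in place

x2[0, 0] = 3.14159   # the float is silently truncated to 3 in this int array
print(x2[0, 0])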
Array Slicing: Accessing Subarrays
One-dimensional subarrays
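For example (a minimal sketch):

import numpy as np

x = np.arange(10)

print(x[:5])     # first five elements
print(x[5:])     # elements after index 5
print(x[4:7])    # middle subarray
print(x[::2])    # every other element
print(x[::-1])   # all elements, reversed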
Multidimensional subarrays
Subarray dimensions can even be reversed together:
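For example (a minimal sketch; the array contents are illustrative):

import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

print(x2[:2, :3])       # two rows, three columns
print(x2[:3, ::2])      # all rows, every other column
print(x2[::-1, ::-1])   # subarray dimensions reversed together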
Accessing array rows and columns
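For example (a minimal sketch with the same illustrative array):

import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

print(x2[:, 0])   # first column of x2
print(x2[0, :])   # first row of x2
print(x2[0])      # equivalent to x2[0, :]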
Subarrays as no-copy views
Now if we modify this subarray, we’ll see
that the original array is changed! Observe:
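For example (a minimal sketch):

import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

x2_sub = x2[:2, :2]   # a 2x2 view, not a copy
x2_sub[0, 0] = 99     # modify the subarray
print(x2)             # the original array is changed too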
Creating copies of arrays
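To keep the original array unchanged, copy the subarray explicitly with the copy() method (a minimal sketch):

import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

x2_sub_copy = x2[:2, :2].copy()   # an explicit copy
x2_sub_copy[0, 0] = 42
print(x2)                         # the original array is unchanged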
Reshaping of Arrays
Another useful type of operation is reshaping of arrays. The most flexible way of
doing this is with the reshape() method. For example, if you want to put the
numbers 1 through 9 in a 3×3 grid, you can do the following:
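For example (a minimal sketch):

import numpy as np

grid = np.arange(1, 10).reshape((3, 3))
print(grid)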
• Note that for this to work, the size of the initial array must match the size of
the reshaped array.
• The reshape method will use a no-copy view of the initial array, but with
noncontiguous memory buffers this is not always the case.
Another common reshaping pattern is the conversion of a one-dimensional
array into a two-dimensional row or column matrix.
• Reshaping can be done with the reshape method, or more easily by making use of the newaxis keyword within a slice operation.
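For example (a minimal sketch):

import numpy as np

x = np.array([1, 2, 3])

print(x.reshape((1, 3)))    # row vector via reshape
print(x[np.newaxis, :])     # row vector via newaxis

print(x.reshape((3, 1)))    # column vector via reshape
print(x[:, np.newaxis])     # column vector via newaxis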
Array Concatenation and Splitting
• Concatenation of arrays
Concatenating more than two arrays at once is also possible.
np.concatenate can also be used for two-dimensional arrays.
For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack) and np.hstack (horizontal stack) functions, as shown in the sketch below.
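A minimal sketch (the array contents are illustrative):

import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = np.array([99, 99, 99])

print(np.concatenate([x, y]))        # join two arrays
print(np.concatenate([x, y, z]))     # more than two arrays at once

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(np.concatenate([grid, grid]))  # two-dimensional, along the first axis

print(np.vstack([x, grid]))                        # stack vertically
print(np.hstack([grid, np.array([[99], [99]])]))   # stack horizontally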
Splitting of arrays
• The opposite of concatenation is splitting, which is implemented by the functions np.split,
np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points:
N split points lead to N + 1 subarrays.
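For example (a minimal sketch):

import numpy as np

x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])   # two split points give three subarrays
print(x1, x2, x3)

grid = np.arange(16).reshape((4, 4))
upper, lower = np.vsplit(grid, [2])   # split vertically at row 2
left, right = np.hsplit(grid, [2])    # split horizontally at column 2
print(upper, lower, left, right, sep="\n")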
Computation on NumPy Arrays: Universal Functions
• NumPy is important in the Python data science world because it provides an easy and flexible interface to optimized computation with arrays of data.
• Computation on NumPy arrays can be very fast, or it can be very slow. The key
to making it fast is to use vectorized operations, generally implemented through
NumPy’s universal functions (ufuncs).
• NumPy’s ufuncs can be used to make repeated calculations on array elements
much more efficient.
The Slowness of Loops
Each time the reciprocal is computed, Python first examines the object's type and does a dynamic lookup of the correct function to use for that type. If we were working in compiled code instead, this type specification would be known before the code executes, and the result could be computed much more efficiently.
• For many types of operations, NumPy provides a convenient interface into
this kind of statically typed, compiled routine. This is known as a vectorized
operation.
• This vectorized approach is designed to push the loop into the compiled layer
that underlies NumPy, leading to much faster execution.
• Looking at the execution time for our big array, we see that it completes
orders of magnitude faster than the Python loop:
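A minimal sketch of the comparison (timings would be measured with IPython's %timeit and are only indicative):

import numpy as np

rng = np.random.default_rng(0)
big_array = rng.integers(1, 100, size=1_000_000)

# Slow: an explicit Python loop, with a type check on every element
def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

# Fast: a vectorized ufunc, where the loop runs in the compiled layer
# %timeit compute_reciprocals(big_array)   # on the order of seconds
# %timeit 1.0 / big_array                  # on the order of milliseconds
result = 1.0 / big_array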
Introducing UFuncs
• Vectorized operations in NumPy are implemented via ufuncs, whose main
purpose is to quickly execute repeated operations on values in NumPy arrays.
• Ufuncs are extremely flexible—before we saw an operation between a scalar
and an array, but we can also operate between two arrays:
• ufunc operations are not limited to one-dimensional arrays—they can
act on multidimensional arrays as well:
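For example (a minimal sketch):

import numpy as np

x = np.arange(5)
print(x / 2)                            # operation between a scalar and an array

print(np.arange(5) / np.arange(1, 6))   # operation between two arrays

grid = np.arange(9).reshape((3, 3))
print(2 ** grid)                        # ufuncs also act on multidimensional arrays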
Exploring NumPy’s UFuncs
• Ufuncs exist in two flavors:
— unary ufuncs, which operate on a single input
— binary ufuncs, which operate on two inputs.
• We'll see examples of both of these types of functions here, covering:
— Array arithmetic
— Absolute value
— Trigonometric functions
— Exponents and logarithms
Array arithmetic
• NumPy’s ufuncs feel very natural to use because they make use of
Python’s native arithmetic operators. The standard addition, subtraction,
multiplication, and division can all be used:
• There is also a unary ufunc for negation, a ** operator for exponentiation, and a % operator for modulus:
All of these arithmetic operations are simply convenient wrappers around specific functions built into NumPy; for example, the + operator is a wrapper for the add function.
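For example (a minimal sketch):

import numpy as np

x = np.arange(4)

print(x + 5, x - 5, x * 2, x / 2)   # standard arithmetic operators
print(-x, x ** 2, x % 2)            # negation, exponentiation, modulus
print(np.add(x, 2))                 # + is a wrapper for np.add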
Absolute value
• The corresponding NumPy ufunc is np.absolute, which is also available under
the alias np.abs:
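For example (a minimal sketch):

import numpy as np

x = np.array([-2, -1, 0, 1, 2])
print(np.absolute(x))   # absolute value
print(np.abs(x))        # the shorter alias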
Trigonometric functions
• NumPy provides a large number of useful ufuncs, and some of the most
useful for the data scientist are the trigonometric functions.
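For example (a minimal sketch):

import numpy as np

theta = np.linspace(0, np.pi, 3)   # an array of angles
print(np.sin(theta))
print(np.cos(theta))
print(np.tan(theta))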
Exponents and logarithms
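NumPy also provides ufuncs for exponentials and logarithms. For example (a minimal sketch):

import numpy as np

x = np.array([1, 2, 3])
print(np.exp(x))       # e^x
print(np.exp2(x))      # 2^x
print(np.power(3, x))  # 3^x

print(np.log(x))       # natural logarithm
print(np.log2(x))      # base-2 logarithm
print(np.log10(x))     # base-10 logarithm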
Advanced Ufunc Features
A few specialized features of ufuncs are:
• Specifying output
• Aggregates
• Outer products
Specifying output
• For large calculations, it is sometimes useful to be able to specify the array
where the result of the calculation will be stored. Rather than creating a
temporary array, you can use this to write computation results directly to the
memory location where you’d like them to be. For all ufuncs, you can do this
using the out argument of the function:
We can even write the results of a computation to every other element of a specified array:
If we had instead written y[::2] = 2 ** x, this would have resulted in the
creation of a temporary array to hold the results of 2 ** x
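For example (a minimal sketch):

import numpy as np

x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)    # write the results directly into y
print(y)

y = np.zeros(10)
np.power(2, x, out=y[::2])   # write into every other element of y
print(y)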
Aggregates
• For binary ufuncs, there are some interesting aggregates that can be computed directly from the object. We can use the reduce method of any ufunc to do this.
• A reduce method repeatedly applies a given operation to the elements of an
array until only a single result remains.
• For example, calling reduce on the add ufunc returns the sum of all
elements in the array:
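For example (a minimal sketch):

import numpy as np

x = np.arange(1, 6)
print(np.add.reduce(x))   # 15, the sum of all elements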
calling reduce on the multiply ufunc results in the product of all array
elements:
To store all the intermediate results of the computation, we can use the accumulate method instead.
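For example (a minimal sketch):

import numpy as np

x = np.arange(1, 6)
print(np.multiply.reduce(x))       # 120, the product of all elements
print(np.add.accumulate(x))        # [ 1  3  6 10 15]
print(np.multiply.accumulate(x))   # [  1   2   6  24 120]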
Note that for these particular cases, there are dedicated NumPy functions to compute
the results (np.sum, np.prod, np.cumsum, np.cumprod)
Outer products
• Finally, any ufunc can compute the output of all pairs of two different inputs
using the outer method. This allows you, in one line, to do things like create a
multiplication table:
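For example (a minimal sketch):

import numpy as np

x = np.arange(1, 6)
print(np.multiply.outer(x, x))   # a 5x5 multiplication table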
Broadcasting
Broadcasting in NumPy is a powerful mechanism that allows for the arithmetic operations on arrays
of different shapes and sizes, without explicitly creating additional copies of the data. It simplifies
the process of performing element-wise operations on arrays of different shapes, making code
more concise and efficient.
Here are the key concepts of broadcasting in NumPy:
•Shape Compatibility: Broadcasting is possible when the dimensions of the arrays involved are
compatible. Dimensions are considered compatible when they are equal or one of them is 1. NumPy
automatically adjusts the shape of smaller arrays to match the shape of the larger array during the
operation.
•Rules of Broadcasting: For broadcasting to occur, the sizes of the dimensions must either be the
same or one of them must be 1. If the sizes are different and none of them is 1, then broadcasting is
not possible, and NumPy will raise a ValueError.
• Automatic Replication: When broadcasting, NumPy automatically replicates the smaller array along the
necessary dimensions to make it compatible with the larger array. This replication is done without actually
creating multiple copies of the data, which helps in saving memory.
Example:
Suppose you have a 2D array A of shape (3, 1) and a 1D array B of shape (3,). Broadcasting allows you to add these arrays directly: NumPy stretches A along its second dimension and B along a new first dimension so that both reach the common shape (3, 3), without copying the data.
import numpy as np

A = np.array([[1], [2], [3]])   # shape (3, 1)
B = np.array([4, 5, 6])         # shape (3,)

result = A + B   # broadcasting occurs here
print(result)
# array([[5, 6, 7],
#        [6, 7, 8],
#        [7, 8, 9]])
THANK YOU