Automate Web Scraping with Selenium and Python: Building a Cronjob
https://www.linkedin.com/in/anas-arshad-125059197/
Why Are Cron Jobs Important?
1. Automation: Cron jobs automate recurring tasks. You can schedule
scripts or commands to run at specific times, intervals, or dates
without manual intervention. This saves time and effort,
particularly for repetitive or time-sensitive tasks.
2. Data Processing and Analysis: Cron jobs are frequently used for
data processing and analysis tasks. You can schedule scripts or
programs to collect, process, and analyze data at specified
intervals. This is particularly useful for tasks such as data
scraping, generating reports, performing calculations, and
updating databases. Cron jobs enable regular and automated data
processing, ensuring that your information remains up to date.
3. Task Scheduling: Cron jobs provide a flexible and reliable way to
schedule a wide range of tasks. Whether it's running scheduled
backups, sending periodic reports, fetching data from external
sources, or executing complex workflows, cron jobs allow you to
define the timing and frequency of these tasks based on your
specific needs.
Project Kickoff
Download ChromeDriver here:
https://chromedriver.chromium.org/downloads
The Chrome Driver facilitates
communication between Selenium and the
Chrome browser. It acts as a middleware,
translating Selenium commands into
actions that the Chrome browser
understands and executes.
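As a rough sketch of how Selenium is pointed at the downloaded driver, the function below starts Chrome through a ChromeDriver binary. The name make_driver, the chromedriver_path parameter, and the headless flag are illustrative assumptions, not part of the original project.

```python
def make_driver(chromedriver_path, headless=True):
    """Start Chrome through the ChromeDriver binary at chromedriver_path."""
    # Imported inside the function so this file can be loaded even
    # before Selenium is installed (pip install selenium).
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    if headless:
        # Run Chrome without opening a visible window (assumed preference).
        options.add_argument("--headless=new")

    # Service tells Selenium which ChromeDriver executable to launch.
    service = Service(executable_path=chromedriver_path)
    return webdriver.Chrome(service=service, options=options)
```

A typical call would be driver = make_driver("/path/to/chromedriver"), followed by driver.quit() once scraping is finished. The path shown is a placeholder.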
Create two files, "helper.py" and "main.py", and leave
them blank for now. We will work on them later.
Add a timestamp helper in helper.py
This block defines a function called get_time(). Within the
function, time.strftime() formats the current local time
according to the format string '%X (%d/%m/%y)'.
%X is the locale's time representation, typically
'HH:MM:SS'.
(%d/%m/%y) is the current date in the format
'DD/MM/YY', wrapped in literal parentheses.
The formatted string is then returned by the function.
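The description above corresponds to a helper along these lines. This is a reconstruction from the text, so the original file may differ in detail.

```python
import time


def get_time():
    """Return the current local time and date as one formatted string."""
    # %X -> locale's time representation (usually HH:MM:SS)
    # %d/%m/%y -> day/month/two-digit year, inside literal parentheses
    return time.strftime('%X (%d/%m/%y)')
```

Calling print(get_time()) prints the time followed by the date in parentheses, which is handy for stamping each scraping run.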
Import Some Libraries
schedule: The schedule module in Python provides a simple and
convenient way to schedule and run periodic tasks or functions.
webdriver: Provides the necessary classes and methods for
interacting with different browsers.
By: Defines the different strategies for locating elements on a
web page.
time: You can use the time module to measure the execution time
of your code.
helper: This is our helper.py file, which has the get_time()
function.
re: Regular expressions allow you to search for patterns within
text. You can define patterns using a combination of characters,
metacharacters, and special sequences to match specific strings,
digits, or patterns of characters.
mysql.connector: The MySQL Connector for Python is a library that
allows Python programs to connect to and interact with MySQL
databases.
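To make the regular-expression point concrete, here is a small sketch that pulls numbers out of scraped text. The function name extract_numbers and the sample string are made up for illustration.

```python
import re

# One or more digits, optionally followed by a decimal part.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text):
    """Return every integer or decimal number found in text, as strings."""
    return NUMBER_PATTERN.findall(text)


print(extract_numbers("Price: 1299 USD, rating 4.5"))  # ['1299', '4.5']
```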
MySQL Connectivity
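The connection code itself is not reproduced in this text, so the following is only a minimal sketch of what connecting and inserting a row could look like with mysql.connector. The table scraped_items, its columns, and the function names are hypothetical.

```python
# Hypothetical table and columns, used only for illustration.
INSERT_SQL = "INSERT INTO scraped_items (title, scraped_at) VALUES (%s, %s)"


def connect_db(host, user, password, database):
    """Open a MySQL connection with the given credentials."""
    # Imported here so the rest of the file loads without the driver
    # installed (pip install mysql-connector-python).
    import mysql.connector

    return mysql.connector.connect(
        host=host, user=user, password=password, database=database
    )


def save_row(conn, title, scraped_at):
    """Insert one scraped row and commit it."""
    cursor = conn.cursor()
    try:
        # Values are passed separately so the driver escapes them.
        cursor.execute(INSERT_SQL, (title, scraped_at))
        conn.commit()
    finally:
        cursor.close()
```

Passing the values as a tuple instead of formatting them into the SQL string keeps scraped text from breaking the query.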
Scraping Code
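The scraping code is not reproduced in this text. As a hedged sketch of the general pattern, the functions below load a page with an existing Selenium driver, locate elements by CSS selector, and return their cleaned text. The names scrape_texts and clean_texts, and the url and css_selector parameters, are assumptions; the original project may target different elements.

```python
def clean_texts(texts):
    """Strip whitespace and drop empty strings."""
    stripped = (t.strip() for t in texts)
    return [t for t in stripped if t]


def scrape_texts(driver, url, css_selector):
    """Open url in driver and return the text of all matching elements."""
    # Imported here so clean_texts can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    driver.get(url)
    elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
    return clean_texts(el.text for el in elements)
```

The caller is expected to create the driver beforehand and call driver.quit() afterwards, so one browser session can be reused across several pages.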
Scheduling Code
schedule.every(60).seconds: This line sets up a scheduling rule using
the every().seconds syntax from the schedule module. It specifies that
the task should run every 60 seconds. This means the task will be
scheduled to run repeatedly with a 60-second interval between each
execution.
.do(task): This part specifies the task that should be executed
according to the scheduling rule. task represents the function or code
block that you want to run at the specified interval.
while True:: This starts an infinite loop to continuously check for
pending scheduled tasks and execute them.
schedule.run_pending(): Inside the loop, the run_pending() function is
called to check if there are any pending scheduled tasks that need to
be executed. If there are any, it will run those tasks. If there are no
pending tasks, it will move on without doing anything.
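Put together, the pieces described above might look like the sketch below. The wrapper run_every and its poll_seconds and max_iterations parameters are additions for illustration: the short sleep keeps the loop from spinning the CPU, and max_iterations lets the loop stop during a trial run.

```python
import time


def run_every(task, interval_seconds=60, poll_seconds=1, max_iterations=None):
    """Run task on a fixed interval using the schedule module."""
    # Third-party dependency: pip install schedule
    import schedule

    # Register the task to run every interval_seconds seconds.
    schedule.every(interval_seconds).seconds.do(task)

    iterations = 0
    # Loop forever unless a limit was given.
    while max_iterations is None or iterations < max_iterations:
        schedule.run_pending()    # run any job that is due
        time.sleep(poll_seconds)  # pause briefly before checking again
        iterations += 1
```

In main.py this would be started with run_every(task), where task is the scraping function.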
IF YOU HAVE ANY QUESTIONS
FEEL FREE TO ASK
THANK YOU