Web Automation Scraping JS Handbook Small Size

The document outlines a comprehensive course on Web Automation and Scraping with JavaScript, covering essential tools like Node.js and Puppeteer. It includes modules on automation fundamentals, scaling techniques, and tackling challenges such as proxy usage and captcha solving. Each module consists of multiple classes that provide practical knowledge and hands-on projects for real-world applications.

Uploaded by cibeg72137

https://interactivecares.com/courseDetails/296?title=Web_Automation_&_Scraping_With_JavaScript
https://forms.gle/fS7XZpaKwd34b46b8
Module 1: Automation Knowhow
• Class 1- Introduction to Web Automation Essentials
The course begins with an overview of various aspects of
web automation. We will set up the necessary tools for the
course, including Node.js, Docker, and Linux environments.
We will also discuss what changes if you follow the course
on Windows with WSL.

- What is web automation and scraping?
- Discussion of WSL, Git, Git Bash, VS Code, Chrome,
libnss and other dependencies, Node.js, and Docker.
- Discussion of alternative practice environments such as
GitHub Codespaces and Gitpod.
- Why we will use Puppeteer, and how it differs from
Playwright, Selenium and other tools.
- Installing Puppeteer and running a first script.
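
As a rough illustration of the last item, a first Puppeteer script might look like the sketch below. It assumes Puppeteer has been installed with "npm install puppeteer". The helper name getPageTitle and the example URL are hypothetical and not taken from the course material.

```javascript
// Minimal sketch: open a page and read its title.
// `puppeteer` is passed in, so the helper does not care how it was loaded.
async function getPageTitle(puppeteer, url) {
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await page.title();
  } finally {
    await browser.close(); // always release the browser process
  }
}

// Usage, once the puppeteer package is installed:
//   const puppeteer = require('puppeteer');
//   getPageTitle(puppeteer, 'https://example.com').then(console.log);
```

Closing the browser in a finally block keeps stray Chrome processes from piling up when a navigation fails.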

• Class 2- Data Collection Without a Browser
Learn how to collect data without relying on a browser using
tools like curl and fetch. Then explore tools such as jsdom,
cheerio and Puppeteer for parsing that data programmatically.

- Why we don’t always need a browser to extract data.
- Use fetch and curl to extract data from a website.
- How to avoid re-downloading a file that has already been
downloaded.
- Use jsdom, cheerio and Puppeteer to parse data from
a page.
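
One possible way to avoid re-downloading unchanged content is an HTTP conditional request. The sketch below assumes Node.js 18 or newer (for the built-in fetch) and a server that sends an ETag header; the function name fetchIfChanged and the in-memory Map cache are illustrative choices, and the course may use a different approach such as checking for the file on disk.

```javascript
// Sketch: re-use a cached body when the server answers 304 Not Modified.
// `cache` is any Map-like store; `fetchImpl` defaults to the global fetch.
async function fetchIfChanged(url, cache, fetchImpl = fetch) {
  const cached = cache.get(url);
  const headers = {};
  if (cached && cached.etag) {
    headers['If-None-Match'] = cached.etag; // ask only for newer content
  }

  const res = await fetchImpl(url, { headers });
  if (res.status === 304 && cached) {
    return { body: cached.body, fromCache: true };
  }
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }

  const body = await res.text();
  cache.set(url, { etag: res.headers.get('etag'), body });
  return { body, fromCache: false };
}

// Usage:
//   const cache = new Map();
//   const first = await fetchIfChanged('https://example.com/', cache);
//   const second = await fetchIfChanged('https://example.com/', cache);
```

The returned body is plain HTML text, so it can be handed to a parser such as cheerio or jsdom without starting a browser.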

• Class 3- Puppeteer Fundamentals
Understand how to launch a browser using Puppeteer,
connect to an already launched browser, and work in both
headless and headful modes.

- Launch a browser with Puppeteer.
- Use executablePath to specify different Chromium builds.
- Use userDataDir to speed up browser launch and
preserve data between runs.
- Difference between headless and headful modes, and where
and how to use each.
- Launch a browser from the command line and use the
connect function to attach to that browser.
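
The launch and connect options listed above can be combined in one small helper. This is only a sketch: openBrowser is a hypothetical name, and the port, paths and option values in the comments are placeholders. It relies on the puppeteer.launch and puppeteer.connect calls and on the headless, executablePath, userDataDir and browserURL options.

```javascript
// Sketch: attach to a running browser if a debugging URL is given,
// otherwise launch a new one with the requested options.
async function openBrowser(puppeteer, options = {}) {
  const { browserURL, headless = true, executablePath, userDataDir } = options;

  if (browserURL) {
    // The browser must have been started with a debugging port, e.g.
    //   chrome --remote-debugging-port=9222
    return puppeteer.connect({ browserURL });
  }

  const launchOptions = { headless };
  if (executablePath) launchOptions.executablePath = executablePath; // custom Chromium build
  if (userDataDir) launchOptions.userDataDir = userDataDir; // keeps cookies and cache
  return puppeteer.launch(launchOptions);
}

// Usage (placeholder values):
//   const puppeteer = require('puppeteer');
//   const browser = await openBrowser(puppeteer, { headless: false, userDataDir: './profile' });
//   const attached = await openBrowser(puppeteer, { browserURL: 'http://127.0.0.1:9222' });
```

When attached to an existing browser, calling browser.disconnect() instead of browser.close() leaves that browser running.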

• Class 4- Key Puppeteer Methods and Commands
Dive deep into essential Puppeteer methods such as click,
type, select, focus, press, exposeFunction, waitForRequest,
waitForResponse, waitForNavigation, setRequestInterception
and more.

- Walk through a basic form fill-up using the focus, type and
click methods.
- Wait for the page to finish loading with the help of
waitForNavigation, waitForSelector and setTimeout.
- Access Node.js functions from the page with the help of
exposeFunction.
- Monitor and wait for network requests using
setRequestInterception, waitForRequest and waitForResponse.
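
A form fill-up with the methods named above could be sketched as follows. The selectors (#username, #password, #dashboard) and the function name submitLoginForm are made up for illustration, and the sketch assumes that submitting the form triggers a full page navigation.

```javascript
// Sketch: fill a login form and wait for the page that follows.
// `page` is a Puppeteer Page; all selectors are hypothetical.
async function submitLoginForm(page, { username, password }) {
  await page.focus('#username');
  await page.type('#username', username);
  await page.type('#password', password);

  // Start waiting before the click so a fast navigation is not missed.
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click('button[type="submit"]'),
  ]);

  // Confirm that the next page rendered an expected element.
  await page.waitForSelector('#dashboard', { timeout: 10000 });
}
```

Pairing waitForNavigation with the click inside Promise.all is a common pattern, because awaiting the click first can let the navigation finish before the wait begins.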

• Class 5- Data Collection from Websites and Advanced Selection
Discuss real projects involving data collection from Shopify
and Next.js-based websites. Learn the differences between
data obtained via CSS selectors and JSON responses, handle
network requests, and discuss project structure and planning.

- Discuss why we don’t always need a browser for
automation and scraping.
- Extracting data from Next.js websites.
- Extracting data from a Shopify website.
- Basic discussion of CSS selectors vs JSON responses.
- Showcase selectors such as class selectors, ID selectors,
nth-child selectors and more.
- Use XPath-based selectors as an alternative to CSS
selectors.
- Practice exercises for CSS selectors and XPath.
- Use Puppeteer-specific locators to locate items faster.
- Discussion of AI data extraction tools like MLScraper,
NuExtract, browser-use and MarkupLM, and the cost of using
AI to extract data.
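
To illustrate the difference between CSS selectors and JSON responses: many Next.js sites built with the Pages Router embed their page data as JSON in a script tag with the id __NEXT_DATA__, so the data can sometimes be read without any selectors. The sketch below assumes that tag is present (App Router sites may not include it). The function name extractNextData is hypothetical, and the regular expression is a simplification; a parser such as cheerio or jsdom would be more robust.

```javascript
// Sketch: pull the embedded JSON out of a Next.js page's HTML.
// Returns the parsed object, or null when the script tag is missing.
function extractNextData(html) {
  const pattern = /<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/;
  const match = html.match(pattern);
  if (!match) return null;
  return JSON.parse(match[1]);
}

// Usage with HTML fetched elsewhere:
//   const data = extractNextData(html);
//   if (data) console.log(Object.keys(data.props.pageProps));
```

Reading the JSON directly tends to survive layout changes better than class-based selectors, at the cost of depending on an internal data shape that the site can change at any time.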

• A project discussion will be held at the end of the module.

1. Project - Screenshot API Tool
Module 2: Scaling and Beyond
• Class 6- Pagination Techniques
Learn to handle various types of pagination, including
code-based pagination, next-page navigation, infinite
scrolling, load-more buttons, and others.

- Showcase pagination strategies for sites such as Discord,
Product Hunt or similar.
- Load new elements by triggering scroll in infinite-scroll
pagination.
- Handle next-page pagination and edge cases such as
disabled buttons and number-based button positions.
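
A next-page loop that handles the disabled-button edge case might be sketched as below. The function name collectAcrossPages and its parameters are hypothetical, and the sketch assumes that clicking the next button loads a new document; sites that swap content in place would need a different wait, such as waitForSelector or waitForResponse.

```javascript
// Sketch: collect item texts page by page until the next button
// is missing, disabled, or a page limit is reached.
async function collectAcrossPages(page, { itemSelector, nextSelector, maxPages = 50 }) {
  const items = [];
  for (let pageNo = 1; pageNo <= maxPages; pageNo++) {
    const texts = await page.$$eval(itemSelector, (els) =>
      els.map((el) => el.textContent.trim())
    );
    items.push(...texts);

    if (pageNo === maxPages) break; // safety limit against endless loops

    const next = await page.$(nextSelector);
    if (!next) break; // no next button: this was the last page

    const disabled = await next.evaluate(
      (el) => el.disabled === true || el.getAttribute('aria-disabled') === 'true'
    );
    if (disabled) break; // button is rendered but inactive on the last page

    await Promise.all([
      page.waitForNavigation({ waitUntil: 'networkidle2' }),
      next.click(),
    ]);
  }
  return items;
}
```

The maxPages limit is a guard for sites where the next button never disappears or disables.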

• Class 7- Crawling Libraries and Job Queues
Get an introduction to crawling libraries like Crawlee and
puppeteer-cluster. Learn when and how to use p-queue, BullMQ,
Keyv and Prisma in real projects for efficient web automation.

- Go through categories, sitemaps and pagination to
collect target links.
- Use p-queue and BullMQ to handle scraping load
with ease.
- Use a key-value store like Keyv or an ORM like Prisma
to store data and skip already collected data.
- Discussion of Crawlee, puppeteer-cluster and other
browser pools.
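
The idea behind a job queue with a concurrency limit and a "skip what is already collected" check can be shown without any library. The sketch below is a hand-rolled stand-in, not the p-queue or BullMQ API: runWithConcurrency is a hypothetical helper, and the plain Set plays the role that a store such as Keyv or a database table would play in a real project.

```javascript
// Sketch: process URLs with at most `concurrency` workers at a time,
// skipping any URL already recorded in `seen`.
async function runWithConcurrency(urls, worker, { concurrency = 2, seen = new Set() } = {}) {
  const pending = urls.filter((url) => !seen.has(url));
  const results = new Map();
  let next = 0;

  async function runner() {
    while (next < pending.length) {
      const url = pending[next++];
      if (seen.has(url)) continue; // duplicate inside the same batch
      seen.add(url); // marked before the work starts, so failures are not retried
      results.set(url, await worker(url));
    }
  }

  const runners = Array.from({ length: Math.min(concurrency, pending.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

// Usage:
//   const seen = new Set(['https://example.com/already-done']);
//   const results = await runWithConcurrency(links, scrapeOne, { concurrency: 3, seen });
```

A dedicated queue library adds retries, priorities and persistence on top of this basic pattern, which is usually the reason to reach for one.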

• Class 8- Blocking Requests and Optimizing Performance
Discover how to block specific requests like ads or cookie
notices, take clean screenshots, and save bandwidth during
automation tasks.

- Block network requests in Puppeteer to prevent ads,
fonts, images and tracking scripts from loading.
- Block cookie notices and chat widgets to get a clean
screenshot.
- Use existing Node.js packages to block ads instead of
implementing it ourselves.
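
Request blocking in Puppeteer is typically built on setRequestInterception. The sketch below aborts requests by resource type and by hostname; the function name blockResources, the default type list and the example hostname are assumptions chosen for illustration.

```javascript
// Sketch: abort requests for heavy resource types and unwanted hosts.
// `page` is a Puppeteer Page; the defaults below are illustrative.
async function blockResources(page, options = {}) {
  const blockedTypes = new Set(options.blockedTypes || ['image', 'font', 'media']);
  const blockedHosts = new Set(options.blockedHosts || []);

  await page.setRequestInterception(true);
  page.on('request', (request) => {
    const host = new URL(request.url()).hostname;
    if (blockedTypes.has(request.resourceType()) || blockedHosts.has(host)) {
      request.abort();
    } else {
      request.continue();
    }
  });
}

// Usage (hypothetical tracker hostname):
//   await blockResources(page, { blockedHosts: ['tracker.example.com'] });
//   await page.goto('https://example.com');
```

Once interception is enabled, every request has to be either continued or aborted, otherwise the page will stall waiting on it.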

• Class 9- Deploying Puppeteer Projects with Docker
Run Puppeteer projects using PM2 or Docker containers.
Create APIs for small-scale scraping tasks or screenshots
and deploy them on a VPS or on platforms like Vercel, while
addressing Puppeteer’s limitations.

- Create a minimal API that collects a screenshot and page
data for a given URL.
- SSH into a server to deploy the script directly with the
help of PM2.
- Use Docker and Docker Compose to run the script and
deploy it to the server.
- Deploy on Vercel and discuss the limitations of such
platforms for automation and scraping tasks.
- Discussion of limitations and why shared hosting is
not suitable for this kind of work.
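
A minimal screenshot API could be structured as in the sketch below, which uses only Node's built-in http module alongside Puppeteer. The names createScreenshotHandler and takeScreenshot, the ?url= query parameter and port 3000 are all assumptions for illustration; the course project may be organised differently (for example with a web framework).

```javascript
// Sketch: capture a full-page PNG of `url` using an already open browser.
async function takeScreenshot(browser, url) {
  const page = await browser.newPage();
  try {
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.screenshot({ type: 'png', fullPage: true });
  } finally {
    await page.close(); // one tab per request, always cleaned up
  }
}

// Sketch: build an http request handler around any screenshot function.
function createScreenshotHandler(capture) {
  return async function handler(req, res) {
    const { searchParams } = new URL(req.url, 'http://localhost');
    let target = null;
    try {
      target = new URL(searchParams.get('url'));
    } catch {
      target = null; // missing or malformed address
    }
    if (!target || !['http:', 'https:'].includes(target.protocol)) {
      res.writeHead(400, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Pass an http(s) address in ?url=' }));
      return;
    }
    try {
      const png = await capture(target.href);
      res.writeHead(200, { 'Content-Type': 'image/png' });
      res.end(png);
    } catch (err) {
      res.writeHead(502, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: String(err.message || err) }));
    }
  };
}

// Usage:
//   const http = require('node:http');
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   const handler = createScreenshotHandler((url) => takeScreenshot(browser, url));
//   http.createServer(handler).listen(3000);
```

Keeping one long-lived browser and opening a tab per request avoids paying the browser start-up cost on every call, which matters on small servers.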

• Class 10- Usage in Testing and GitHub Actions
Discuss basic testing methods using node:test, Jest,
Puppeteer and Playwright Test. Run basic tests using
GitHub Actions. Discuss how these skills apply to SQA and
other real-world jobs.

- Create a basic test to ensure the web app performs
as expected.
- Difference between node:test, Jest, Puppeteer, Playwright
Test and other testing services.
- Run the tests on GitHub Actions to ensure a feature
is not broken in production.
- Discussion of SQA and real-world jobs where these
skills are needed.

• A project discussion will be held at the end of the module.

2. Project - Disposable Email List Hunter
3. Project - Email Extraction & Lead Generation
Module 3: Tackling Challenges
• Class 11- Proxy and VPN Usage in Automation
Understand the use of proxies and VPNs with tools like
proxy-chain, extensions and built-in methods. Learn
how to integrate proxies into your automation workflows
effectively.

- Learn the importance of proxies.
- Learn about different types of proxies: residential,
datacenter and verified proxies.
- Proxy detection services, and why free and low-quality
proxies should be avoided.
- Using curl with a proxy service.
- Using axios with a proxy service.
- Using the proxy-server argument to set up a proxy in Puppeteer.
- Using page.authenticate to authenticate with a proxy.
- Using proxy-chain to authenticate with a proxy.
- Discussion of VPNs vs proxies for web automation
and scraping.
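
For the proxy-server and page.authenticate items, one way to wire things together is sketched below. Chrome does not read credentials embedded in the --proxy-server flag, so the sketch splits a proxy URL into a launch argument and a credentials object. The helper name buildProxyConfig, the PROXY_URL environment variable and the example proxy address are hypothetical.

```javascript
// Sketch: turn "http://user:pass@host:port" into Puppeteer-friendly parts.
function buildProxyConfig(proxyUrl) {
  const parsed = new URL(proxyUrl);
  const config = {
    // Launch argument without credentials.
    args: [`--proxy-server=${parsed.protocol}//${parsed.host}`],
    credentials: null,
  };
  if (parsed.username) {
    config.credentials = {
      username: decodeURIComponent(parsed.username),
      password: decodeURIComponent(parsed.password),
    };
  }
  return config;
}

// Usage:
//   const puppeteer = require('puppeteer');
//   const { args, credentials } = buildProxyConfig(process.env.PROXY_URL);
//   const browser = await puppeteer.launch({ args });
//   const page = await browser.newPage();
//   if (credentials) await page.authenticate(credentials);
```

Calling page.authenticate before the first navigation lets Puppeteer answer the proxy's authentication challenge for that page.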

• Class 12- Captcha Solving Techniques
Explore different methods for solving captchas with or
without browser extensions. Learn about popular extensions
for Chrome as well as alternative solutions.

- Discuss the need for solving captchas, its limitations and
ethical implications.
- Using Chrome extensions like CapSolver or 2Captcha to
solve captchas.
- Using the 2Captcha or CapSolver API to handle captchas.
- Using the 2Captcha or CapSolver Node.js SDK to handle captchas.
- Using puppeteer-extra to handle captchas.

• Class 13- Bypassing Bot Protections
Dive into advanced techniques for bypassing bot protections
using fingerprint libraries, anti-detect browsers, and other
strategies.

- Explore services like DataDome, Cloudflare and PerimeterX
that businesses use to stop bots and automated services,
and how they can also affect real users.
- Using puppeteer-stealth and fingerprint-injector to spoof
the browser fingerprint and bypass some bot protections.
- Explore nstbrowser, undetect browser, and other
anti-detect browsers.
- Discuss ethical issues around bot protection and
bot protection bypass mechanics.

• A project discussion will be held at the end of the module.

4. Project - Travel Deal Finder
5. Project - Price Comparison Tool
