Coding Challenge - Naver Scraping HTML

The document outlines a coding challenge to create a scalable and undetectable API for scraping product details from Naver SmartStore. The API must extract JSON data from the global variable __PRELOADED_STATE__, implement anti-detection techniques, and meet specific performance criteria. Deliverables include a hosted API link, source code, and a comprehensive README with setup and usage instructions.

Coding Challenge: Build a Scalable and Undetectable API for Scraping Naver SmartStore Product Details

Objective
Your challenge is to build a scalable and undetectable API that scrapes product detail data
from smartstore.naver.com. The scraper must retrieve JSON data from the page’s global
variable __PRELOADED_STATE__.

Your scraper should be able to bypass anti-scraping mechanisms and return accurate data
in real time.

URL Schema to Target


Naver SmartStore product detail pages typically follow this structure:
https://smartstore.naver.com/{store_name}/products/{product_id}
Example:
https://smartstore.naver.com/rainbows9030/products/11102379008
From this page, the scraper must capture:
• JSON data of global variable __PRELOADED_STATE__
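
The capture step above could be sketched as follows. This is a minimal sketch, assuming the product page inlines the state as a plain JSON object literal in a script tag (for example `window.__PRELOADED_STATE__ = {...}`); the brief does not confirm how the variable is populated, and if it is built at runtime a real browser would be needed to read it. The function names `extractPreloadedStateRaw` and `extractPreloadedState` are hypothetical.

```typescript
// Assumption: the HTML contains an inline assignment of the form
//   __PRELOADED_STATE__ = { ...plain JSON... }
const ASSIGNMENT = /__PRELOADED_STATE__\s*=\s*\{/;

/** Returns the raw JSON text assigned to __PRELOADED_STATE__, or null if absent. */
function extractPreloadedStateRaw(html: string): string | null {
  const match = ASSIGNMENT.exec(html);
  if (!match) return null;

  // Index of the opening brace of the object literal.
  const start = match.index + match[0].length - 1;
  let depth = 0;
  let inString = false;
  let escaped = false;

  // Walk the text, tracking brace depth and skipping braces inside strings.
  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return html.slice(start, i + 1);
    }
  }
  return null; // unbalanced braces, e.g. a truncated response
}

/** Parses the extracted text; throws if the state is missing or not valid JSON. */
function extractPreloadedState(html: string): unknown {
  const raw = extractPreloadedStateRaw(html);
  if (raw === null) throw new Error("__PRELOADED_STATE__ not found in HTML");
  return JSON.parse(raw);
}
```

Scanning for the matching brace rather than using a greedy regular expression keeps the match from running into later script content, and tracking string state keeps braces inside product text from ending the match early.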

Requirements:
1. Scraping Logic
• Extract the raw JSON data of global variable __PRELOADED_STATE__
• Techniques to avoid detection must be implemented:
o Rotate Fingerprints and IPs
o Implement request throttling and random delays
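
One way to approach the throttling and rotation items above is sketched here. It is a minimal illustration, not a prescribed design: `randomDelayMs`, `Rotator`, and `Throttle` are hypothetical names, the delay bounds are arbitrary, and round-robin is only one possible rotation policy.

```typescript
/** Returns a random integer delay in [minMs, maxMs]. */
function randomDelayMs(
  minMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  return Math.floor(minMs + random() * (maxMs - minMs + 1));
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/** Cycles through a fixed pool (proxies, header profiles, ...) in round-robin order. */
class Rotator<T> {
  private index = 0;
  private readonly pool: readonly T[];

  constructor(pool: readonly T[]) {
    if (pool.length === 0) throw new Error("pool must not be empty");
    this.pool = pool;
  }

  next(): T {
    const item = this.pool[this.index % this.pool.length] as T;
    this.index++;
    return item;
  }
}

/** Runs tasks one at a time, with a random pause after each one. */
class Throttle {
  private tail: Promise<void> = Promise.resolve();
  private readonly minMs: number;
  private readonly maxMs: number;

  constructor(minMs: number, maxMs: number) {
    this.minMs = minMs;
    this.maxMs = maxMs;
  }

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the queue moving even if a task rejects, then add the jitter.
    this.tail = result
      .then(() => undefined, () => undefined)
      .then(() => sleep(randomDelayMs(this.minMs, this.maxMs)));
    return result;
  }
}
```

A single `Throttle` serialises everything that passes through it, so a setup with several proxies or identities would probably use one instance per identity to keep overall throughput up.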
2. API Development
• Build a REST API with an endpoint to retrieve product detail by product URL:
Example:
o GET https://your-api.com/naver?productUrl=https://smartstore.naver.com/minibeans/products/8768399445
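
The endpoint above could be handled along these lines. This is a framework-independent sketch: `parseProductUrl` and `handleNaverRequest` are hypothetical names, the `scrape` callback stands in for whatever scraper is built, the status codes are one reasonable choice, and the assumption that product IDs are purely numeric comes only from the example URLs in this brief.

```typescript
interface ApiResponse {
  status: number;
  body: unknown;
}

interface ProductRef {
  storeName: string;
  productId: string;
}

// Path shape from the URL schema: /{store_name}/products/{product_id}
// Assumption: product IDs are numeric, as in the examples.
const PRODUCT_PATH = /^\/([^/]+)\/products\/(\d+)\/?$/;

/** Validates a SmartStore product URL and splits it into its parts. */
function parseProductUrl(raw: string): ProductRef | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not an absolute URL
  }
  if (url.protocol !== "https:" || url.hostname !== "smartstore.naver.com") {
    return null;
  }
  const match = PRODUCT_PATH.exec(url.pathname);
  if (!match) return null;
  const [, storeName = "", productId = ""] = match;
  return { storeName, productId };
}

/** Maps a request path such as /naver?productUrl=... to a status and JSON body. */
async function handleNaverRequest(
  requestUrl: string,
  scrape: (ref: ProductRef) => Promise<unknown>,
): Promise<ApiResponse> {
  // The base is only needed so that relative request paths can be parsed.
  const url = new URL(requestUrl, "http://localhost");
  if (url.pathname !== "/naver") {
    return { status: 404, body: { error: "Not found" } };
  }
  const productUrl = url.searchParams.get("productUrl");
  const ref = productUrl ? parseProductUrl(productUrl) : null;
  if (!ref) {
    return { status: 400, body: { error: "productUrl must be a SmartStore product URL" } };
  }
  try {
    return { status: 200, body: await scrape(ref) };
  } catch {
    return { status: 502, body: { error: "Failed to retrieve product data" } };
  }
}
```

Keeping the handler as a plain function makes it easy to call from any HTTP server and to test without opening a socket. Clients should percent-encode the `productUrl` value if it ever contains `&` or `#`.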

3. Tech Stack
• Use JavaScript for development.
• TypeScript is a strong plus.
4. Hosting
• Host your API via ngrok (or any publicly accessible tunnel).
• Share the link so we can test the API remotely.
• Provide a clear and complete README with local setup and usage instructions.

Test Success Criteria


To pass the test, your API must meet all of the following:
• Successfully retrieve data for 1000+ products.
• Maintain average latency ≤ 6 seconds per request.
• Maintain error rate ≤ 5%.
• Stay stable and responsive for 1 hour of continuous testing.
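
The numeric criteria above can be checked against a log of requests, as in this sketch. The brief does not say how the metrics are measured, so the following are assumptions: latency is averaged over all requests including failures, each successful request counts as one retrieved product, and the one-hour stability criterion is not covered. `RequestSample` and `summarizeRun` are hypothetical names.

```typescript
interface RequestSample {
  latencyMs: number;
  ok: boolean;
}

interface RunSummary {
  total: number;
  successes: number;
  averageLatencyMs: number;
  errorRate: number;
  passed: boolean;
}

// Thresholds taken from the success criteria.
const MIN_SUCCESSES = 1000;
const MAX_AVERAGE_LATENCY_MS = 6000;
const MAX_ERROR_RATE = 0.05;

/** Summarises a test run and checks it against the three numeric thresholds. */
function summarizeRun(samples: readonly RequestSample[]): RunSummary {
  const total = samples.length;
  const successes = samples.filter((s) => s.ok).length;
  const latencySum = samples.reduce((sum, s) => sum + s.latencyMs, 0);

  const averageLatencyMs = total === 0 ? 0 : latencySum / total;
  const errorRate = total === 0 ? 0 : (total - successes) / total;

  const passed =
    successes >= MIN_SUCCESSES &&
    averageLatencyMs <= MAX_AVERAGE_LATENCY_MS &&
    errorRate <= MAX_ERROR_RATE;

  return { total, successes, averageLatencyMs, errorRate, passed };
}
```

For example, 1000 successes at 3 seconds each plus 50 failures at 9 seconds each gives an error rate of 50/1050 (about 4.8%) and an average latency of about 3.3 seconds, which meets all three thresholds.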

Scraping & Proxy Notes


• Proxy: 6n8xhsmh.as.thordata.net:9999:td-customer-mrscraperTrial-country-kr:P3nNRQ8C2
• You may also use other free or trial proxy providers for this test.
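
A proxy entry written as colon-separated fields could be turned into a usable configuration as sketched below. The field order `host:port:username:password` is an assumption inferred from the shape of the entry above, and the `http://` scheme in the resulting URL is also an assumption; check both against the provider's documentation. `parseProxyString` and `toProxyUrl` are hypothetical names, and the value in the usage lines is a placeholder, not a real credential.

```typescript
interface ProxyConfig {
  host: string;
  port: number;
  username: string;
  password: string;
}

/** Parses "host:port:username:password" (assumed field order). */
function parseProxyString(value: string): ProxyConfig {
  const parts = value.trim().split(":");
  if (parts.length < 4) {
    throw new Error("expected host:port:username:password");
  }
  const [host = "", portText = "", username = "", ...rest] = parts;
  // Anything after the third colon is treated as the password, colons included.
  const password = rest.join(":");
  const port = Number(portText);
  if (!host || !Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error("invalid proxy host or port");
  }
  return { host, port, username, password };
}

/** Builds a proxy URL with percent-encoded credentials (http scheme assumed). */
function toProxyUrl(proxy: ProxyConfig): string {
  const user = encodeURIComponent(proxy.username);
  const pass = encodeURIComponent(proxy.password);
  return `http://${user}:${pass}@${proxy.host}:${proxy.port}`;
}

// Usage with a placeholder value:
const exampleProxy = parseProxyString("proxy.example.net:9999:user-country-kr:s3cret");
console.log(toProxyUrl(exampleProxy));
```

Reading the proxy string from an environment variable rather than hard-coding it keeps credentials out of the shared source repository.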

Deliverables
• Hosted API link (e.g., ngrok).
• Source code (preferably in a GitHub repo).
• README.md with:
o Setup instructions.
o Run/test instructions.
o Scraper explanation (evasion strategies, proxy usage, etc.).
o Example usage of your API.
