Code Generation · Claude / ChatGPT

Python Web Scraper Builder

Generate a robust Python web scraper with error handling, rate limiting, and data export to CSV or JSON.

The Prompt

Build a Python web scraper for this use case:

Target website type: [e.g., e-commerce products, news articles, job listings, real estate]
Data to extract: [list specific fields, e.g., title, price, URL, date, description]
Pagination: [yes/no — if yes, describe how pagination works]
JavaScript rendering required: [yes (use Playwright) / no (use requests+BeautifulSoup)]
Output format: [CSV / JSON / both]
Rate limiting: [e.g., gentle: 1–2 second delay between requests]

Build a complete scraper with:
1. Main scraper class with:
   - Session management with rotating user agents
   - Configurable delay between requests
   - Retry logic with exponential backoff (3 attempts)
   - Proxy support (optional, commented out)

2. Data extraction functions for each field with:
   - Primary CSS selector
   - Fallback selector
   - Data cleaning/normalization

3. Storage class:
   - Save to CSV with proper encoding
   - Save to JSON with formatting
   - Duplicate detection

4. Main runner:
   - Command line arguments (URL, output file, max pages)
   - Progress bar with tqdm
   - Summary stats on completion

5. requirements.txt

Include comments explaining each major section.
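
As a point of reference when reviewing the model's output (this is not part of the prompt text), the "retry logic with exponential backoff (3 attempts)" requested in item 1 might look roughly like the sketch below. The function name `fetch_with_retry`, the `base_delay` parameter, and the injectable `sleep` argument are illustrative assumptions, and the generated scraper will likely wrap a real HTTP call rather than a generic callable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() up to max_attempts times, doubling the wait after each failure.

    `fetch` is a hypothetical zero-argument callable, e.g. a lambda wrapping an
    HTTP request. `sleep` is injectable so the delay can be stubbed out in tests.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            # On the final attempt, give up and surface the original error.
            if attempt == max_attempts - 1:
                raise
            # Waits of base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

With the defaults, a request that fails twice and then succeeds waits 1 second, then 2 seconds, before returning. A production version would usually catch only network-related exceptions rather than every `Exception`.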

Tips for Best Results

  • Always check robots.txt before scraping any site
  • Add a --dry-run flag to test extraction logic without saving data
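
Both tips can be sketched with the standard library alone. The example below is a minimal illustration, not the code the prompt will produce: the helper names `is_allowed` and `build_arg_parser`, the default values, and the `--output` and `--max-pages` flag spellings are assumptions. Only `--dry-run` comes from the tip above.

```python
import argparse
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt content permits fetching `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so the rules can be checked offline.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def build_arg_parser() -> argparse.ArgumentParser:
    """Command line interface with a --dry-run switch (flag names are illustrative)."""
    parser = argparse.ArgumentParser(description="Example scraper CLI")
    parser.add_argument("url", help="start URL to scrape")
    parser.add_argument("--output", default="output.csv", help="output file path")
    parser.add_argument("--max-pages", type=int, default=1, help="page limit")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="run extraction and print results without saving any data",
    )
    return parser


if __name__ == "__main__":
    args = build_arg_parser().parse_args()
    mode = "dry run, nothing will be saved" if args.dry_run else f"saving to {args.output}"
    print(f"Scraping {args.url} ({mode})")
```

To check a live site instead of a string, `RobotFileParser` also provides `set_url()` and `read()`, which download the site's robots.txt before `can_fetch()` is called.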

Tags

Python, web scraping, BeautifulSoup, data extraction, automation
