High-Tech Web Crawler Specialist

1 week ago


India beBeeEngineer Full time ₹ 15,00,000 - ₹ 20,00,000
Job Description:

We're looking for a seasoned web scraping engineer to join our team. Your primary responsibility will be designing and implementing a high-throughput product data ingestion pipeline that can handle hundreds of domains.

As a web scraping engineer, you'll be responsible for the crawling/extraction layer end-to-end, including HTTP-first crawling with a Playwright fallback, per-domain learned selectors, and reliable PDF handling (datasheets/specs).

  • Design an HTTP-first crawler (Scrapy or aiohttp) with Playwright fallback only for JS-heavy pages.
  • Implement sitemap diffing and conditional GETs (ETag/Last-Modified) for incremental runs.
  • Build a lightweight 'needs JS?' classifier (HTML length, JSON-LD presence, data-product markers) to auto-route HTTP vs Playwright.
  • Enforce per-domain throttles/backoff (2–4 concurrent/domain; auto-lower on 429/503).
  • Add URL normalization/canonicalization and de-dup (respect rel="canonical"; hash PDFs).
  • Handle PDF discovery & download (HEAD first to dedupe; size/concurrency caps; SHA-256 keys).
  • Apply Playwright browser automation resource budgets (block images/fonts/analytics; kill outliers by size/CPU/time).
  • Integrate third-party APIs (REST/GraphQL) as first-class sources: handle auth (API keys/OAuth2), pagination, and rate limits; unify API + crawl outputs.
  • Own automation & orchestration for scheduled runs (Airflow/Temporal/Celery or cron), idempotent retries, and alerting.
  • Create per-domain selectors (YAML) with verification on hold-outs; re-learn only when health drops.
  • Ship observability: per-site field coverage, error rates, retries, avg page time, and PDF success.
  • Maintain allow/deny paths; adhere to robots.txt and Terms of Service.
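
For illustration only, a few of the responsibilities above (the 'needs JS?' routing check, URL normalization, SHA-256 keys for PDFs, and auto-lowering concurrency on 429/503) could be sketched in Python roughly as follows. All function and class names, the tracking-parameter list, the 5,000-character threshold, and the halving policy are assumptions made for this sketch, not part of the role's actual codebase.

```python
import hashlib
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of tracking parameters to drop; a real pipeline would tune this per domain.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}

JSON_LD_RE = re.compile(r'<script[^>]+type=["\']application/ld\+json["\']', re.I)
PRODUCT_MARKER_RE = re.compile(r"data-product", re.I)


def needs_js(html: str, min_length: int = 5000) -> bool:
    """Heuristic: send a page to the browser only if the static HTML looks like an empty shell.

    The min_length threshold is an assumed starting point, not a measured value.
    """
    if JSON_LD_RE.search(html) or PRODUCT_MARKER_RE.search(html):
        return False  # structured product data is already present in the raw HTML
    return len(html) < min_length


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and tracking params, sort the query.

    This sketch does not strip default ports or resolve rel="canonical" links.
    """
    parts = urlsplit(url)
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def pdf_key(content: bytes) -> str:
    """Content-addressed key for a downloaded PDF (SHA-256 hex digest)."""
    return hashlib.sha256(content).hexdigest()


class DomainThrottle:
    """Tracks a per-domain concurrency limit and halves it on 429/503 responses."""

    def __init__(self, start: int = 4, floor: int = 1) -> None:
        self.limit = start
        self.floor = floor  # assumed lower bound; never drop to zero

    def record(self, status: int) -> int:
        """Update the limit from an HTTP status code and return the new limit."""
        if status in (429, 503):
            self.limit = max(self.floor, self.limit // 2)
        return self.limit


if __name__ == "__main__":
    print(needs_js("<html><div id='root'></div></html>"))
    print(normalize_url("HTTPS://Example.com/p?b=2&utm_source=x&a=1#frag"))
    throttle = DomainThrottle()
    print([throttle.record(code) for code in (200, 429, 503)])
```

A production version would also need a recovery path that raises the limit again after a run of successful responses, which this sketch leaves out.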


  • India beBeeSpecialist Full time ₹ 1,50,00,000 - ₹ 2,00,00,000

    Senior Web Scraping Engineer
    We are seeking an experienced Web Scraping Specialist to join our team. The successful candidate will own the crawling/extraction layer end-to-end, designing and implementing a scalable solution that meets our needs. The ideal candidate will have strong expertise in Scrapy or aiohttp and Playwright, with practical experience in...


  • India beBeeData Full time ₹ 7,50,000 - ₹ 15,00,000

    Web Crawling Engineer Position
    We are seeking a skilled Web Crawling Engineer who will be responsible for designing and implementing web crawlers, extracting valuable insights from the web, and ensuring data quality. The ideal candidate will have strong Python programming skills and experience in web scraping frameworks, browser automation tools, and handling...


  • India TripleChoice Inc Full time

    Senior Web Scraping Engineer (Python) — India (Remote)
    Employment type: Full-time (open to contract-to-hire)
    Work location: Remote in India
    Time overlap: Prefer 2–3 hours/day with Pacific Time (PST)
    About the role
    We're building a high-throughput product data ingestion pipeline across hundreds of domains. You'll also drive the automation around scheduling,...


  • India Forage AI Full time

    Job Description
    We are seeking a Junior Web Crawling Engineer who will be responsible for building and maintaining web crawlers, extracting valuable insights from the web, and ensuring data quality. The ideal candidate will have strong Python programming skills and experience in web scraping frameworks, browser automation tools, and handling anti-scraping...


  • India beBeeCrawler Full time ₹ 15,00,000 - ₹ 28,00,000

    High-Performance Web Crawling Engineer
    We are looking for an experienced engineer to build a high-throughput product data ingestion pipeline across hundreds of domains. The successful candidate will own the crawling/extraction layer end-to-end, designing and implementing an HTTP-first crawler with a Playwright fallback for JavaScript-heavy pages. The role...


  • India beBeeData Full time ₹ 15,00,000 - ₹ 28,00,000

    Web Crawling Engineer Opportunity
    We are seeking a Web Crawling Engineer who will be responsible for building and maintaining web crawlers, extracting valuable insights from the web, and ensuring data quality. The ideal candidate will have strong programming skills in Python and experience in web scraping frameworks, browser automation tools, and handling...


  • India beBeeWeb Full time ₹ 6,00,000 - ₹ 8,00,000

    Job Description
    We are seeking a skilled Web Scraping Engineer to join our team. This is an exciting opportunity to build a high-throughput product data ingestion pipeline across hundreds of domains. As a Senior Web Scraping Engineer, you will own the crawling/extraction layer end-to-end: HTTP-first crawling with a Playwright fallback, per-domain learned selectors, and reliable PDF...


  • India beBeeData Full time ₹ 6,00,000 - ₹ 8,00,000

    Senior Web Engineer Job Description
    We are seeking an experienced Senior Web Engineer to build a high-throughput product data ingestion pipeline across hundreds of domains. The successful candidate will design, develop, and implement scalable web crawlers using Python. The role involves collaborating with the data team to integrate vendor/public APIs...


  • India TripleChoice Inc Full time

    Senior Web Scraping Engineer (Python) — India (Remote)
    Employment type: Full-time (open to contract-to-hire)
    Work location: Remote in India
    Time overlap: Prefer 2–3 hours/day with Pacific Time (PST)
    About the role
    We're building a high-throughput product data ingestion pipeline across hundreds of domains. You'll own the crawling/extraction layer...