
High-Tech Web Crawler Specialist
1 week ago
We're looking for a seasoned web scraping engineer to join our team. Your primary responsibility will be designing and implementing a high-throughput product data ingestion pipeline that can handle hundreds of domains.
As a web scraping engineer, you'll be responsible for the crawling/extraction layer end-to-end, including HTTP-first crawling with a Playwright fallback, per-domain learned selectors, and reliable PDF handling (datasheets/specs).
- Design an HTTP-first crawler (Scrapy or aiohttp) with Playwright fallback only for JS-heavy pages.
- Implement sitemap diffing and conditional GETs (ETag/Last-Modified) for incremental runs.
- Build a lightweight 'needs JS?' classifier (HTML length, JSON-LD presence, data-product markers) to auto-route HTTP vs Playwright.
- Enforce per-domain throttles/backoff (2–4 concurrent/domain; auto-lower on 429/503).
- Add URL normalization/canonicalization and de-dup (respect rel=canonical; hash PDFs).
- Handle PDF discovery & download (HEAD first to dedupe; size/concurrency caps; SHA-256 keys).
- Apply resource budgets to Playwright browser automation (block images/fonts/analytics; kill outliers by size/CPU/time).
- Integrate third-party APIs (REST/GraphQL) as first-class sources: handle auth (API keys/OAuth2), pagination, and rate limits; unify API + crawl outputs.
- Own automation & orchestration for scheduled runs (Airflow/Temporal/Celery or cron), idempotent retries, and alerting.
- Create per-domain selectors (YAML) with verification on hold-outs; re-learn only when health drops.
- Ship observability: per-site field coverage, error rates, retries, avg page time, and PDF success.
- Maintain allow/deny paths; adhere to robots.txt and Terms of Service.
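
To make a few of the responsibilities above concrete, here is a minimal Python sketch (standard library only) of URL normalization, the 'needs JS?' routing heuristic, SHA-256 PDF keys, and per-domain throttle lowering on 429/503. Everything named here is hypothetical and not part of the posting: the function and class names, the tracking-parameter list, the 2,000-character threshold, the starting limit of 4, and the floor of 1 are illustrative assumptions only.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: query parameters treated as tracking noise (illustrative list).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def normalize_url(url: str) -> str:
    """Lowercase scheme/host, drop default ports, fragments, and tracking
    parameters, and sort the remaining query pairs so equivalent URLs
    compare equal. Userinfo, if present, is dropped."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    default_port = {"http": 80, "https": 443}.get(scheme)
    if parts.port and parts.port != default_port:
        host = f"{host}:{parts.port}"
    pairs = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((scheme, host, parts.path or "/", urlencode(sorted(pairs)), ""))


def needs_js(html: str, min_length: int = 2000) -> bool:
    """Guess whether a page must be re-fetched with a browser.
    Structured product data in the raw HTML means plain HTTP is enough;
    otherwise a very short document is treated as a JS shell."""
    lowered = html.lower()
    if "application/ld+json" in lowered or "data-product" in lowered:
        return False
    return len(html) < min_length  # assumption: threshold is a tunable guess


def pdf_key(data: bytes) -> str:
    """Content-addressed key so identical PDFs are stored once."""
    return hashlib.sha256(data).hexdigest()


class DomainThrottle:
    """Tracks the concurrency limit for one domain and lowers it by one
    each time the server answers 429 or 503."""

    def __init__(self, limit: int = 4, floor: int = 1) -> None:
        self.limit = limit
        self.floor = floor

    def record(self, status: int) -> int:
        if status in (429, 503):
            self.limit = max(self.floor, self.limit - 1)
        return self.limit
```

In a real crawler these pieces would sit behind the HTTP client: the normalized URL feeds the de-dup set, `needs_js` decides whether a page is re-queued for Playwright, and `DomainThrottle.record` is called with every response status.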
-
High-Performance Web Crawler Developer
6 days ago
India beBeeSpecialist Full time ₹ 1,50,00,000 - ₹ 2,00,00,000
Senior Web Scraping Engineer
We are seeking an experienced Web Scraping Specialist to join our team. The successful candidate will own the crawling/extraction layer end-to-end, designing and implementing a scalable solution that meets our needs. The ideal candidate will have strong expertise in Scrapy or aiohttp and Playwright, with practical experience in...
-
Master Web Data Extractor
2 weeks ago
India beBeeData Full time ₹ 7,50,000 - ₹ 15,00,000
Web Crawling Engineer Position
We are seeking a skilled Web Crawling Engineer who will be responsible for designing and implementing web crawlers, extracting valuable insights from the web, and ensuring data quality. The ideal candidate will have strong Python programming skills and experience in web scraping frameworks, browser automation tools, and handling...
-
Senior Python Web Engineer
4 days ago
India TripleChoice Inc Full time
Senior Web Scraping Engineer (Python) - India (Remote)
Employment type: Full-time (open to contract-to-hire)
Work location: Remote in India
Prefer 2–3 hours/day with Pacific Time (PST)
About the role
We're building a high-throughput product data ingestion pipeline across hundreds of domains. You'll also drive the automation around scheduling,...
-
Senior Python Web Engineer
5 days ago
India TripleChoice Inc Full time
Senior Web Scraping Engineer (Python) - India (Remote)
Employment type: Full-time (open to contract-to-hire)
Work location: Remote in India
Prefer 2–3 hours/day with Pacific Time (PST)
About the role
We're building a high-throughput product data ingestion pipeline across hundreds of domains. You'll also drive the automation around scheduling,...
-
Junior Web Crawling Engineer
2 weeks ago
India Forage AI Full time
Job Description
We are seeking a Junior Web Crawling Engineer who will be responsible for building and maintaining web crawlers, extracting valuable insights from the web, and ensuring data quality. The ideal candidate will have strong Python programming skills and experience in web scraping frameworks, browser automation tools, and handling anti-scraping...
-
Senior Product Data Ingestion Specialist
1 week ago
India beBeeCrawler Full time ₹ 15,00,000 - ₹ 28,00,000
High-Performance Web Crawling Engineer
We are looking for an experienced engineer to build a high-throughput product data ingestion pipeline across hundreds of domains. The successful candidate will own the crawling/extraction layer end-to-end, designing and implementing an HTTP-first crawler with a Playwright fallback for JavaScript-heavy pages.
The role...
-
Senior Data Extraction Specialist
2 weeks ago
India beBeeData Full time ₹ 15,00,000 - ₹ 28,00,000
Web Crawling Engineer Opportunity
We are seeking a Web Crawling Engineer who will be responsible for building and maintaining web crawlers, extracting valuable insights from the web, and ensuring data quality. The ideal candidate will have strong programming skills in Python and experience in web scraping frameworks, browser automation tools, and handling...
-
High-Performance Data Extractor
5 days ago
India beBeeWeb Full time ₹ 6,00,000 - ₹ 8,00,000
Job Description
We are seeking a skilled Web Scraping Engineer to join our team. This is an exciting opportunity to build a high-throughput product data ingestion pipeline across hundreds of domains. As a Senior Web Scraping Engineer, you will own the crawling/extraction layer end-to-end: crawling with a fallback, per-domain learned selectors, and reliable PDF...
-
Lead Data Pipeline Developer
5 days ago
India beBeeData Full time ₹ 6,00,000 - ₹ 8,00,000
Senior Web Engineer Job Description
We are seeking an experienced Senior Web Engineer to build a high-throughput product data ingestion pipeline across hundreds of domains. The successful candidate will design, develop, and implement scalable web crawlers using Python. The role involves collaborating with the data team to integrate vendor/public APIs...
-
Urgent: Senior Web Scraping Engineer
1 week ago
India TripleChoice Inc Full time
Senior Web Scraping Engineer (Python) - India (Remote)
Employment type: Full-time (open to contract-to-hire)
Work location: Remote in India
Time overlap: Prefer 2–3 hours/day with Pacific Time (PST)
About the role
We're building a high-throughput product data ingestion pipeline across hundreds of domains. You'll own the crawling/extraction layer...