
Senior Web Scraping Engineer
1 week ago
Employment type: Full-time (open to contract-to-hire)
Work location: Remote in India
Time overlap: Prefer 2–3 hours/day with Pacific Time (PST)
About the role
We're building a high-throughput product data ingestion pipeline across hundreds of domains. You'll own the crawling/extraction layer end‑to‑end: HTTP-first crawling with a Playwright fallback, per‑domain learned selectors, and reliable PDF handling (datasheets/specs). You'll also drive the automation around scheduling, retries, and monitoring so runs are hands‑off, and you'll integrate vendor/public APIs (REST/GraphQL) wherever available to complement crawling.
This role spans crawling (discovering & fetching pages via sitemaps/robots) and scraping (extracting structured specs, images, and PDFs into our schema).
What you'll do
- Design an HTTP-first crawler (Scrapy or aiohttp) with Playwright fallback only for JS‑heavy pages.
- Implement sitemap diffing and conditional GETs (ETag/Last-Modified) for incremental runs.
- Build a lightweight "needs JS?" classifier (HTML length, JSON‑LD presence, data‑product markers) to auto‑route HTTP vs Playwright.
- Enforce per-domain throttles/backoff (2–4 concurrent/domain; auto‑lower on 429/503).
- Add URL normalization/canonicalization and de‑dup (respect rel=canonical; hash PDFs).
- Handle PDF discovery & download (HEAD first to dedupe; size/concurrency caps; SHA‑256 keys).
- Apply resource budgets to Playwright browser automation (block images/fonts/analytics; kill outliers by size/CPU/time).
- Integrate third‑party APIs (REST/GraphQL) as first‑class sources: handle auth (API keys/OAuth2), pagination, and rate limits; unify API + crawl outputs.
- Own automation & orchestration for scheduled runs (Airflow/Temporal/Celery or cron), idempotent retries, and alerting.
- Create per‑domain selectors (YAML) with verification on hold‑outs; re‑learn only when health drops.
- Ship observability: per‑site field coverage, error rates, retries, avg page time, and PDF success.
- Maintain allow/deny paths; adhere to robots.txt and Terms of Service.
- Containerize workers; provide runbooks/CI; collaborate with data team on schemas/normalization.
What you'll bring
- 4+ years Python, including 2+ years building production web crawlers at scale.
- Strong with Scrapy or aiohttp/asyncio and Playwright (or Puppeteer) in production.
- Practical proxy management, polite anti‑bot tactics, and per‑domain rate limiting.
- Hands‑on with ETag/Last-Modified, retries, backoff, and HTTP caching.
- Confident with CSS/XPath, schema.org/JSON‑LD, and HTML parsing.
- APIs: consuming REST/GraphQL (auth, pagination, backoff) and building small internal services (FastAPI or similar).
- Automation/Orchestration: Airflow/Temporal/Celery (or equivalent schedulers/queues) for scheduled runs and monitoring.
- PDF handling (requests/HEAD, hashing, size limits) and file integrity checks.
- Queues (Redis/Kafka), Docker, Linux basics; comfort with logs/metrics.
- Clear, pragmatic communication and strong ownership.
Nice to have
- Go or Node.js experience for high‑performance crawlers.
- Cloud: AWS/GCP, S3, ECS/Kubernetes; IaC basics.
- Workflow engines: Airflow/Temporal/Argo/Celery.
- Document extraction: Textract/Tika/Camelot/Tabula.
- Search/analytics: Elasticsearch/OpenSearch; warehousing (Snowflake/Postgres).
- LLM‑assisted selector generation with deterministic verification (optional).
How we work
- Ship in small, measurable increments.
- Track coverage and freshness as north‑star metrics.
- Prefer simple designs that are easy to operate at scale.
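The "needs JS?" routing heuristic listed under the responsibilities (HTML length, JSON‑LD presence, data‑product markers) could be sketched roughly as follows. This is only an illustration, not the team's implementation: the threshold `MIN_HTML_CHARS`, the regular expressions, and the function name `needs_js` are all assumptions made for the example.

```python
import re

# Hypothetical threshold: static HTML shorter than this is treated as a thin JS shell.
MIN_HTML_CHARS = 2000

# Assumed signals that product data is already present in the plain HTTP response.
JSON_LD_RE = re.compile(r'<script[^>]+type=["\']application/ld\+json["\']', re.I)
PRODUCT_MARKER_RE = re.compile(r'data-product[\w-]*\s*=', re.I)


def needs_js(html: str) -> bool:
    """Return True when the static HTML looks too thin to extract from."""
    if JSON_LD_RE.search(html) or PRODUCT_MARKER_RE.search(html):
        # Structured data is already in the response, so stay on the HTTP path.
        return False
    # No structured markers: route to Playwright only if the page is very short.
    return len(html) < MIN_HTML_CHARS


if __name__ == "__main__":
    shell = "<html><body><div id='root'></div></body></html>"
    print(needs_js(shell))
```

Keeping the check to cheap string tests means it can run on every HTTP response before deciding whether a browser is worth the cost; a long page with no markers stays on the HTTP path here, which is a design choice rather than a rule.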
Compensation
Competitive; please include your expected CTC (INR LPA) and any variable/benefits expectations.
Application
Please apply with your resume and links to relevant repos or code samples. Include concise notes on:
- a crawler you ran at 100+ sites/day (or similar scale),
- how you handled rate limits/retries, and
- your approach to PDF discovery/dedup.
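As a rough illustration of the SHA‑256 keying mentioned for PDF dedup, the sketch below stores each PDF under the hash of its bytes so that the same datasheet served from two URLs is kept once. The class `PdfStore` and its in-memory dictionary are assumptions for the example; a real pipeline would presumably back this with object storage and do the HEAD-based pre-check before downloading.

```python
import hashlib


def pdf_key(data: bytes) -> str:
    """Content-addressed key: hex SHA-256 digest of the PDF bytes."""
    return hashlib.sha256(data).hexdigest()


class PdfStore:
    """Hypothetical in-memory store that keeps one copy per distinct content hash."""

    def __init__(self) -> None:
        self._first_url_by_key: dict[str, str] = {}

    def add(self, url: str, data: bytes) -> tuple[str, bool]:
        """Return (key, is_new); is_new is False when the content was seen before."""
        key = pdf_key(data)
        is_new = key not in self._first_url_by_key
        if is_new:
            self._first_url_by_key[key] = url
        return key, is_new


if __name__ == "__main__":
    store = PdfStore()
    body = b"%PDF-1.7 example bytes"
    print(store.add("https://example.com/a.pdf", body)[1])
    print(store.add("https://example.com/mirror/a.pdf", body)[1])
```

Keying on content rather than URL also gives a cheap integrity check: re-hashing a stored file and comparing against its key detects truncated or corrupted downloads.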