
Senior Web Scraping Engineer
1 week ago
Senior Web Scraping Engineer (Python), India (Remote)
Employment type: Full-time (open to contract-to-hire)
Work location: Remote in India
Time overlap: Prefer 2–3 hours/day with Pacific Time (PST)

About the role
We’re building a high-throughput product data ingestion pipeline across hundreds of domains. You’ll own the crawling/extraction layer end‑to‑end: HTTP-first crawling with a Playwright fallback, per‑domain learned selectors, and reliable PDF handling (datasheets/specs). You’ll also drive the automation around scheduling, retries, and monitoring so runs are hands‑off, and you’ll integrate vendor/public APIs (REST/GraphQL) wherever available to complement crawling. This role spans crawling (discovering & fetching pages via sitemaps/robots) and scraping (extracting structured specs, images, and PDFs into our schema).

What you’ll do
- Design an HTTP-first crawler (Scrapy or aiohttp) with Playwright fallback only for JS‑heavy pages.
- Implement sitemap diffing and conditional GETs (ETag/Last-Modified) for incremental runs.
- Build a lightweight “needs JS?” classifier (HTML length, JSON‑LD presence, data‑product markers) to auto‑route HTTP vs Playwright.
- Enforce per-domain throttles/backoff (2–4 concurrent/domain; auto‑lower on 429/503).
- Add URL normalization/canonicalization and de‑dup (respect; hash PDFs).
- Handle PDF discovery & download (HEAD first to dedupe; size/concurrency caps; SHA‑256 keys).
- Apply Playwright browser automation resource budgets (block images/fonts/analytics; kill outliers by size/CPU/time).
- Integrate third‑party APIs (REST/GraphQL) as first‑class sources: handle auth (API keys/OAuth2), pagination, and rate limits; unify API + crawl outputs.
- Own automation & orchestration for scheduled runs (Airflow/Temporal/Celery or cron), idempotent retries, and alerting.
- Create per‑domain selectors (YAML) with verification on hold‑outs; re‑learn only when health drops.
- Ship observability: per‑site field coverage, error rates, retries, avg page time, and PDF success.
- Maintain allow/deny paths; adhere to robots.txt and Terms of Service.
- Containerize workers; provide runbooks/CI; collaborate with data team on schemas/normalization.

Must‑have qualifications
- 4+ years Python, including 2+ years building production web crawlers at scale.
- Strong with Scrapy or aio Puppeteer) in production.
- Practical proxy management, polite anti‑bot tactics, and per‑domain rate limiting.
- Hands‑on with ETag/Last-Modified, retries, backoff, and HTTP caching.
- Confident with CSS/XPath, schema.org/JSON‑LD, and HTML parsing.
- APIs: consuming REST/GraphQL (auth, pagination, backoff) and building small internal services (FastAPI or similar).
- Automation/Orchestration: Airflow/Temporal/Celery (or equivalent schedulers/queues) for scheduled runs and monitoring.
- PDF handling (requests/HEAD, hashing, size limits) and file integrity checks.
- Queues (Redis/Kafka), Docker, Linux basics; comfort with logs/metrics.
- Clear, pragmatic communication and strong ownership.

Nice to have
- Go or Node.js experience for high‑performance crawlers.
- Cloud: AWS/GCP, S3, ECS/Kubernetes; IaC basics.
- Workflow engines: Airflow/Temporal/Argo/Celery.
- Document extraction: Textract/Tika/Camelot/Tabula.
- Search/analytics: Elasticsearch/OpenSearch; warehousing (Snowflake/Postgres).
- LLM‑assisted selector generation with deterministic verification (optional).

How we work
- Ship in small, measurable increments.
- Track coverage and freshness as north‑star metrics.
- Prefer simple designs that are easy to operate at scale.

Compensation
Competitive; please include your expected CTC (INR LPA) and any variable/benefits expectations.

Application
Please apply with your resume and links to relevant repos or code samples. Include concise notes on:
- a crawler you ran at 100+ sites/day (or similar scale),
- how you handled rate limits/retries, and
- your approach to PDF discovery/dedup.
-
Senior Web Scraping Engineer
4 days ago
Delhi, India S2T AI - AI-Powered Investigations Full time We are on the lookout for a highly competent, self-motivated Senior Web Scraping Engineer with real-world experience in API Scraping or Mobile Scraping to join our India team. The Role: - Gather and process raw data at scale (including writing scripts, web scraping, calling APIs) - Work in a larger team. - Able to work independently to complete assigned tasks The...
-
Senior Web Scraping Engineer
3 days ago
Delhi, India S2T AI - AI-Powered Investigations Full time We are on the lookout for a highly competent, self-motivated Senior Web Scraping Engineer with real-world experience in API Scraping or Mobile Scraping to join our India team. The Role: - Gather and process raw data at scale (including writing scripts, web scraping, calling APIs) - Work in a larger team. - Able to work independently to complete assigned tasks The...
-
Data Engineer
1 week ago
Delhi, India Alternative Path Full time Alternative Path is seeking skilled software developers to collaborate on client projects with an asset management firm. In this role, you will collaborate with individuals across various company departments to shape and innovate new products and features for our platform, enhancing existing ones. You will have a large degree of independence and trust, but...
-
Python Web Scraping
6 days ago
Delhi, India Innefu Labs Pvt. Ltd. Full time Company Description: - Information Security & Data Analytics Series A funded company - Working in cutting edge technologies - Using AI for predictive intelligence and Facial Biometrics - Among Top 5 Cyber excellence companies globally (Holger Schulze awards) - Bronze award of the year (Indian Express IT awards), only cyber security company in top 3 -...
-
Python/JavaScript Developer – Web Scraping
2 days ago
Delhi, India AIMLEAP Full time Python/JavaScript Developer – Web Scraping & AI/ML (3 to 6 years) Experience: 3+ Years Location: Remote Mode of Engagement: Full-time No of Positions: 3 Educational Qualifications: Bachelor's degree in Computer Science, Information Technology Industry: IT / Software Development Notice Period: Immediate Joiners Preferred What We Are Looking For: - Strong expertise...
-
Python/JavaScript Developer – Web Scraping
22 hours ago
Delhi, India AIMLEAP Full time Python/JavaScript Developer – Web Scraping & AI/ML (3 to 6 years) Experience: 3+ Years Location: Remote Mode of Engagement: Full-time No of Positions: 3 Educational Qualifications: Bachelor's degree in Computer Science, Information Technology Industry: IT / Software Development Notice Period: Immediate Joiners Preferred What We Are Looking For: - Strong expertise...
-
Python/JavaScript Developer – Web Scraping
3 days ago
Delhi, India AIMLEAP Full time Python/JavaScript Developer – Web Scraping & AI/ML (3 to 6 years) Experience: 3+ Years Location: Remote Mode of Engagement: Full-time No of Positions: 3 Educational Qualifications: Bachelor's degree in Computer Science, Information Technology Industry: IT / Software Development Notice Period: Immediate Joiners Preferred What We Are Looking For: Strong...
-
Web Scraper
4 days ago
Delhi, India Raptor Supplies Limited Full time Greetings from Raptor Supplies. JOB TITLE: Web Scraper KEY RESPONSIBILITY AREAS: - Experienced in Python, with knowledge of the Scrapy framework. - Experience running large-scale web scrapes. - Design and build web crawlers to scrape data and URLs. - Ability to clean the scraped data to make it ingestible in the database. - Should have knowledge in scraping...
-
Data Engineer
6 days ago
Delhi, India Gala Intelligence Full time We're looking for an entrepreneurial, passionate, and driven Data Engineer to join Startup Gala Intelligence backed by Navneet Tech Venture situated in Ahmedabad. As we're building our technology platform from scratch, you'll have the unique opportunity to shape our technology vision, architecture, and engineering culture right from the ground up. You’ll...
-
Data Engineer
3 days ago
Delhi, India Gala Intelligence Full time We're looking for an entrepreneurial, passionate, and driven Data Engineer to join Startup Gala Intelligence backed by Navneet Tech Venture situated in Ahmedabad. As we're building our technology platform from scratch, you'll have the unique opportunity to shape our technology vision, architecture, and engineering culture right from the ground up....