
Web Data Extraction Specialist
2 days ago
Job Title: Senior Web Scraping Developer
We are building a high-throughput data ingestion pipeline across hundreds of domains. As a Senior Web Scraping Developer, you will own the crawling/extraction layer end-to-end: HTTP-first crawling with a Playwright fallback, per-domain learned selectors, and reliable PDF handling (datasheets/specs). You will also drive automation around scheduling, retries, and monitoring so runs are hands-off, and you will integrate vendor/public APIs (REST/GraphQL) wherever available to complement crawling.
This role spans crawling (discovering & fetching pages via sitemaps/robots) and scraping (extracting structured specs, images, and PDFs into our schema). Your key responsibilities include:
- Designing an HTTP-first crawler (Scrapy or aiohttp) with Playwright fallback only for JS-heavy pages.
- Implementing sitemap diffing and conditional GETs (ETag/Last-Modified) for incremental runs.
- Building a lightweight 'needs JS?' classifier (HTML length, JSON-LD presence, data-product markers) to auto-route HTTP vs Playwright.
- Enforcing per-domain throttles/backoff (2–4 concurrent/domain; auto-lower on 429/503).
- Adding URL normalization/canonicalization and de-duplication (respect rel="canonical" links; hash PDFs).
- Handling PDF discovery & download (HEAD first to dedupe; size/concurrency caps; SHA-256 keys).
- Applying resource budgets to Playwright browser automation (block images/fonts/analytics; kill outliers by size/CPU/time).
- Integrating third-party APIs (REST/GraphQL) as first-class sources: handle auth (API keys/OAuth2), pagination, and rate limits; unify API + crawl outputs.
- Owning automation & orchestration for scheduled runs (Airflow/Temporal/Celery or cron), idempotent retries, and alerting.
- Creating per-domain selectors (YAML) with verification on hold-outs; re-learn only when health drops.
- Shipping observability: per-site field coverage, error rates, retries, avg page time, and PDF success.
- Maintaining allow/deny paths; adhering to robots.txt and Terms of Service.
- Containerizing workers; providing runbooks/CI; collaborating with data team on schemas/normalization.
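The 'needs JS?' routing described above might look roughly like the minimal sketch below. The posting names only the signals (HTML length, JSON-LD presence, data-product markers); the threshold value, the regular expressions, and the function names needs_js and route are hypothetical illustrations, not the team's actual classifier.

```python
import re

# Hypothetical threshold: the posting names the signal, not its value.
MIN_HTML_LENGTH = 5_000

# JSON-LD block embedded in the raw HTML (schema.org structured data).
JSON_LD_RE = re.compile(r'<script[^>]+type=["\']application/ld\+json["\']', re.IGNORECASE)
# Any attribute starting with data-product (e.g. data-product-id=...); assumed marker format.
PRODUCT_MARKER_RE = re.compile(r'\bdata-product[\w-]*=', re.IGNORECASE)


def needs_js(html: str) -> bool:
    """Return True if a page fetched over plain HTTP should be re-fetched with Playwright."""
    has_json_ld = bool(JSON_LD_RE.search(html))
    has_product_markers = bool(PRODUCT_MARKER_RE.search(html))
    if has_json_ld or has_product_markers:
        # Structured data is already present in the server-rendered HTML.
        return False
    # A thin HTML shell with no structured data is likely rendered client-side.
    return len(html) < MIN_HTML_LENGTH


def route(html: str) -> str:
    """Pick the fetcher for a page: 'http' (cheap) or 'playwright' (fallback)."""
    return "playwright" if needs_js(html) else "http"


if __name__ == "__main__":
    print(route('<html><body><div id="root"></div></body></html>'))
    print(route('<script type="application/ld+json">{"@type": "Product"}</script>'))
```

Checking the cheap positive signals first keeps most pages on the HTTP path, so Playwright is reserved for pages that look like empty client-rendered shells.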
Must-have qualifications include:
- 4+ years Python, including 2+ years building production web crawlers at scale.
- Strong with Scrapy or aiohttp/asyncio and Playwright (or Puppeteer) in production.
- Practical proxy management, polite anti-bot tactics, and per-domain rate limiting.
- Hands-on with ETag/Last-Modified, retries, backoff, and HTTP caching.
- Confident with CSS/XPath, schema.org/JSON-LD, and HTML parsing.
- APIs: consuming REST/GraphQL (auth, pagination, backoff) and building small internal services (FastAPI or similar).
- Automation/Orchestration: Airflow/Temporal/Celery (or equivalent schedulers/queues) for scheduled runs and monitoring.
- PDF handling (requests/HEAD, hashing, size limits) and file integrity checks.
- Queues (Redis/Kafka), Docker, Linux basics; comfort with logs/metrics.
- Clear, pragmatic communication and strong ownership.
-
Data Extraction Specialist
5 days ago
Jamnagar, Gujarat, India | Full time | ₹ 80,00,000 - ₹ 1,25,00,000
Job Title: Data Extraction Specialist
Job Description: We are seeking a skilled Data Extraction Specialist with expertise in automating data extraction processes from web platforms. The ideal candidate will be experienced in Python, Selenium, Pandas, SQL, and APIs, with the ability to design and implement efficient and scalable data scraping systems.
-
Expert Data Extraction Specialist
6 days ago
Jamnagar, Gujarat, India | Full time | ₹ 15,00,000 - ₹ 20,00,000
Key Roles and Responsibilities:
We seek an experienced Python developer to design and optimize data extraction solutions.
- Develop scalable scripts for web scraping from structured and unstructured sources.
- Implement OCR solutions to extract text/data from scanned images, PDFs, and other documents.
- Optimize data extraction workflows for performance and...
-
Expert Web Data Miner
2 weeks ago
Jamnagar, Gujarat, India | Full time | ₹ 9,00,000 - ₹ 12,00,000
Job Description: Data Mining Specialist
Nuvoretail seeks a seasoned data mining professional to design, develop, and implement efficient web scraping solutions using Python, Selenium, Pandas, SQL, and APIs.
Responsibilities:
- Design robust web scraping workflows to extract structured and unstructured data from various websites and APIs.
- Utilize Python libraries...
-
Web Crawler Specialist
6 days ago
Jamnagar, Gujarat, India | Full time | ₹ 12,00,000 - ₹ 21,49,500
Web Crawler Specialist
We are seeking a skilled Web Crawler Specialist to join our team. In this role, you will be responsible for developing and refining web crawlers using Python-based tools and frameworks. The ideal candidate will have strong programming skills in Python, experience in web scraping frameworks, and knowledge of browser automation tools such...
-
Web Scraping Expert
2 weeks ago
Jamnagar, Gujarat, India | Full time | ₹ 15,00,000 - ₹ 20,00,000
Senior Data Extraction Specialist
We are seeking a highly skilled and experienced Senior Data Extraction Specialist to join our team. The ideal candidate will have a minimum of 4 years of hands-on experience in IT data extraction, with at least 2 years leading a team of 5+ developers. This role requires deep technical knowledge in advanced data extraction...
-
Expert Data Miner Wanted
3 days ago
Jamnagar, Gujarat, India | Full time | ₹ 9,00,000 - ₹ 12,00,000
Job Title: Data Mining Specialist
About the Role: We are seeking a skilled data mining specialist to join our team. The ideal candidate will have expertise in automating data extraction processes from web platforms, utilizing tools like Python, Selenium, Pandas, and APIs.
Key Responsibilities:
- Design and Develop Robust Web Scraping Solutions
- Extract Structured and...
-
Data Science Specialist
4 days ago
Jamnagar, Gujarat, India | Full time | ₹ 10,00,000 - ₹ 20,00,000
Job Title: Data Science Specialist
We are seeking an experienced Data Science Specialist to join our team. This is a highly visible role that involves designing and implementing large-scale data pipelines, performing web scraping, and generating actionable insights through dashboards and reports.
Key Responsibilities:
- Develop efficient Python scripts for data...
-
Advanced Data Extraction Engineer
2 hours ago
Jamnagar, Gujarat, India | Full time | ₹ 20,00,000 - ₹ 30,00,000
Job Description: We are seeking a highly skilled OCR and AI engineer to join our team. The ideal candidate will have expertise in designing, training, and deploying custom OCR models using Azure Document Intelligence and AWS Textract. The successful candidate will be responsible for automating document classification and data extraction from structured and...
-
Web Application Specialist
1 week ago
Jamnagar, Gujarat, India | Full time | US$ 12,00,000 - US$ 18,00,000
Web Application Specialist
Our team is seeking an experienced Web Application Specialist with expertise in Node.js and React.js to design, develop, and scale secure, high-performance web applications in the travel domain. The role involves working across front end, back end, and database layers while integrating third-party travel APIs, architecting...
-
Data Specialist
6 days ago
Jamnagar, Gujarat, India | Full time | ₹ 15,00,000 - ₹ 25,00,000
Seeking a talented Data Specialist to join our team. As a Data Engineer, you will be responsible for developing and maintaining large-scale data systems using the Google Cloud stack.
About the Role: We are looking for an experienced professional with strong expertise in data engineering, including data extraction, analysis, visualization, and pipeline...