High-Performance Data Ingestion Specialist

2 days ago


Bhavnagar, Gujarat, India beBeeWebScraping Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

We're building a high-throughput data ingestion pipeline across hundreds of domains. As a Senior Web Scraping Engineer, you'll own the crawling/extraction layer end-to-end: HTTP-first crawling with a Playwright fallback, per-domain learned selectors, and reliable PDF handling (datasheets/specs).

Key Responsibilities
  • Design an HTTP-first crawler (Scrapy or aiohttp) with Playwright fallback only for JS-heavy pages.
  • Implement sitemap diffing and conditional GETs (ETag/Last-Modified) for incremental runs.
  • Build a lightweight 'needs JS?' classifier (HTML length, JSON-LD presence, data-product markers) to auto-route HTTP vs Playwright.
  • Enforce per-domain throttles/backoff (2-4 concurrent requests per domain; automatically lower concurrency on 429/503 responses).
  • Add URL normalization/canonicalization and de-duplication (respect canonical URLs; hash PDFs).
  • Handle PDF discovery & download (HEAD first to dedupe; size/concurrency caps; SHA-256 keys).
  • Apply resource budgets to Playwright browser automation (block images/fonts/analytics; kill outliers by size/CPU/time).
  • Integrate third-party APIs (REST/GraphQL) as first-class sources: handle auth (API keys/OAuth2), pagination, and rate limits; unify API + crawl outputs.
  • Own automation & orchestration for scheduled runs (Airflow/Temporal/Celery or cron), idempotent retries, and alerting.
  • Create per-domain selectors (YAML) with verification on hold-outs; re-learn only when health drops.
  • Ship observability: per-site field coverage, error rates, retries, avg page time, and PDF success.
  • Maintain allow/deny paths; adhere to robots.txt and Terms of Service.
  • Containerize workers; provide runbooks/CI; collaborate with data team on schemas/normalization.
Must-Have Qualifications
  • 4+ years Python, including 2+ years building production web crawlers at scale.
  • Strong with Scrapy or aiohttp/asyncio and Playwright (or Puppeteer) in production.
  • Practical proxy management, polite anti-bot tactics, and per-domain rate limiting.
  • Hands-on with ETag/Last-Modified, retries, backoff, and HTTP caching.
  • Confident with CSS/XPath, schema.org/JSON-LD, and HTML parsing.
  • APIs: consuming REST/GraphQL (auth, pagination, backoff) and building small internal services (FastAPI or similar).
  • Automation/Orchestration: Airflow/Temporal/Celery (or equivalent schedulers/queues) for scheduled runs and monitoring.
  • PDF handling (requests/HEAD, hashing, size limits) and file integrity checks.
  • Queues (Redis/Kafka), Docker, Linux basics; comfort with logs/metrics.
  • Clear, pragmatic communication and strong ownership.
Benefits

This role offers competitive compensation. Please include your expected CTC (INR LPA) and any variable/benefits expectations in your application.

Application Guidelines
  1. Please apply with your resume and links to relevant repos or code samples.
  2. Include concise notes on a crawler you ran at 100+ sites/day (or similar scale), how you handled rate limits/retries, and your approach to PDF discovery/dedup.

