High Throughput Data Ingestion Specialist

5 days ago


Bengaluru, Karnataka, India beBeeWebScraping Full time ₹ 1,83,00,000 - ₹ 2,43,20,000
Job Title: Senior Web Scraping Engineer

This role works on a high-throughput product data ingestion pipeline spanning hundreds of domains.

Key Responsibilities:
  • Design and build an HTTP-first crawler with Playwright fallback for JS-heavy pages
  • Implement sitemap diffing and conditional GETs for incremental runs
  • Build a lightweight 'needs JS?' classifier to auto-route HTTP vs Playwright
  • Enforce per-domain throttles/backoff
  • Add URL normalization/canonicalization and de-duplication
  • Handle PDF discovery and download
  • Enforce resource budgets for Playwright browser automation
  • Integrate third-party APIs as first-class sources
  • Own automation and orchestration for scheduled runs
  • Ship observability: per-site field coverage, error rates, retries, average page time, and PDF download success rate
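
To illustrate what URL normalization/canonicalization and de-duplication typically involve, here is a minimal Python sketch using only the standard library. It is an assumption-based illustration, not this team's implementation: the `normalize_url` and `dedupe` names and the `TRACKING_PARAMS` set are hypothetical, and the rules (lowercase scheme and host, drop default ports, strip tracking parameters, sort the query, discard the fragment) are common choices rather than a complete RFC 3986 normalizer.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to strip; a real crawler would tune this per site.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "gclid", "fbclid"}


def normalize_url(url: str) -> str:
    """Return a canonical key for `url` (a simplified sketch, not RFC 3986-complete)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # .hostname is lowercase and omits any user:password@ prefix.
    host = (parts.hostname or "").lower()
    if ":" in host:  # re-bracket IPv6 literals such as ::1
        host = f"[{host}]"
    port = parts.port
    # Keep the port only when it is not the scheme's default.
    if port is not None and (scheme, port) not in {("http", 80), ("https", 443)}:
        host = f"{host}:{port}"
    path = parts.path or "/"
    # Drop tracking parameters, then sort so parameter order cannot create duplicates.
    # parse_qsl/urlencode also re-encode values consistently (e.g. %20 becomes +).
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
    )
    # Fragments are never sent to the server, so they are discarded.
    return urlunsplit((scheme, host, path, urlencode(query), ""))


def dedupe(urls):
    """Yield each canonical URL once, preserving first-seen order."""
    seen = set()
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            yield key
```

For example, `normalize_url("HTTPS://Example.com:443/p?b=2&a=1&utm_source=x#frag")` yields `https://example.com/p?a=1&b=2`. Trailing slashes and path case are deliberately left alone, since many servers treat them as distinct resources; a production crawler would usually also honor `<link rel="canonical">` from the fetched page.
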

Requirements:
  • 4+ years of experience in Python, including 2+ years building production web crawlers at scale
  • Strong production expertise in Scrapy or aiohttp, plus Playwright (or Puppeteer)
  • Practical experience with proxy management, polite anti-bot tactics, and per-domain rate limiting
  • Hands-on experience with ETag/Last-Modified, retries, backoff, and HTTP caching
  • Confident with CSS/XPath, schema.org/JSON-LD, and HTML parsing
  • APIs: consuming REST/GraphQL (auth, pagination, backoff) and building small internal services (FastAPI or similar)
  • Automation/Orchestration: Airflow/Temporal/Celery (or equivalent schedulers/queues) for scheduled runs and monitoring
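
The ETag/Last-Modified, retry, and backoff requirements above usually combine into a single conditional-fetch loop. The sketch below is a hedged illustration using only the Python standard library (`urllib.request`), not the team's actual stack: `CacheEntry`, `conditional_headers`, `backoff_delay`, `fetch`, the `RETRYABLE` status set, and the in-memory `cache` dict are all hypothetical. It sends `If-None-Match`/`If-Modified-Since` validators saved from the previous crawl, treats a 304 as "unchanged", and retries transient failures with full-jitter exponential backoff.

```python
import random
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of status codes worth retrying.
RETRYABLE = {429, 500, 502, 503, 504}


@dataclass
class CacheEntry:
    """Validators and body remembered from the previous crawl of a URL."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None
    body: Optional[bytes] = None


def conditional_headers(entry: Optional[CacheEntry]) -> dict:
    """Build conditional-GET headers from a cached entry, if any."""
    headers = {}
    if entry is not None:
        if entry.etag:
            headers["If-None-Match"] = entry.etag
        if entry.last_modified:
            headers["If-Modified-Since"] = entry.last_modified
    return headers


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def fetch(url: str, cache: dict, max_attempts: int = 4, timeout: float = 20.0) -> bytes:
    """Fetch `url`, reusing the cached body on 304 and retrying transient errors."""
    entry = cache.get(url)
    for attempt in range(max_attempts):
        req = urllib.request.Request(url, headers=conditional_headers(entry))
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                body = resp.read()
                cache[url] = CacheEntry(resp.headers.get("ETag"),
                                        resp.headers.get("Last-Modified"), body)
                return body
        except urllib.error.HTTPError as err:  # must precede URLError (its base class)
            # urllib surfaces 304 Not Modified as an HTTPError.
            if err.code == 304 and entry is not None:
                return entry.body
            if err.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
        except urllib.error.URLError:  # DNS failures, refused connections, timeouts
            if attempt == max_attempts - 1:
                raise
        time.sleep(backoff_delay(attempt))
    raise RuntimeError("unreachable: the final attempt always returns or raises")
```

Full jitter spreads retries from many workers so they do not hit a struggling domain in lockstep. A production version would likely also honor `Retry-After` on 429/503, persist the cache between runs, and apply per-domain throttling before each request.
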


  • Bengaluru, Karnataka, India beBeeDataExpert Full time ₹ 80,00,000 - ₹ 1,50,00,000

    Job Description: Data Pipeline Management and Validation. Build and manage ingestion workflows using Spark, Hive, Trino, and Kafka. Validate data ingestion, debug pipeline outputs, and support delta loads. Create and schedule Airflow DAGs for recurring ingestion processes. Automation and Scripting Expertise: Utilize Python and Bash (or PowerShell) to automate...


  • Bengaluru, Karnataka, India beBeeScraping Full time ₹ 90,00,000 - ₹ 1,25,00,000

    Senior Web Scraping Engineer Position: We're developing a high-throughput product data ingestion pipeline across hundreds of domains. This position entails owning the crawling/extraction layer end-to-end.


  • Bengaluru, Karnataka, India beBeeDataEngineering Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

    Job Title: AWS Data Engineering Specialist. Job Description: We are seeking a highly skilled AWS Data Engineer to design, build and optimize large-scale data pipelines and analytics platforms. The ideal candidate will have strong hands-on expertise in AWS Glue, Athena, Redshift, PySpark, and SQL, with a solid understanding of data engineering and data...


  • Bengaluru, Karnataka, India beBeeDataEngineer Full time ₹ 18,00,000 - ₹ 24,00,000

    Join Our Data Engineering Team: We are seeking a skilled Associate Software Engineer II to join our data engineering team. As a key member of our team, you will play a crucial role in developing and deploying scalable data ingestion mechanisms for generative AI-powered solutions. This is an exciting opportunity to work with cutting-edge technology and...


  • Bengaluru, Karnataka, India beBeeDataTransformation Full time US$ 80,000 - US$ 150,000

    Job Summary: We are seeking a seasoned Data Transformation Specialist to join our organization. About the Role: This position involves designing and implementing scalable, high-performance data transformation pipelines using Snowflake and dbt. The ideal candidate will have experience in building ingestion pipelines from various sources, managing environments,...


  • Bengaluru, Karnataka, India beBeeBackend Full time ₹ 15,00,000 - ₹ 25,00,000

    About our Data Team: We're building a data-driven platform for the finance sector. Role: You'll help create and maintain data ingestion systems that process financial documents, supporting real-time data solutions. Responsibilities: Design scalable data collection pipelines; implement automated web data extraction solutions; collaborate with cross-functional...


  • Bengaluru, Karnataka, India beBeeData Full time ₹ 15,00,000 - ₹ 20,00,000

    Senior Big Data Engineer: Leverage your expertise in big data processing to drive business growth and innovation. Design and implement scalable big data solutions using Hadoop ecosystem technologies such as HBase, Hive, Kudu, and Spark. Architect HBase schemas and data models to meet evolving business requirements and ensure optimal performance for data storage...


  • Bengaluru, Karnataka, India beBeeData Full time ₹ 9,00,000 - ₹ 12,00,000

    Immerse yourself in the world of data innovation, where you will collaborate with global experts to create cutting-edge solutions for leading companies. Help them become more agile, innovative, and successful by harnessing the power of data. With a strong foundation in engineering and a bachelor's degree or equivalent qualification, you will bring 4-6 years...


  • Bengaluru, Karnataka, India beBeeDataInfrastructure Full time ₹ 1,50,00,000 - ₹ 2,50,00,000

    About This Role: We're looking for an expert in large-scale data processing to join our research team. Data engineering is a crucial component of successful model training and evaluation. You'll work closely with researchers to accelerate experiments, develop new datasets, improve infrastructure efficiency, and enable key insights across our data...


  • Bengaluru, Karnataka, India beBeeDataEngineer Full time ₹ 1,50,00,000 - ₹ 3,00,00,000

    Unlock the Power of Data Engineering: The data engineering team is a critical component of our infrastructure, responsible for designing and implementing scalable systems that support our organization's growth. We focus on building sophisticated data frameworks that can ingest and process large volumes of data from various sources, ensuring seamless reliability...