
High-Throughput Web Data Ingestion Specialist
2 days ago
We are building a scalable product data ingestion pipeline across numerous domains.
About the Role:
- Crawl and scrape structured specs, images, and PDFs into our schema.
- Design a high-performance crawler with Playwright fallback for JS-heavy pages.
- Implement sitemap diffing and conditional GETs for incremental runs.
- Build a lightweight classifier to auto-route HTTP vs Playwright.
- Enforce per-domain throttles/backoff for concurrent runs.
- Add URL normalization/canonicalization and de-duplication.
- Handle PDF discovery & download with deduplication and size/concurrency caps.
- Apply resource budgets to Playwright browser automation.
- Integrate third-party APIs as first-class sources.
- Own automation & orchestration for scheduled runs, idempotent retries, and alerting.
Requirements:
- 4+ years of Python experience, including 2+ years building production web crawlers at scale.
- Strong with Scrapy or aiohttp/asyncio and Playwright in production.
- Practical proxy management, polite anti-bot tactics, and per-domain rate limiting.
- Hands-on with ETag/Last-Modified, retries, backoff, and HTTP caching.
- Confident with CSS/XPath, schema.org/JSON-LD, and HTML parsing.
- APIs: consuming REST/GraphQL and building small internal services.
- Automation/Orchestration: Airflow/Temporal/Celery for scheduled runs and monitoring.
- PDF handling and file integrity checks.
- Queues, Docker, Linux basics; comfort with logs/metrics.
- Maintaining optimal performance of the data ingestion pipeline.
- Overseeing the integration of new APIs and services.
- Developing robust automation scripts for scheduled tasks.
- Familiarity with the Python programming language.
- Experience with web scraping technologies such as Scrapy or aiohttp.
- Knowledge of HTML parsing and manipulation techniques.
- Understanding of HTTP protocols and caching mechanisms.
- Ability to work with APIs and build small internal services.
- Experience with automation tools like Airflow or Celery.
- Basic knowledge of the Linux operating system and containerization using Docker.
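
To illustrate the URL normalization/canonicalization and de-duplication item listed under the role, here is a minimal Python sketch. It is only one possible approach, not the team's actual implementation. The names `canonicalize`, `dedupe`, and `TRACKING_PARAMS` are hypothetical, and the set of query parameters to strip is an assumption that would need tuning per domain.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: query parameters that never change page content.
# This list is illustrative only; real crawls tune it per domain.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign",
    "utm_term", "utm_content", "gclid", "fbclid",
}
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Return a normalized form of `url` suitable as a de-duplication key."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if ":" in host:
        # urlsplit strips the brackets from IPv6 literals; restore them.
        host = f"[{host}]"
    port = parts.port
    # Drop the port when it is the default for the scheme.
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # Remove tracking parameters and sort the rest for a stable order.
    pairs = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((scheme, netloc, path, urlencode(pairs), ""))


def dedupe(urls):
    """Return canonical URLs in first-seen order, without duplicates."""
    seen = set()
    unique = []
    for url in urls:
        key = canonicalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

This sketch discards any userinfo in the URL and treats query parameter order as insignificant, which holds for most product pages but not for every site. A production crawler would likely also honor `<link rel="canonical">` hints found in the page itself.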
-
Data Ingestion Strategist
4 days ago
Ghaziabad, Uttar Pradesh, India beBeeIngestion Full time ₹ 20,00,000 - ₹ 25,00,000
Lead Data Ingestion Engineer
A renowned organization seeks an accomplished Data Ingestion Engineer to spearhead the development and optimization of data ingestion pipelines. This pivotal role demands a deep understanding of data ingestion processes, focusing on integrating diverse data sources into Databricks. The ideal candidate will possess hands-on...
-
Advanced Data Ingestion Specialist
6 days ago
Ghaziabad, Uttar Pradesh, India beBeeDataIngestion Full time ₹ 1,20,00,000 - ₹ 1,50,00,000
Job Title: Azure Databricks Engineer with Ingestion experience
We are seeking a skilled professional to design, develop, and optimize data ingestion pipelines for integrating multiple sources into Databricks.
Key Responsibilities:
-
Building Scalable Data Infrastructure
5 days ago
Ghaziabad, Uttar Pradesh, India beBeeData Full time ₹ 1,50,00,000 - ₹ 2,01,00,000
Job Title: Data Systems Engineer
Job Description
We're seeking a skilled Data Systems Engineer to design, build and operate scalable data infrastructure. This is a critical role in the development of large-scale language models. The ideal candidate will have expertise in distributed computing, cloud infrastructure, and high-throughput systems. They will work...
-
Web Data Specialist
6 days ago
Ghaziabad, Uttar Pradesh, India beBeeData Full time ₹ 40,00,000 - ₹ 80,00,000
Job Description
We are seeking a skilled Web Data Specialist to join our team. The successful candidate will be responsible for designing and implementing web crawlers, extracting valuable insights from the web, and ensuring data quality. The ideal candidate will have strong Python programming skills and experience in web scraping frameworks, browser...
-
High-Performance Database Architect
2 days ago
Ghaziabad, Uttar Pradesh, India beBeeData Full time ₹ 1,50,00,000 - ₹ 2,00,00,000
Database Engineer Job Overview
We are seeking a skilled professional to manage our database infrastructure, focusing on financial trading and analysis platforms. The ideal candidate will be responsible for designing, developing, and maintaining high-performance databases that handle large volumes of financial market data. This role is critical to ensuring data...
-
Web Data Specialist
7 days ago
Ghaziabad, Uttar Pradesh, India beBeeData Full time ₹ 1,50,00,000 - ₹ 2,50,00,000
Job Title: Web Data Specialist
About the Role:
We are seeking a skilled and data-driven individual to lead the end-to-end tracking and analytics setup for our web platform. This role is responsible for designing and implementing a comprehensive measurement framework, ensuring accurate data collection, and delivering actionable insights that drive business...
-
Senior Data Pipeline Developer
1 week ago
Ghaziabad, Uttar Pradesh, India beBeeTechnical Full time ₹ 1,50,00,000 - ₹ 2,50,00,000
Job Description
- Design and build a shared component library for data pipelines, including ingestion, parsing/normalization, extraction, validation, enrichment, and publishing.
- Define patterns/templates for Apache Beam pipelines and Databricks jobs, standardizing configuration, packaging, versioning, CI/CD, and documentation.
- Create interfaces so multiple teams...
-
Mastering Data Governance
1 week ago
Ghaziabad, Uttar Pradesh, India beBeeDataGovernance Full time ₹ 20,00,000 - ₹ 25,00,000
Data Governance Specialist
You will be responsible for managing and maintaining data catalogs to ensure adherence to data governance principles.
Key Responsibilities:
- Metadata Ingestion: You will work with the Collibra Data Intelligence Platform to ingest metadata from various sources, including ETL tools, BI platforms, and databases.
- Data Lineage Stitching: You...
-
Chief Data Solutions Architect
3 days ago
Ghaziabad, Uttar Pradesh, India beBeeDataEngineer Full time ₹ 20,00,000 - ₹ 30,00,000
Key Data Engineer Role:
We are seeking an expert Data Engineer to lead our data team. The ideal candidate should possess extensive knowledge of AWS services, including Glue, EMR, Lambda, and proficiency in data modelling, Python, SQL, PartiQL and integrating third-party APIs from scratch. Design, develop, and implement comprehensive end-to-end data engineering...
-
Data Pipeline Architect
1 week ago
Ghaziabad, Uttar Pradesh, India beBeeData Full time ₹ 60,00,000 - ₹ 1,20,00,000
Job Title: Data Pipeline Architect
We are seeking an experienced professional to design, develop, and optimize data ingestion pipelines.
About the Role:
- Design and implement data ingestion pipelines for integrating multiple sources into Databricks.
- Implement and maintain CI/CD pipelines for data workflows.
- Deploy and manage containerized applications using...