Senior Data Ingestion Engineer
1 week ago
Job Description:
We are seeking a skilled engineer to build a high-throughput product data ingestion pipeline across hundreds of domains. You will be responsible for the crawling/extraction layer end-to-end, including HTTP-first crawling with a Playwright fallback, per-domain learned selectors, and reliable PDF handling.
This role spans crawling (discovering & fetching pages via sitemaps/robots) and scraping (extracting structured specs, images, and PDFs into our schema).
- Design an HTTP-first crawler (Scrapy or aiohttp) with Playwright fallback only for JS-heavy pages.
- Implement sitemap diffing and conditional GETs (ETag/Last-Modified) for incremental runs.
- Build a lightweight 'needs JS?' classifier (HTML length, JSON-LD presence, data-product markers) to auto-route HTTP vs Playwright.
- Enforce per-domain throttles/backoff (2–4 concurrent/domain; auto-lower on 429/503).
- Add URL normalization/canonicalization and de-duplication.
- Handle PDF discovery & download (HEAD first to dedupe; size/concurrency caps; SHA-256 keys).
- Apply resource budgets to Playwright browser automation (block images/fonts/analytics; kill outliers by size/CPU/time).
- Integrate third-party APIs (REST/GraphQL) as first-class sources: handle authentication, pagination, and rate limits; unify API + crawl outputs.
- Own automation & orchestration for scheduled runs (Airflow/Temporal/Celery or cron), idempotent retries, and alerting.
- Create per-domain selectors (YAML) verified against held-out pages; re-learn them only when extraction health drops.
- Ship observability: per-site field coverage, error rates, retries, avg page time, and PDF success.
- Maintain allow/deny paths; adhere to robots.txt and Terms of Service.
- Containerize workers; provide runbooks/CI; collaborate with data team on schemas/normalization.
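For illustration only, the 'needs JS?' routing check listed above could look roughly like the sketch below. It uses the three signals named in the list (HTML length, JSON-LD presence, data-product markers). The function name `needs_js`, the 2000-character threshold, and the rule that structured markers alone are enough to stay on plain HTTP are all assumptions made for this example, not requirements of the role.

```python
import re

# Hypothetical threshold: pages shorter than this with no structured
# markers are treated as empty JS shells. Tune per domain.
MIN_HTML_CHARS = 2000

# Matches an opening <script> tag that declares JSON-LD.
JSON_LD_RE = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\']', re.IGNORECASE
)
# Matches attributes such as data-product, data-product-id, data-product-sku.
DATA_PRODUCT_RE = re.compile(r'\bdata-product[\w-]*\s*=', re.IGNORECASE)


def needs_js(html: str, min_chars: int = MIN_HTML_CHARS) -> bool:
    """Return True when the static HTML looks too thin to scrape over HTTP."""
    has_json_ld = bool(JSON_LD_RE.search(html))
    has_product_marker = bool(DATA_PRODUCT_RE.search(html))
    if has_json_ld or has_product_marker:
        # Structured product data is already in the raw response.
        return False
    # No markers: fall back to the length signal.
    return len(html) < min_chars


def route(html: str) -> str:
    """Pick a fetcher label for the page (labels are illustrative)."""
    return "playwright" if needs_js(html) else "http"
```

A production version would likely score the signals per domain instead of applying a single global rule.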
Required Skills and Qualifications:
- Experience in building scalable web crawlers and scrapers.
- Familiarity with Scrapy, aiohttp, and Playwright.
- Knowledge of HTML, CSS, and JavaScript.
- Understanding of web development principles and best practices.
- Strong problem-solving skills and attention to detail.
Benefits:
- Opportunity to work on a high-profile project with a talented team.
- Chance to develop expertise in web scraping and crawling.
- Collaborative and dynamic work environment.
- Ongoing training and professional development opportunities.
Others:
- Participate in code reviews and contribute to open-source projects.
- Engage in knowledge-sharing sessions and workshops.
- Stay up-to-date with industry trends and advancements.
-
Leading Cloud-Based Data Solutions Engineer
2 weeks ago
Vizag, Andhra Pradesh, India beBeeData Full time ₹ 20,00,000 - ₹ 25,00,000 Senior Data Engineer with GCP, ETL. Design and implement data pipelines on Google Cloud Platform (GCP) as a Senior Data Engineer. Develop scalable data pipelines using GCP services such as Cloud Storage, Pub/Sub, Dataflow, and Dataproc. Implement efficient batch and streaming data ingestion, building end-to-end pipelines on GCP. Key Responsibilities: Data...
-
Highly Skilled Data Engineering Professional
2 weeks ago
Vizag, Andhra Pradesh, India beBeeDataEngineering Full time ₹ 15,00,000 - ₹ 20,00,000 Job Title: Senior Data Engineer. We are seeking an experienced Senior Data Engineer to join our team. The ideal candidate will have a strong background in data engineering and experience with designing, creating, managing, and business use of large datasets. The successful candidate will partner with business stakeholders to understand their...
-
Senior Big Data Engineer
1 week ago
Vizag, Andhra Pradesh, India beBeeDataEngineering Full time ₹ 15,86,323 - ₹ 25,15,876 Job Title: Senior Azure Data Engineer. About the Role: We are seeking an experienced Data Engineering Specialist to lead the development and maintenance of large-scale data pipelines on Azure Data Factory and Databricks. Key Responsibilities: Data Pipeline Development: Design, develop, and maintain complex data pipelines using Azure Data Factory and Databricks. Big...
-
Senior Data Engineer
2 weeks ago
Vizag, Andhra Pradesh, India beBeeData Full time ₹ 20,16,000 - ₹ 25,12,000 Unlock Your Data Engineering Potential. We are seeking a seasoned Data Engineer to join our team, responsible for designing and building large-scale data pipelines using Google Cloud Platform. The ideal candidate will possess strong communication skills and be proficient in developing, testing, and maintaining data acquisition pipelines for structured and...
-
Senior Data Modeler and Engineer
2 weeks ago
Vizag, Andhra Pradesh, India beBeeDataModeler Full time ₹ 2,16,32,000 - ₹ 2,54,17,000 About the Role: We are seeking an experienced Data Modeler and Engineer to join our organization. The ideal candidate will possess a strong understanding of conceptual, logical, and physical data modeling principles, with expertise in requirements gathering, creating data mapping documents, writing functional specifications, and queries for Data Warehouse,...
-
Data Engineering Leader
2 weeks ago
Vizag, Andhra Pradesh, India beBeeSnowflake Full time ₹ 2,00,00,000 - ₹ 2,50,00,000 Senior Snowflake and Infrastructure Integration Expert. We are seeking a seasoned Snowflake developer and infrastructure expert to lead the implementation of full lifecycle projects, including secure platform setup, cloud integration, and high-performance data pipeline development. This role requires someone who thrives at the intersection of data engineering,...
-
Senior Data Engineer
2 weeks ago
Vizag, Andhra Pradesh, India beBeeDataEngineer Full time ₹ 19,10,000 Senior Data Engineer Role. We are seeking a senior data engineer to join our team and contribute to the development of our data infrastructure. The ideal candidate will have a strong background in data engineering, with experience in designing, building, and maintaining large-scale data systems. The successful candidate will be responsible for: Designing and...
-
Cloud Data Engineering Specialist
2 weeks ago
Vizag, Andhra Pradesh, India beBeeDataEngineer Full time ₹ 1,00,00,000 - ₹ 1,20,00,000 Cloud Data Engineer Specialist. Job Description: We are seeking a highly skilled Cloud Data Engineer to design, develop and maintain scalable data pipelines and ETL/ELT solutions using AWS cloud services. The ideal candidate will have expertise in ETL/ELT development using PySpark, Python, or SQL, and experience with data modeling, warehousing and performance...
-
Senior Manager, Data Engineering Testing Lead
2 weeks ago
Vizag, Andhra Pradesh, India beBeeDataEngineering Full time ₹ 1,50,00,000 - ₹ 2,50,00,000 Lead Comprehensive Testing for Data Engineering and Product Delivery. We are looking for an experienced Senior Manager to lead comprehensive testing for our data engineering and product delivery initiatives. The ideal candidate will have a strong background in software testing, with a focus on data engineering systems, ETL pipelines, and analytics...
-
Senior Data Migration Specialist
1 week ago
Vizag, Andhra Pradesh, India beBeeDataMigration Full time ₹ 80,00,000 - ₹ 1,20,00,000 Job Description: We are seeking an experienced data migration specialist to join our team. The successful candidate will have a proven track record of successfully migrating large datasets from legacy systems to cloud-hosted solutions. Key Requirements: At least 3-6 years of experience in data engineering, with a strong focus on ETL development and data...