
Senior Data Extraction Specialist
2 weeks ago
We are looking for a skilled Web Scraping Engineer to join our engineering team. You will build and maintain high-throughput product data ingestion pipelines across hundreds of domains.
Key Responsibilities:
- Design and implement HTTP-first crawlers with Playwright fallbacks for JavaScript-heavy pages.
- Implement sitemap diffing and conditional GETs using ETags and Last-Modified headers for incremental runs.
- Develop a lightweight classifier that decides whether a page requires JavaScript rendering, based on signals such as HTML length, JSON-LD presence, and data-product markers.
- Enforce per-domain throttles and backoff strategies to keep crawling polite and efficient.
- Add URL normalization, canonicalization, and de-duplication so the same page is not processed twice.
- Handle PDF discovery and download: use HEAD requests to de-duplicate before downloading, enforce size and concurrency caps, and key files by SHA-256 for verification.
- Apply resource budgets to Playwright browser automation to optimize performance.
- Integrate third-party APIs as first-class sources, handling authentication, pagination, and rate limits, and unifying API and crawl outputs.
- Own the automation and orchestration of scheduled runs, including retries and alerting.
- Create per-domain selectors in YAML configuration files and verify their accuracy against hold-out datasets.
- Ship observability metrics, including per-site field coverage, error rates, retries, average page time, and PDF success rates.
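
To give a flavor of two of the responsibilities above, here is a minimal Python sketch of URL normalization with de-duplication and of building conditional-GET headers from a stored ETag or Last-Modified value. It is illustrative only: the function names and the `TRACKING_PARAMS` list are hypothetical, and a real pipeline would likely tune the rules per domain.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to strip (an assumption for this sketch).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}
DEFAULT_PORTS = {("http", 80), ("https", 443)}


def normalize_url(url: str) -> str:
    """Return a canonical form of `url` suitable as a de-duplication key."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    if parts.port and (scheme, parts.port) not in DEFAULT_PORTS:
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Drop tracking parameters and sort the rest so ordering does not matter.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode(sorted(p for p in pairs if p[0] not in TRACKING_PARAMS))
    # The fragment is discarded because it is never sent to the server.
    return urlunsplit((scheme, host, path, query, ""))


def dedupe(urls):
    """Yield each distinct normalized URL once, preserving first-seen order."""
    seen = set()
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            yield key


def conditional_headers(etag=None, last_modified=None):
    """Build request headers for a conditional GET from stored validators."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def content_key(data: bytes) -> str:
    """SHA-256 hex digest, usable as a storage key for downloaded files."""
    return hashlib.sha256(data).hexdigest()
```

A server that honors these headers answers `304 Not Modified` when the page is unchanged, so an incremental run can skip re-parsing it. This sketch ignores userinfo and IPv6 hosts for brevity.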
Requirements:
- 4+ years of Python development experience, including 2+ years building production web crawlers at scale.
- Strong expertise in Scrapy, aiohttp/asyncio, and Playwright (or Puppeteer) in production environments.
- Practical knowledge of proxy management, polite anti-bot tactics, and per-domain rate limiting.
- Hands-on experience with ETag and Last-Modified headers, retries, backoff, and HTTP caching.
- Confidence in CSS/XPath, schema.org/JSON-LD, and HTML parsing.
- API experience: consuming REST/GraphQL APIs, handling authentication, pagination, backoff, and building small internal services using FastAPI or similar frameworks.
- Automation/orchestration skills: Airflow, Temporal, Celery, or equivalent schedulers/queues for scheduled runs and monitoring.
- PDF handling: requests/HEAD, hashing, size limits, and file integrity checks.
- Infrastructure: Redis/Kafka queues, Docker, Linux basics, and comfort with logs/metrics.
- Clear, pragmatic communication and strong ownership.
Nice to Have:
- Experience with Go or Node.js for high-performance crawlers.
- Cloud experience: AWS/GCP, S3, ECS/Kubernetes, and IaC basics.
- Workflow engine experience: Airflow, Temporal, Argo, Celery, or equivalent schedulers/queues.
- Document extraction experience: Textract, Tika, Camelot, Tabula.
- Search/analytics and warehousing experience: Elasticsearch/OpenSearch, Snowflake/Postgres.
- LLM-assisted selector generation with deterministic verification (optional).
How We Work:
- Ship in small, measurable increments.
- Track coverage and freshness as north-star metrics.
- Prefer simple designs that are easy to operate at scale.
-
Data Integration Specialist
2 weeks ago
Morādābād, Uttar Pradesh, India beBeeDataIntegration Full time ₹ 1,50,00,000 - ₹ 2,00,00,000 Data Integration Specialist. Job Summary: We are seeking a highly skilled Data Integration Specialist to join our team. The ideal candidate will have extensive experience in designing, building, and optimizing data pipelines using IBM DataStage. The successful candidate will possess strong SQL skills for data extraction, transformation, and validation, as well...
-
Data Engineering Lead
2 weeks ago
Morādābād, Uttar Pradesh, India beBeeDataEngineer Full time ₹ 15,00,000 - ₹ 25,00,000 Senior Data Engineer with GCP, ETL. We are seeking an experienced Senior Data Engineer to spearhead large-scale data systems on the Google Cloud Platform (GCP). The ideal candidate will have a proven track record of designing, building, and maintaining complex data pipelines using GCS, Pub/Sub, Dataflow, Dataproc, BigQuery, Airflow/Composer, Python, and...
-
Senior Data Integrity Specialist
2 weeks ago
Morādābād, Uttar Pradesh, India beBeeDataQuality Full time ₹ 15,00,000 - ₹ 17,50,000 Job Description: We are seeking an experienced ETL tester to join our data team. As a Data Quality Assurance Specialist, you will play a critical role in ensuring the integrity, accuracy, and performance of our data pipelines and dashboards. Main Responsibilities: Data Profiling: Analyze source data to understand data structures, distributions, and anomalies to...
-
Senior Clinical Data Specialist
2 weeks ago
Morādābād, Uttar Pradesh, India beBeeClinical Full time ₹ 9,00,000 - ₹ 12,00,000 Job Title: Senior Clinical Data Specialist. We are seeking a highly skilled and experienced Senior Clinical Data Specialist to join our team. As a key member of the team, you will play a vital role in analyzing and maintaining clinical data to support business decisions. About the Role: The ideal candidate will have extensive experience working with clinical...
-
Senior Data Quality Assurance Specialist
2 weeks ago
Morādābād, Uttar Pradesh, India beBeeDataQuality Full time ₹ 15,00,000 - ₹ 20,00,000 Data Quality Assurance Specialist. We are seeking a skilled and detail-oriented professional to join our data engineering team as a Senior Data Quality Assurance Specialist. Key Responsibilities: Design, develop, and execute automated and manual test cases for data validation and transformation processes. Validate data integrity across various sources including...
-
Senior Business Data Specialist
2 weeks ago
Morādābād, Uttar Pradesh, India beBeeDataAnalyst Full time ₹ 18,00,000 - ₹ 24,00,000 Job Overview: Data Analysts play a pivotal role in deciphering complex data to uncover trends and patterns that inform business strategies. Analyze large datasets to extract valuable insights that drive business growth. Develop predictive models and machine learning algorithms to solve intricate business problems. Collaborate with cross-functional teams to...
-
Senior Azure Data Engineer
2 weeks ago
Morādābād, Uttar Pradesh, India beBeeDataEngineer Full time ₹ 18,00,000 - ₹ 25,00,000 Job Title: Senior Azure Data Engineer. We are seeking a skilled Senior Azure Data Engineer to join our team. As a key member of our data engineering team, you will be responsible for designing, developing, and maintaining large-scale data pipelines using Azure technologies such as Azure Data Factory, Databricks, and PySpark. Your expertise in data modeling,...
-
Data Engineer Position
1 week ago
Morādābād, Uttar Pradesh, India beBeeData Full time ₹ 15,00,000 - ₹ 20,00,000 Data Specialist Role. Job Overview: We are seeking an experienced Data Engineer to lead our data engineering team. About the Job: We require a skilled Data Specialist who can develop, test, and maintain efficient Python scripts for data collection and transformation. The successful candidate will also design and implement web scraping solutions to extract...
-
Senior Cybersecurity Investigator
2 weeks ago
Morādābād, Uttar Pradesh, India beBeeDigitalForensics Full time ₹ 1,00,00,000 - ₹ 1,50,00,000 Job Title: Forensic Analyst. Job Summary: As a Digital Forensics Specialist, you will perform detailed analysis on digital devices and storage media to extract and recover critical evidence. Maintain the integrity of evidence and ensure chain of custody throughout the investigation process. Prepare technical reports and provide expert testimony in legal...
-
Senior Data Solutions Specialist
2 weeks ago
Morādābād, Uttar Pradesh, India beBeeDataDeveloper Full time ₹ 18,00,000 - ₹ 24,00,000 Job Overview: We are seeking a highly skilled Data Developer to join our team. The ideal candidate will possess 6+ years of experience in data development and be proficient in writing efficient and optimized SQL queries to extract, transform, and aggregate data for reporting purposes. The Data Developer will play a key role in designing and implementing data...