
High-Performance Data Ingestion Specialist
2 days ago
We're building a high-throughput data ingestion pipeline across hundreds of domains. As a Senior Web Scraping Engineer, you'll own the crawling/extraction layer end-to-end: HTTP-first crawling with a Playwright fallback, per-domain learned selectors, and reliable PDF handling (datasheets/specs).
Key Responsibilities
- Design an HTTP-first crawler (Scrapy or aiohttp) with Playwright fallback only for JS-heavy pages.
- Implement sitemap diffing and conditional GETs (ETag/Last-Modified) for incremental runs.
- Build a lightweight 'needs JS?' classifier (HTML length, JSON-LD presence, data-product markers) to auto-route HTTP vs Playwright.
- Enforce per-domain throttles/backoff (2-4 concurrent/domain; auto-lower on 429/503).
- Add URL normalization/canonicalization and de-dup (respect <link rel="canonical">; hash PDFs).
- Handle PDF discovery & download (HEAD first to dedupe; size/concurrency caps; SHA-256 keys).
- Apply resource budgets to Playwright browser automation (block images/fonts/analytics; kill outliers by size/CPU/time).
- Integrate third-party APIs (REST/GraphQL) as first-class sources: handle auth (API keys/OAuth2), pagination, and rate limits; unify API + crawl outputs.
- Own automation & orchestration for scheduled runs (Airflow/Temporal/Celery or cron), idempotent retries, and alerting.
- Create per-domain selectors (YAML) with verification on hold-outs; re-learn only when health drops.
- Ship observability: per-site field coverage, error rates, retries, avg page time, and PDF success.
- Maintain allow/deny paths; adhere to robots.txt and Terms of Service.
- Containerize workers; provide runbooks/CI; collaborate with data team on schemas/normalization.
Requirements
- 4+ years Python, including 2+ years building production web crawlers at scale.
- Strong with Scrapy or aiohttp/asyncio and Playwright (or Puppeteer) in production.
- Practical proxy management, polite anti-bot tactics, and per-domain rate limiting.
- Hands-on with ETag/Last-Modified, retries, backoff, and HTTP caching.
- Confident with CSS/XPath, schema.org/JSON-LD, and HTML parsing.
- APIs: consuming REST/GraphQL (auth, pagination, backoff) and building small internal services (FastAPI or similar).
- Automation/Orchestration: Airflow/Temporal/Celery (or equivalent schedulers/queues) for scheduled runs and monitoring.
- PDF handling (requests/HEAD, hashing, size limits) and file integrity checks.
- Queues (Redis/Kafka), Docker, Linux basics; comfort with logs/metrics.
- Clear, pragmatic communication and strong ownership.
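To give a flavor of the routing work described above, here is a minimal sketch of a 'needs JS?' heuristic built on the three signals the posting names (HTML length, JSON-LD presence, data-product markers). The threshold, the regexes, the `Route` type, and the rule "use Playwright only when the page is short and has no structured markers" are illustrative assumptions, not our production logic.

```python
import re
from dataclasses import dataclass

# Hypothetical threshold and patterns; a real classifier would tune these per domain.
MIN_HTML_BYTES = 5_000
JSON_LD_RE = re.compile(r'<script[^>]+type=["\']application/ld\+json["\']', re.I)
DATA_PRODUCT_RE = re.compile(r'\bdata-product[\w-]*\s*=', re.I)


@dataclass
class Route:
    use_playwright: bool
    reasons: list[str]


def needs_js(html: str) -> Route:
    """Decide whether a plain HTTP response looks like an empty JS shell."""
    reasons: list[str] = []

    # Signal 1: very short HTML often means content is rendered client-side.
    if len(html.encode("utf-8")) < MIN_HTML_BYTES:
        reasons.append("short_html")

    # Signals 2 and 3: JSON-LD or data-product attributes suggest the
    # product data is already present in the server-rendered markup.
    has_json_ld = bool(JSON_LD_RE.search(html))
    has_markers = bool(DATA_PRODUCT_RE.search(html))
    if not has_json_ld and not has_markers:
        reasons.append("no_structured_markers")

    # Assumed rule: fall back to the browser only when both checks fail.
    return Route(use_playwright=len(reasons) == 2, reasons=reasons)


if __name__ == "__main__":
    shell = '<html><body><div id="root"></div></body></html>'
    print(needs_js(shell))
```

Requiring both signals keeps Playwright usage low; a stricter or looser rule is a per-domain tuning decision.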
This role offers competitive compensation. Please include your expected CTC (INR LPA) and any variable/benefits expectations in your application.
Application Guidelines
- Please apply with your resume and links to relevant repos or code samples.
- Include concise notes on a crawler you ran at 100+ sites/day (or similar scale), how you handled rate limits/retries, and your approach to PDF discovery/dedup.
-
High-Performance Data Architect
6 days ago
Bhavnagar, Gujarat, India beBeeDataEngineer Full time ₹ 2,00,00,000 - ₹ 2,50,00,000
Data Engineering Role
We're seeking an experienced Data Engineer to join our team responsible for building and scaling data infrastructure. This role requires expertise in large-scale data processing for AI model development.
About the Position: You'll work directly with researchers to accelerate experiments, develop new datasets, improve infrastructure...
-
High-Performance Database Specialist
1 week ago
Bhavnagar, Gujarat, India beBeeDatabase Full time ₹ 1,00,000 - ₹ 2,50,000
Job Title: High-Performance Database Specialist
About the Role
We are seeking an experienced High-Performance Database Specialist to join our team. The successful candidate will be responsible for ensuring the optimal performance and efficiency of our databases.
Key Responsibilities
Database Performance Tuning: Analyze and resolve database bottlenecks...
-
High-Quality Data Expert
2 weeks ago
Bhavnagar, Gujarat, India beBeeDataQuality Full time ₹ 1,50,00,000 - ₹ 2,00,00,000
About the position
We are seeking an experienced Data Quality Specialist to assist in developing and executing tests using various tools. The successful candidate will have hands-on experience in testing data pipelines, ETL processes, and data ingestion, utilizing tools such as SQL, Tricentis, Python (PySpark), and data quality frameworks.
Key skills and...
-
High-Impact Performance Marketer
5 days ago
Bhavnagar, Gujarat, India beBeeSpecialist Full time ₹ 10,00,000 - ₹ 15,00,000
Performance Marketing Specialist
We are seeking a skilled Performance Marketing Specialist to join our team. This high-impact role requires hands-on campaign experience and a strong understanding of e-commerce or real estate marketing funnels. This specialist will be responsible for strategy, execution, and optimization - working closely with our creative...
-
IoT Data Engineering Specialist
6 days ago
Bhavnagar, Gujarat, India beBeeDataEngineer Full time ₹ 1,50,00,000 - ₹ 2,50,00,000
As a specialist in IoT data engineering, you will have the opportunity to design and implement scalable data solutions leveraging AWS services. Your primary responsibility will be to develop and maintain data engineering solutions using Python programming language. You will work with large-scale IoT data ingestion, processing, and storage architectures. You...
-
Chief Data Integration Specialist
2 weeks ago
Bhavnagar, Gujarat, India beBeeData Full time ₹ 1,50,00,000 - ₹ 2,00,00,000
Job Overview: This position is responsible for designing, building, and operating scalable data pipelines to process clinical encounter data and integrate healthcare information. The ideal candidate will collaborate closely with software engineers, machine learning specialists, and clinical partners to drive project success.
Key Responsibilities: Design...
-
High Performance Systems Specialist
5 days ago
Bhavnagar, Gujarat, India beBeePerformance Full time ₹ 1,10,00,000 - ₹ 2,01,00,000
Job Opportunity
We are seeking a skilled High Performance Systems Specialist to join our team. As a key member of our technology group, you will be responsible for ensuring the smooth operation of our systems by identifying and resolving performance issues.
- Design and implement monitoring solutions to improve system performance and reliability.
- Collaborate with...
-
High-Performance Testing Specialist
2 weeks ago
Bhavnagar, Gujarat, India beBeePerformance Full time ₹ 10,00,000 - ₹ 20,00,000
High-Performance Testing Specialist
Seeking a seasoned professional with deep knowledge of performance testing principles, methodologies, and best practices.
Responsibilities:
- Develop comprehensive test strategies, workloads, and plans to ensure optimal system performance.
- Collaborate with cross-functional teams to identify performance bottlenecks, analyze root...
-
Strategic Data Architect
2 weeks ago
Bhavnagar, Gujarat, India beBeeData Full time ₹ 15,00,000 - ₹ 25,00,000
Big Data Specialist Role
We are seeking a highly skilled professional to fill the position of Big Data Specialist. In this role, you will be responsible for designing and implementing robust data pipelines that handle high-volume financial data.
Key Responsibilities:
- Design, develop, and manage end-to-end data pipelines for stocks, crypto, and other financial...
-
Data Engineering Specialist
1 week ago
Bhavnagar, Gujarat, India beBeeDataEngineering Full time ₹ 2,00,00,000 - ₹ 2,50,00,000
Data Engineering Specialist
We're seeking a skilled Data Engineering professional to drive the development of robust data pipelines. The ideal candidate will have expertise in AWS, Python, and experience handling data from various sources such as Hadoop and Teradata.
Key Responsibilities:
- Design and build data pipelines for processing channel activity data.
- Work...