
Highly Skilled Web Data Extractor
2 weeks ago
We're building a high-throughput product data ingestion pipeline across hundreds of domains. You'll be responsible for the crawling/extraction layer end-to-end: HTTP-first crawling with a Playwright fallback, per-domain learned selectors, and reliable PDF handling (datasheets/specs).
This role covers both crawling (discovering and fetching pages via sitemaps/robots) and scraping (extracting structured specs, images, and PDFs into our schema). Key responsibilities include designing an HTTP-first crawler; implementing sitemap diffing and conditional GETs; building a lightweight classifier to auto-route HTTP vs Playwright; enforcing per-domain throttles/backoff; adding URL normalization/canonicalization and de-duplication; handling PDF discovery and download; applying resource budgets to Playwright browser automation; integrating third-party APIs; owning automation and orchestration for scheduled runs; creating per-domain selectors; shipping observability; maintaining allow/deny paths; and adhering to robots.txt and Terms of Service.
Must-haves include 4+ years of Python experience; strong skills in Scrapy or aiohttp/asyncio and Playwright (or Puppeteer) in production; practical proxy management, polite anti-bot tactics, and per-domain rate limiting; hands-on experience with ETag/Last-Modified, retries, backoff, and HTTP caching; confidence with CSS/XPath, schema.org/JSON-LD, and HTML parsing; experience consuming REST/GraphQL APIs (auth, pagination, backoff) and building small internal services; automation/orchestration with Airflow/Temporal/Celery (or equivalent schedulers/queues) for scheduled runs and monitoring; PDF handling (requests/HEAD, hashing, size limits) and file integrity checks; queues (Redis/Kafka), Docker, and Linux basics; and clear, pragmatic communication with strong ownership.
- Design an HTTP-first crawler (Scrapy or aiohttp) with Playwright fallback only for JS-heavy pages.
- Implement sitemap diffing and conditional GETs (ETag/Last-Modified) for incremental runs.
- Build a lightweight 'needs JS?' classifier (HTML length, JSON-LD presence, data-product markers) to auto-route HTTP vs Playwright.
- Enforce per-domain throttles/backoff (2–4 concurrent/domain; auto-lower on 429/503).
- Add URL normalization/canonicalization and de-dup (respect rel=canonical; hash PDFs).
- Handle PDF discovery & download (HEAD first to dedupe; size/concurrency caps; SHA-256 keys).
- Apply Playwright browser automation resource budgets (block images/fonts/analytics; kill outliers by size/CPU/time).
- Integrate third-party APIs (REST/GraphQL) as first-class sources: handle auth (API keys/OAuth2), pagination, and rate limits; unify API + crawl outputs.
- Own automation & orchestration for scheduled runs (Airflow/Temporal/Celery or cron), idempotent retries, and alerting.
- Create per-domain selectors (YAML) with verification on hold-outs; re-learn only when health drops.
- Ship observability: per-site field coverage, error rates, retries, avg page time, and PDF success.
- Maintain allow/deny paths; adhere to robots.txt and Terms of Service.
- Deliver in small, measurable increments.
- Track coverage and freshness as north-star metrics.
- Prefer simple designs that are easy to operate at scale.
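To give candidates a concrete picture, the URL normalization, 'needs JS?' routing, and SHA-256 PDF keying mentioned above might look roughly like the sketch below. It is illustrative only and not our production code: the function names, the tracking-parameter list, the marker patterns, and the 2000-character threshold are all assumptions made for the example.

```python
import hashlib
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameters to strip; a real list would be tuned per domain.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}

# Signals that the static HTML already carries product data (assumed patterns).
JSON_LD_RE = re.compile(r'<script[^>]+type=["\']application/ld\+json["\']', re.I)
PRODUCT_MARKER_RE = re.compile(r"data-product", re.I)


def normalize_url(url: str) -> str:
    """Lowercase scheme/host, drop default ports, fragments, and tracking
    parameters, and sort the remaining query parameters."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    is_default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not is_default:
        host = f"{host}:{port}"
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    query = urlencode(sorted(params))
    return urlunsplit((scheme, host, parts.path or "/", query, ""))


def needs_js(html: str, min_length: int = 2000) -> bool:
    """Heuristic router: True means send the page to Playwright.

    Pages with JSON-LD or data-product markers stay on the HTTP path;
    otherwise a very short body is treated as a likely JS shell.
    """
    if JSON_LD_RE.search(html) or PRODUCT_MARKER_RE.search(html):
        return False
    return len(html) < min_length


def pdf_key(content: bytes) -> str:
    """Content-addressed key for a downloaded PDF (hex SHA-256 digest)."""
    return hashlib.sha256(content).hexdigest()
```

In a sketch like this, `normalize_url` would feed the de-dup set, `needs_js` would decide the fetch path after the first HTTP response, and `pdf_key` would name the stored file so that identical datasheets served from different URLs collapse to one object.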
We offer competitive compensation. Please include your expected salary range in INR LPA and any variable/benefits expectations.