High-Performance Web Crawler Developer

1 week ago


Dindigul, Tamil Nadu, India beBeeDeveloper Full time ₹ 18,00,000 - ₹ 20,00,000
Job Description:

We're building a high-throughput product data ingestion pipeline across hundreds of domains.

Responsibilities:
  • Design an HTTP-first crawler with a Playwright fallback for JS-heavy pages, and implement sitemap diffing and conditional GETs (ETag/Last-Modified) for incremental runs.
  • Build a lightweight classifier that automatically routes each page to plain HTTP or Playwright based on HTML length, JSON-LD presence, and data-product markers.
  • Implement per-domain throttling and backoff (2–4 concurrent requests per domain, lowered automatically on 429/503 responses) and URL normalization/canonicalization.
  • Handle PDF discovery and download (HEAD requests first to deduplicate; size and concurrency caps; SHA-256 keys), and enforce resource budgets for Playwright browser automation.
  • Integrate third-party APIs as first-class sources: handle auth (API keys/OAuth2), pagination, and rate limits, and unify API and crawl outputs.
  • Own automation & orchestration for scheduled runs, idempotent retries, and alerting.
  • Create per-domain selectors (YAML), verify them on hold-out pages, and re-learn them only when selector health drops.
  • Ship observability: per-site field coverage, error rates, retries, avg page time, and PDF success.
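For illustration only: a minimal Python sketch of the kind of HTTP-vs-Playwright routing described above, assuming a simple rule-based classifier rather than a trained model. The threshold `MIN_HTML_CHARS`, the two regular expressions, and the names `route_page` and `RouteDecision` are hypothetical choices, not part of this role's actual codebase.

```python
import re
from dataclasses import dataclass

# Hypothetical threshold: below this size, raw HTML with no product signals
# is treated as a client-side-rendered shell. Would be tuned per domain.
MIN_HTML_CHARS = 5_000

# <script ... type="application/ld+json"> anywhere in the document
JSON_LD_RE = re.compile(
    r"<script[^>]+type\s*=\s*['\"]application/ld\+json['\"]", re.IGNORECASE
)
# Attributes such as data-product="..." or data-product-id="..."
DATA_PRODUCT_RE = re.compile(r"\bdata-product[\w-]*\s*=", re.IGNORECASE)


@dataclass
class RouteDecision:
    use_browser: bool  # True means: re-fetch this URL with Playwright
    reason: str


def route_page(html: str) -> RouteDecision:
    """Decide whether HTML fetched over plain HTTP is usable or needs a browser."""
    if JSON_LD_RE.search(html) or DATA_PRODUCT_RE.search(html):
        # The server already rendered product data; plain HTTP is enough.
        return RouteDecision(False, "product signals in raw HTML")
    if len(html) < MIN_HTML_CHARS:
        # Small document with no product signals: probably a JS app shell.
        return RouteDecision(True, "short HTML without product signals")
    # Large server-rendered page without product signals (e.g. a listing page).
    return RouteDecision(False, "large server-rendered HTML")
```

Cheap signals like these keep most pages on the plain HTTP path, so only pages that look like empty JavaScript shells pay the cost of a browser fetch.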
Required Skills and Qualifications:
  • 4+ years of Python experience, including 2+ years building production web crawlers at scale.
  • Strong production experience with Scrapy or aiohttp/asyncio, and with Playwright (or Puppeteer).
  • Practical proxy management, polite anti-bot tactics, and per-domain rate limiting.
  • Hands-on experience with ETag/Last-Modified, retries, backoff, and HTTP caching.
  • Confident with CSS/XPath, schema.org/JSON-LD, and HTML parsing.
  • APIs: consuming REST/GraphQL (auth, pagination, backoff) and building small internal services.
  • Automation/Orchestration: Airflow/Temporal/Celery for scheduled runs and monitoring.
  • PDF handling (requests/HEAD, hashing, size limits) and file integrity checks.
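A minimal asyncio sketch of per-domain throttling in the 2–4 concurrent-requests range, lowered on 429/503 responses, plus jittered exponential backoff for retries. The recovery rule (one slot added back after `recover_after` consecutive successes), the "full jitter" backoff formula, and the names `DomainThrottle` and `backoff_delay` are assumptions for illustration, not a prescribed design.

```python
import asyncio
import random
from collections import defaultdict


class DomainThrottle:
    """Per-domain concurrency limit that shrinks on 429/503 and slowly recovers.

    Assumed policy: start at max_limit, drop by one on each throttling
    response (never below min_limit), and add one back after
    `recover_after` consecutive 2xx/3xx responses.
    """

    def __init__(self, min_limit=2, max_limit=4, recover_after=20):
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.recover_after = recover_after
        self.limit = defaultdict(lambda: max_limit)
        self.in_flight = defaultdict(int)
        self.successes = defaultdict(int)
        self.cond = asyncio.Condition()

    async def acquire(self, domain):
        async with self.cond:
            # Block until this domain has a free slot under its current limit.
            await self.cond.wait_for(
                lambda: self.in_flight[domain] < self.limit[domain]
            )
            self.in_flight[domain] += 1

    async def release(self, domain, status):
        async with self.cond:
            self.in_flight[domain] -= 1
            if status in (429, 503):
                # Server is pushing back: lower the limit, reset the streak.
                self.limit[domain] = max(self.min_limit, self.limit[domain] - 1)
                self.successes[domain] = 0
            elif 200 <= status < 400:
                self.successes[domain] += 1
                if self.successes[domain] >= self.recover_after:
                    self.limit[domain] = min(self.max_limit, self.limit[domain] + 1)
                    self.successes[domain] = 0
            # Any other status only frees the slot.
            self.cond.notify_all()


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A fetch loop would `await throttle.acquire(domain)`, make the request, and call `await throttle.release(domain, status)` in a `finally` block so the slot is always freed; passing a status such as 0 for network errors frees the slot without changing the limit. Creating the throttle inside the running event loop (for example in `main()`) avoids loop-binding problems on older Python versions.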
Nice to Have:
  • Go or Node.js experience for high-performance crawlers.
  • Cloud: AWS/GCP, S3, ECS/Kubernetes; IaC basics.
  • Workflow engines: Airflow/Temporal/Argo/Celery.
  • Document extraction: Textract/Tika/Camelot/Tabula.
  • Search/analytics: Elasticsearch/OpenSearch; warehousing (Snowflake/Postgres).
Others:
  • Ship in small, measurable increments.
  • Track coverage and freshness as north-star metrics.
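One way to make coverage and freshness concrete, sketched in Python under an assumed record shape (`site`, a timezone-aware `fetched_at`, plus the product fields being tracked): coverage is the share of a site's records with a non-empty value for each field, and freshness is the share fetched within `max_age`. The function name `site_metrics`, the record shape, and the one-day default are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def site_metrics(records, fields, max_age=timedelta(days=1), now=None):
    """Per-site field coverage and freshness.

    coverage[field] = share of the site's records whose field is non-empty
    freshness       = share of the site's records fetched within max_age
    """
    now = now or datetime.now(timezone.utc)
    counts = defaultdict(int)
    filled = defaultdict(lambda: defaultdict(int))
    fresh = defaultdict(int)
    for rec in records:
        site = rec["site"]
        counts[site] += 1
        for f in fields:
            # None, empty strings and empty containers count as missing.
            if rec.get(f) not in (None, "", [], {}):
                filled[site][f] += 1
        if now - rec["fetched_at"] <= max_age:
            fresh[site] += 1
    return {
        site: {
            "coverage": {f: filled[site][f] / n for f in fields},
            "freshness": fresh[site] / n,
        }
        for site, n in counts.items()
    }
```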


  • Dindigul, Tamil Nadu, India beBeeData Full time ₹ 10,00,000 - ₹ 25,00,000

    Job Description: We are seeking an experienced Web Data Extraction Specialist to join our team. As a key member of our data acquisition team, you will be responsible for designing and implementing efficient web crawling strategies, extracting valuable insights from the web, and ensuring high-quality data delivery. About this role: Design and develop web...


  • Dindigul, Tamil Nadu, India beBeeJavascript Full time ₹ 1,00,00,000 - ₹ 2,00,00,000

    Senior JavaScript Developer
    We are seeking an experienced and skilled Senior JavaScript Developer to fill this key role in our development team. As a crucial member of our team, you will be responsible for designing, developing, and maintaining complex web applications using cutting-edge technologies. Key Responsibilities: Design and develop scalable, efficient,...


  • Dindigul, Tamil Nadu, India beBeeFrontend Full time ₹ 1,50,00,000 - ₹ 2,50,00,000

    Senior Frontend Engineer
    Job Description: We are seeking a seasoned Senior Frontend Engineer to spearhead the development of high-performance web applications using modern JavaScript frameworks. The ideal candidate will have strong expertise in designing and developing responsive, scalable frontend architectures that prioritize user experience and...


  • Dindigul, Tamil Nadu, India beBeeDevelopment Full time ₹ 8,00,000 - ₹ 12,00,000

    Web Development Expert Wanted
    We're on the hunt for a skilled Web Development Expert to join our team. As a key member of our development squad, you will be responsible for crafting high-quality web applications using WordPress and Webflow. Your primary objective will be to build and maintain websites, execute back-end and front-end development tasks, and...


  • Dindigul, Tamil Nadu, India beBeeWebDeveloper Full time US$ 9,600

    Job Title
    We're seeking a skilled Web Designer and Developer to join our team. Design engaging websites and landing pages that drive results. Develop high-quality websites and landing pages using industry-leading tools such as Go High Level, Unbounce, and WordPress. Deliver polished work that meets or exceeds client expectations. Collaborate with the team to...


  • Dindigul, Tamil Nadu, India beBeeWebDevelopment Full time ₹ 7,50,000 - ₹ 15,00,000

    We are seeking a highly skilled and motivated developer to join our organization in the development of innovative web solutions. The ideal candidate will have experience in back-end web development and programming, as well as skills in front-end development and design. Proficiency in web development technologies and tools is essential for success in this...

  • Backend Web Developer

    2 weeks ago


    Dindigul, Tamil Nadu, India beBeePHPDeveloper Full time ₹ 25,000

    Our organization is seeking a skilled PHP developer to handle ongoing projects and deliver high-quality solutions within tight deadlines. Key Responsibilities: Develop, test, and maintain dynamic web applications using PHP and frameworks like Laravel or CodeIgniter. Collaborate with cross-functional teams (design, quality assurance, and backend) to deliver...


  • Dindigul, Tamil Nadu, India beBeePerformer Full time ₹ 12,49,999 - ₹ 18,69,329

    Job Description
    About us: We balance innovation with an open, friendly culture and the backing of a long-established parent company, known for its ethical reputation. We guide customers from what's now to what's next by unlocking the value of their data and applications to solve their digital challenges, achieving outcomes that benefit both business and...


  • Dindigul, Tamil Nadu, India beBeeSoftware Full time ₹ 10,00,000 - ₹ 20,00,000

    Software Engineer I - Front End
    Job Description: The primary responsibility is to develop high-quality software solutions as a contributing member of a highly motivated team of engineers. This individual will serve the Software Engineering role on an Agile team, utilizing different web technologies such as HTML5/CSS3, Native JavaScript or AngularJS or Angular...


  • Dindigul, Tamil Nadu, India beBeeDevelopment Full time ₹ 36,00,000 - ₹ 54,00,000

    Senior Frontend Engineer
    This is a full-time, on-site role located in Gurugram for a Senior Frontend Engineer. The engineer will be responsible for designing, developing and maintaining both front-end and back-end functionalities of company websites and applications. Daily tasks include coding, testing, debugging and ensuring the seamless integration of...