
Data Scraper
**What You'll Do**:
As a **Data Scraper**, you will lead large-scale web scraping projects targeting complex and protected sources like the **top B2B networking and social media platforms**. You'll architect, build, and manage intelligent scraping systems that power our core data products and analytics engines.
**Key Responsibilities**:
- Design and build robust scraping pipelines using **Python (Scrapy/Selenium/Playwright)** to extract large volumes of structured and unstructured data.
- Implement advanced anti-bot evasion techniques including proxy rotation, CAPTCHA solving, headless browser scripting, and stealth mechanisms.
- Build scalable bots that can mimic human behavior while navigating dynamic content.
- Extract data from JavaScript-rendered sites and GraphQL endpoints using **Python (Scrapy/Selenium/Playwright)** and browser developer tools (a minimal sketch follows this list).
- Build reusable modules for crawling, parsing, transforming, and storing data for use in AI/ML pipelines.
- Collaborate with Data Scientists and Product teams to ensure data coverage, quality, and compliance.
- Optimize end-to-end scraping pipelines for performance, resiliency, and scalability.
- Maintain awareness of legal and ethical scraping practices, ensuring compliance with applicable laws and platform terms.
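For illustration, here is a minimal sketch of the kind of JavaScript-rendered extraction described above, using Playwright's sync API. The URL and CSS selectors are hypothetical placeholders rather than a real target, and a production pipeline would add retries, storage, and compliance checks:

```python
# Minimal sketch only: the URL and selectors below are hypothetical placeholders.
# Assumes Playwright is installed (pip install playwright && playwright install chromium).
from playwright.sync_api import sync_playwright


def scrape_listings(url: str) -> list[dict]:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # let client-side JS finish rendering
        rows = page.query_selector_all("div.listing")  # placeholder selector
        data = []
        for row in rows:
            title = row.query_selector("h2")
            link = row.query_selector("a")
            data.append({
                "title": title.inner_text() if title else None,
                "link": link.get_attribute("href") if link else None,
            })
        browser.close()
        return data


if __name__ == "__main__":
    print(scrape_listings("https://example.com/listings"))  # placeholder URL
```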
**What You Bring: Technical Expertise**:
- Expert in **Python (Scrapy/Selenium/Playwright)** with 4+ years of production experience.
- Deep knowledge of web scraping fundamentals, including XPath, CSS selectors, session/cookie handling, and JS execution (see the selector sketch after this list).
- Experience building stealth scrapers for **top B2B networking and social platforms** like LinkedIn, Crunchbase, and X (Twitter).
- Familiarity with CAPTCHA solvers (2Captcha, AntiCaptcha), user-agent rotation, and residential proxy usage.
- Comfortable working with JSON, XML, CSV, and HTML data formats.
- Understanding of cloud storage, queues, and scheduling (e.g., AWS Lambda, S3, EC2, GCP, Celery).
- Strong debugging, log-analysis, and network-inspection skills using tools like DevTools and mitmproxy.
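As a concrete illustration of the XPath/CSS fundamentals above, here is a small Scrapy spider against quotes.toscrape.com, a public practice site commonly used for scraping exercises; the field names and pagination handling are kept deliberately simple:

```python
# Small illustrative spider; quotes.toscrape.com is a public practice site.
# Assumes Scrapy is installed (pip install scrapy).
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # CSS selectors pick out each quote block; XPath extracts nested text.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.xpath(".//span[@class='text']/text()").get(),
                "author": quote.css("small.author::text").get(),
                "tags": quote.css("a.tag::text").getall(),
            }
        # Follow pagination links until none remain.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```

Saved as `quotes_spider.py`, it can be run with `scrapy runspider quotes_spider.py -O quotes.json`.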
**Bonus Skills**:
- Experience with NLP tools or Named Entity Recognition for document/data parsing.
- Knowledge of RPA tools (UiPath, Automation Anywhere) is a plus.
- Familiarity with Retrieval-Augmented Generation (RAG) and LLM pipelines is a strong advantage.
**Why Join Research Connect?**
- Work on cutting-edge AI + Data Extraction products.
- Build scrapers that power intelligent decision-making across industries.
- Collaborate with a top-tier team using modern tools and best practices.
If you’re passionate about scraping at scale, obsessed with uncovering hidden data, and love building with **Python (Scrapy/Selenium/Playwright)**—we’d love to hear from you.
**Job Types**: Full-time, Permanent
**Pay**: ₹500,000.00 per year
**Benefits**:
- Paid sick time
**Schedule**:
- Day shift
**Experience**:
- Data Scraping: 2 years (required)
- Stealth scrapers for B2B networking and social platforms: 2 years (required)
- CAPTCHA solvers (2Captcha, AntiCaptcha), user-agent rotation: 2 years (required)
Work Location: In person