Data Scraper

2 weeks ago


Malad, Mumbai, Maharashtra, India | Research Connect LLC | Full time

**What You'll Do**:
As a **Data Scraper**, you will lead large-scale web scraping projects targeting complex and protected sources such as the **top B2B networking and social media platforms**. You'll architect, build, and manage intelligent scraping systems that power our core data products and analytics engines.

**Key Responsibilities**:

- Design and build robust scraping pipelines using **Python (Scrapy/Selenium/Playwright)** to extract large volumes of structured and unstructured data.
- Implement advanced anti-bot evasion techniques including proxy rotation, CAPTCHA solving, headless browser scripting, and stealth mechanisms.
- Use **Python (Scrapy/Selenium/Playwright)** to build scalable bots that can mimic human behavior while navigating dynamic content.
- Extract data from JavaScript-rendered sites and GraphQL endpoints using **Python (Scrapy/Selenium/Playwright)** and browser developer tools.
- Build reusable modules for crawling, parsing, transforming, and storing data for use in AI/ML pipelines.
- Collaborate with Data Scientists and Product teams to ensure data coverage, quality, and compliance.
- Optimize end-to-end scraping pipelines for performance, resiliency, and scalability using **Python (Scrapy/Selenium/Playwright)**.
- Maintain awareness of legal and ethical scraping practices, ensuring compliance with applicable laws and platform terms.

**What You Bring: Technical Expertise**:

- Expert in **Python (Scrapy/Selenium/Playwright)** with 4+ years of production experience.
- Deep knowledge of web scraping fundamentals, including XPath, CSS selectors, session/cookie handling, and JS execution.
- Experience building stealth scrapers for **top B2B networking and social platforms** like LinkedIn, Crunchbase, and X (Twitter).
- Familiarity with CAPTCHA solvers (2Captcha, AntiCaptcha), user-agent rotation, and residential proxy usage.
- Comfortable working with JSON, XML, CSV, and HTML data formats.
- Understanding of cloud storage, queues, and scheduling (e.g., AWS Lambda, S3, EC2, GCP, Celery).
- Strong debugging, log analysis, and network inspection skills, using tools like browser DevTools and mitmproxy.

**Bonus Skills**:

- Experience with NLP tools or Named Entity Recognition for document/data parsing.
- Knowledge of RPA tools (UiPath, Automation Anywhere) is a plus.
- Familiarity with Retrieval-Augmented Generation (RAG) and LLM pipelines is a strong advantage.

**Why Join Research Connect?**
- Work on cutting-edge AI + Data Extraction products.
- Build scrapers that power intelligent decision-making across industries.
- Collaborate with a top-tier team using modern tools and best practices.

If you’re passionate about scraping at scale, obsessed with uncovering hidden data, and love building with **Python (Scrapy/Selenium/Playwright)**, we’d love to hear from you.

**Job Types**: Full-time, Permanent

Pay: ₹500,000.00 per year

**Benefits**:

- Paid sick time

Schedule:

- Day shift

**Experience**:

- Data Scraping: 2 years (required)
- Stealth scrapers for B2B networking and social platforms: 2 years (required)
- CAPTCHA solvers (2Captcha, AntiCaptcha), user-agent rotation: 2 years (required)

Work Location: In person