
Senior Web Scraping Specialist
2 weeks ago
We are seeking a highly skilled engineer to join our team as a Senior Web Scraping Specialist.
This is a challenging project that requires strong problem-solving skills and attention to detail.
The ideal candidate will have experience with Python, Scrapy or aiohttp, and Playwright, as well as practical knowledge of proxy management, polite anti-bot tactics, and per-domain rate limiting.
Key Responsibilities:
- Design and implement high-throughput product data ingestion pipelines across hundreds of domains.
- Own the crawling/extraction layer end-to-end, including HTTP-first crawling with a Playwright fallback, per-domain learned selectors, and reliable PDF handling (datasheets/specs).
- Implement sitemap diffing and conditional GETs (ETag/Last-Modified) for incremental runs.
- Build a lightweight 'needs JS?' classifier (HTML length, JSON-LD presence, data-product markers) to auto-route HTTP vs Playwright.
- Enforce per-domain throttles and backoff (2–4 concurrent requests per domain; automatically lower concurrency on HTTP 429/503 responses).
- Enforce resource budgets for Playwright browser automation (block images, fonts, and analytics requests; kill outlier pages that exceed size, CPU, or time limits).
- Integrate third-party APIs (REST/GraphQL) as first-class sources: handle authentication (API keys/OAuth2), pagination, and rate limits, and unify API and crawl outputs.
- Ship observability: per-site field coverage, error rates, retries, average page time, and PDF success rate.
- Maintain allow/deny path lists; adhere to robots.txt and each site's Terms of Service.
Requirements:
- 4+ years of Python experience, including 2+ years building production web crawlers at scale.
- Strong production experience with Scrapy or aiohttp/asyncio, and with Playwright (or Puppeteer).
- Practical experience with proxy management, polite anti-bot tactics, and per-domain rate limiting.
- Hands-on experience with ETag/Last-Modified, retries, backoff, and HTTP caching.
- Confident with CSS/XPath, schema.org/JSON-LD, and HTML parsing.
- APIs: consuming REST/GraphQL (auth, pagination, backoff) and building small internal services (FastAPI or similar).
- Automation/Orchestration: Airflow/Temporal/Celery (or equivalent schedulers/queues) for scheduled runs and monitoring.
- PDF handling (requests/HEAD, hashing, size limits) and file integrity checks.
- Queues (Redis/Kafka), Docker, Linux basics; comfort with logs/metrics.
- Clear, pragmatic communication and strong ownership.
Benefits:
- Competitive compensation and benefits.
- Opportunities for growth and development.
Nice to have:
- Go or Node.js experience for high-performance crawlers.
- Cloud: AWS/GCP, S3, ECS/Kubernetes; IaC basics.
- Workflow engines: Airflow/Temporal/Argo/Celery.
- Document extraction: Textract/Tika/Camelot/Tabula.
- Search/analytics: Elasticsearch/OpenSearch; warehousing (Snowflake/Postgres).
- LLM-assisted selector generation with deterministic verification (optional).
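The sitemap-diffing responsibility listed above can be sketched with the Python standard library. This minimal sketch assumes standard sitemaps.org `<urlset>` files that have already been downloaded. `parse_sitemap` and `diff_sitemaps` are hypothetical helper names. Sitemap index files and gzip are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> dict:
    """Map each <loc> URL to its <lastmod> value ("" when absent)."""
    root = ET.fromstring(xml_text)
    entries = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if loc:
            entries[loc] = lastmod
    return entries


def diff_sitemaps(old_xml: str, new_xml: str) -> dict:
    """Classify URLs as added, removed, or changed (different lastmod)."""
    old, new = parse_sitemap(old_xml), parse_sitemap(new_xml)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(u for u in new.keys() & old.keys() if new[u] != old[u]),
    }
```

On the next incremental run, only `added` and `changed` URLs would need fetching, and `removed` URLs can be marked stale. Some sites omit or misreport `<lastmod>`, so conditional GETs remain a useful second check.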
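Conditional GETs with ETag/Last-Modified, mentioned above, come down to bookkeeping around two request headers (`If-None-Match` and `If-Modified-Since`) and the `304 Not Modified` status. The sketch below does not depend on any particular HTTP client. It assumes the caller's client (aiohttp, Scrapy, or similar) sends the returned headers and reports the status code. `CacheEntry`, `conditional_headers`, and `handle_response` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CacheEntry:
    """Validators stored from the last successful (200) fetch of a URL."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(entry: Optional[CacheEntry]) -> Dict[str, str]:
    """Build revalidation headers from stored validators (empty on a first visit)."""
    headers: Dict[str, str] = {}
    if entry is not None:
        if entry.etag:
            headers["If-None-Match"] = entry.etag
        if entry.last_modified:
            headers["If-Modified-Since"] = entry.last_modified
    return headers


def handle_response(
    status: int, response_headers: Dict[str, str], entry: Optional[CacheEntry]
) -> Tuple[bool, Optional[CacheEntry]]:
    """Return (changed, entry_to_store) for a conditional GET response."""
    if status == 304:
        # Not Modified: the cached copy is still current, nothing to re-parse.
        return False, entry
    if status == 200:
        # Header names are case-insensitive, so normalise before lookup.
        lowered = {k.lower(): v for k, v in response_headers.items()}
        return True, CacheEntry(lowered.get("etag"), lowered.get("last-modified"))
    # Any other status: keep the old validators and leave retries to the caller.
    return False, entry
```

In practice, the `CacheEntry` for each URL would be persisted between runs, for example in Redis or Postgres.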
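The "needs JS?" classifier described above can start as a few regex checks over the raw HTTP response. This is a heuristic sketch, not the posting's design. The 5,000-character threshold, the `data-product` attribute prefix, and the rule order are assumptions, to be tuned per domain against labelled pages.

```python
import re

# Hypothetical threshold: below this, static HTML is likely a JS app shell.
MIN_HTML_LENGTH = 5_000

# Matches <script ... type="application/ld+json">.
JSON_LD_RE = re.compile(r'<script[^>]+type=["\']application/ld\+json["\']', re.IGNORECASE)
# Hypothetical marker: any attribute starting with data-product (data-product-id, ...).
DATA_PRODUCT_RE = re.compile(r'\sdata-product[\w-]*\s*=', re.IGNORECASE)


def needs_js(html: str) -> bool:
    """Return True when the page should be routed to Playwright instead of plain HTTP."""
    if JSON_LD_RE.search(html) or DATA_PRODUCT_RE.search(html):
        # Structured product data is already present in the HTTP response.
        return False
    # No structured markers: a short document is probably an empty JS shell.
    return len(html) < MIN_HTML_LENGTH
```

Pages routed to HTTP that still yield empty fields could be re-queued for Playwright. That keeps the classifier cheap while catching its misses.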
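The per-domain throttle policy above (2–4 concurrent requests per domain, lowered on 429/503) can be modelled as a small state object for each domain. This sketch rests on several assumptions. It steps the limit down by one per throttled response and returns an exponentially growing, jittered delay. It raises the limit again only after a streak of successes. `DomainThrottle`, the 50-response streak, and the delay constants are all hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field

THROTTLE_STATUSES = {429, 503}


@dataclass
class DomainThrottle:
    """Hypothetical per-domain controller keeping concurrency within [min, max]."""
    max_concurrency: int = 4
    min_concurrency: int = 2
    successes_to_raise: int = 50  # hypothetical streak before raising the limit by one
    base_delay: float = 1.0       # seconds; hypothetical
    max_delay: float = 60.0       # seconds; hypothetical
    limit: int = field(default=0, init=False)
    _successes: int = field(default=0, init=False)
    _failures: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self.limit = self.max_concurrency

    def record(self, status: int) -> float:
        """Update state from one response status; return seconds to wait before the next request."""
        if status in THROTTLE_STATUSES:
            self._successes = 0
            self._failures += 1
            self.limit = max(self.min_concurrency, self.limit - 1)
            # Exponential backoff capped at max_delay, with full jitter.
            cap = min(self.max_delay, self.base_delay * 2 ** (self._failures - 1))
            return random.uniform(0.0, cap)
        # Any non-throttle status counts toward the success streak in this sketch;
        # a real policy might treat other 5xx errors separately.
        self._failures = 0
        self._successes += 1
        if self._successes >= self.successes_to_raise and self.limit < self.max_concurrency:
            self.limit += 1
            self._successes = 0
        return 0.0
```

The crawler's scheduler would read `limit` before dispatching work for that domain and sleep for the returned delay. Honouring a `Retry-After` header, when the server sends one, is a natural extension.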
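The Playwright resource budget above (blocking images, fonts, and analytics) can be enforced with a route handler on each page. This sketch targets Playwright's async Python API (`page.route`, `route.abort`, `route.continue_`). The blocked resource types and the analytics host list are example choices, not values from the posting. `should_block` and `install_blocking` are hypothetical names.

```python
from urllib.parse import urlsplit

# Resource types as reported by Playwright's request.resource_type.
BLOCKED_TYPES = {"image", "font", "media"}
# Example analytics hosts; a production deny list would be longer and maintained.
ANALYTICS_HOSTS = ("google-analytics.com", "googletagmanager.com", "doubleclick.net")


def should_block(resource_type: str, url: str) -> bool:
    """Decide whether a sub-request is outside the page's resource budget."""
    if resource_type in BLOCKED_TYPES:
        return True
    host = urlsplit(url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in ANALYTICS_HOSTS)


async def install_blocking(page) -> None:
    """Attach a catch-all route handler to a Playwright async-API Page."""

    async def handler(route) -> None:
        request = route.request
        if should_block(request.resource_type, request.url):
            await route.abort()
        else:
            await route.continue_()

    await page.route("**/*", handler)
```

`page.goto(url, timeout=...)` (in milliseconds) provides a per-page time budget. Size and CPU budgets would need process-level monitoring outside this sketch.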
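For the third-party API sources above, cursor pagination can be wrapped in a generator so that API records flow into the same pipeline as crawled pages. This minimal sketch assumes a JSON response with hypothetical `items` and `next_cursor` fields. Authentication headers (an API key or OAuth2 bearer token) and rate-limit backoff would live inside the caller-supplied `fetch_page` function, which is also hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical response shape: {"items": [...], "next_cursor": "..." or null}.
# Relay-style GraphQL connections express the same idea as
# pageInfo { hasNextPage endCursor }, so only the field names would change.
FetchPage = Callable[[Optional[str]], Dict[str, Any]]


def iterate_items(fetch_page: FetchPage, max_pages: int = 1_000) -> Iterator[Dict[str, Any]]:
    """Yield items from every page until the API stops returning a cursor."""
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard cap so a repeating cursor cannot loop forever
        body = fetch_page(cursor)
        yield from body.get("items", [])
        cursor = body.get("next_cursor")
        if not cursor:
            return
```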
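Per-site field coverage, one of the observability metrics listed above, is the share of extracted records in which each schema field is populated. A minimal sketch, assuming each record is a dict that carries a hypothetical `site` key alongside its extracted fields. `field_coverage` is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping

# Values treated as "not extracted".
EMPTY = (None, "", [], {})


def field_coverage(
    records: Iterable[Mapping[str, Any]], fields: Iterable[str]
) -> Dict[str, Dict[str, float]]:
    """Return {site: {field: fraction of that site's records with a non-empty value}}."""
    fields = list(fields)
    totals: Dict[str, int] = defaultdict(int)
    filled: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for record in records:
        site = record["site"]
        totals[site] += 1
        for name in fields:
            if record.get(name) not in EMPTY:
                filled[site][name] += 1
    return {
        site: {name: filled[site][name] / count for name in fields}
        for site, count in totals.items()
    }
```

A sudden coverage drop for one site is often an early sign that a learned selector has broken.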
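Robots.txt compliance and the allow/deny path lists above can be combined into one gate in front of the fetch queue. This sketch runs the standard library's `urllib.robotparser.RobotFileParser` on a robots.txt body that has already been downloaded. `DENY_PREFIXES`, `build_robots`, and `may_fetch` are hypothetical. The standard-library parser matches rules by path prefix and does not implement `*` or `$` wildcard patterns inside paths.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical internal deny list, checked before robots.txt.
DENY_PREFIXES = ("/cart/", "/account/")


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body that the crawler has already fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Apply the internal deny list first, then the site's robots.txt rules."""
    if urlsplit(url).path.startswith(DENY_PREFIXES):
        return False
    return parser.can_fetch(user_agent, url)
```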
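The PDF-handling requirement above (hashing, size limits, file integrity checks) can be met while streaming, so oversized files are abandoned early. A minimal sketch, assuming the HTTP client yields the response body as byte chunks. The 50 MiB cap, `PdfRejected`, and `hash_pdf_stream` are hypothetical. Requiring `%PDF-` at byte 0 is stricter than some PDF readers, which tolerate leading junk.

```python
import hashlib
from typing import Iterable, Tuple

MAX_PDF_BYTES = 50 * 1024 * 1024  # hypothetical 50 MiB cap


class PdfRejected(Exception):
    """Raised when a download fails the size or format check."""


def hash_pdf_stream(chunks: Iterable[bytes], max_bytes: int = MAX_PDF_BYTES) -> Tuple[str, int]:
    """Return (sha256 hex digest, size), enforcing a size cap and the %PDF- header."""
    digest = hashlib.sha256()
    total = 0
    head = b""
    for chunk in chunks:
        total += len(chunk)
        if total > max_bytes:
            raise PdfRejected(f"exceeds {max_bytes} bytes")
        if len(head) < 5:
            # Collect the first five bytes even if they span several chunks.
            head += chunk[: 5 - len(head)]
        digest.update(chunk)
    if not head.startswith(b"%PDF-"):
        raise PdfRejected("missing %PDF- header")
    return digest.hexdigest(), total
```

When the server provides a `Content-Length` header in the HEAD response, obviously oversized files can be skipped before download. The SHA-256 digest also works as a deduplication key for datasheets linked from several product pages.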
-
Senior Lead Data Scraping Specialist
2 weeks ago
Aligarh, Uttar Pradesh, India | Full time | ₹ 12,00,000 - ₹ 20,00,000
Senior Lead Data Engineer
Job Summary: We are seeking a highly skilled and experienced Senior Lead Data Engineer to lead our data scraping team. The ideal candidate will have a minimum of 4 years of hands-on experience in IT scraping, with at least 2 years leading a team of 5+ developers.
Design and develop scalable data scraping solutions using tools like...
-
Senior Data Scientist
2 weeks ago
Aligarh, Uttar Pradesh, India | Full time | ₹ 90,00,000 - ₹ 1,20,00,000
Unlock Your Potential as a Data Mining Analyst
We are seeking an accomplished Data Mining Analyst to join our team. As a key member of our organization, you will be responsible for designing, developing, and maintaining robust web scraping solutions to extract structured and unstructured data from various websites and APIs.
Key Responsibilities: Design and...
-
Senior Data Science Specialist
2 weeks ago
Aligarh, Uttar Pradesh, India | Full time | ₹ 10,00,000 - ₹ 20,00,000
Job Summary: We are seeking a highly skilled Senior Data Science Specialist to join our team. As a key member of our organization, you will be responsible for designing and implementing data-driven solutions to drive business growth.
About the Role: The ideal candidate will have a strong background in data engineering, with expertise in Python and experience...
-
Data Extraction Specialist
2 weeks ago
Aligarh, Uttar Pradesh, India | Full time | ₹ 15,00,000 - ₹ 20,00,000
This is a rare opportunity to work with our analytics and automation company. We are seeking an experienced Data Extraction Specialist with expertise in automating data extraction processes from web platforms.
The ideal candidate will be highly skilled in Python, Selenium, Pandas, SQL, and APIs, with the ability to design and implement efficient and scalable...
-
Senior Data Extraction Developer
2 weeks ago
Aligarh, Uttar Pradesh, India | Full time | ₹ 10,00,000 - ₹ 15,00,000
As a seasoned developer, you will play a key role in crafting innovative data extraction solutions that leverage web scraping and OCR expertise. Your primary focus will be on designing and optimizing scalable scripts to extract clean, structured data from various sources.
Key Responsibilities: Develop and maintain Python scripts for web scraping from...
-
Senior Data Extraction Specialist
1 week ago
Aligarh, Uttar Pradesh, India | Full time | ₹ 1,35,00,000 - ₹ 1,57,50,000
High-Throughput Web Crawling Engineer
We're building a high-throughput product data ingestion pipeline across hundreds of domains. This role spans crawling (discovering & fetching pages via sitemaps/robots) and scraping (extracting structured specs, images, and PDFs into our schema).
Key Responsibilities: Design an HTTP-first crawler with Playwright fallback...
-
Web Application Specialist
2 weeks ago
Aligarh, Uttar Pradesh, India | Full time | ₹ 80,00,000 - ₹ 1,50,00,000
We are seeking a highly skilled individual to develop and maintain web applications using the latest technologies and frameworks.
-
Data Marketing Specialist
2 weeks ago
Aligarh, Uttar Pradesh, India | Full time | ₹ 8,00,000 - ₹ 12,00,000
Job Overview: A senior executive role is available to lead our marketing and data efforts. The ideal candidate will have a strong background in email marketing, mass mailing, and data extraction.
This is an excellent opportunity for individuals who want to make a significant impact in the industry by developing and executing effective marketing...
-
Senior Web Content Specialist
2 weeks ago
Aligarh, Uttar Pradesh, India | Full time | ₹ 15,00,000 - ₹ 20,00,000
Job Description: We are seeking a highly skilled Content Author to manage and maintain content within Adobe Experience Manager (AEM). As a senior author, you will be responsible for ensuring the quality, accuracy, and performance of web pages.
This role requires strong expertise in AEM authoring, taxonomy, and Digital Asset Management (DAM). The...
-
Data Insights Specialist
2 weeks ago
Aligarh, Uttar Pradesh, India | Full time | ₹ 10,00,000 - ₹ 15,00,000
Job Overview: We are seeking a skilled professional to fill the role of Data Analyst. The ideal candidate will possess strong Python expertise and hands-on experience in handling large datasets, data cleaning, analysis, and visualization.
Key Responsibilities: Develop, test, and maintain efficient Python scripts for data collection and transformation. Design and...