
Senior Web Scraping Professional
1 week ago
Job Overview
We are seeking a seasoned web scraping professional to spearhead the development of our data ingestion pipeline.
- Design an efficient HTTP-first crawler with a robust Playwright fallback for JavaScript-heavy pages.
- Implement sitemap diffing and conditional GETs (ETag/Last-Modified) so that incremental runs fetch only what has changed.
- Develop a lightweight classifier that chooses between HTTP and Playwright based on HTML length, JSON-LD presence, and data-product markers.
- Enforce per-domain throttles and backoff strategies (2–4 concurrent requests per domain; auto-lower on 429/503) to avoid overloading sites and improve crawl efficiency.
- Add URL normalization/canonicalization and de-duplication to preserve data integrity and reduce redundant requests.
- Handle PDF discovery and download: issue HEAD requests first to dedupe, apply size/concurrency caps, and use SHA-256 keys for verification.
- Apply resource budgets to Playwright browser automation (block images/fonts/analytics; kill outliers by size/CPU/time).
- Integrate third-party APIs as first-class sources: handle authentication (API keys/OAuth2), pagination, and rate limits; unify API and crawl outputs.
- Own automation and orchestration for scheduled runs (Airflow/Temporal/Celery or cron), with idempotent retries and alerting to minimize downtime.
- Develop per-domain selectors (YAML) verified on hold-outs; relearn only when health drops.
- Ship observability metrics: per-site field coverage, error rates, retries, average page time, and PDF success rate.
- Maintain allow/deny paths; comply with robots.txt and each site's Terms of Service.
- Containerize workers; provide runbooks/CI; collaborate with the data team on schemas and normalization.
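To give candidates a sense of the URL normalization and de-duplication work listed above, here is a minimal sketch using only the Python standard library. The `TRACKING_PARAMS` set and the function names `normalize_url` and `dedupe` are illustrative assumptions, not part of our codebase; a production version would likely need per-domain rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to drop; tune per domain in practice.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid",
}


def normalize_url(url: str) -> str:
    """Return a canonical form of `url` suitable for use as a de-duplication key."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # note: userinfo, if any, is dropped
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    is_default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not is_default:
        host = f"{host}:{port}"
    path = parts.path or "/"
    # Drop tracking parameters and sort the rest so parameter order does not matter.
    query_pairs = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
    ]
    query = urlencode(sorted(query_pairs))
    # The fragment is discarded because it is never sent to the server.
    return urlunsplit((scheme, host, path, query, ""))


def dedupe(urls):
    """Yield-order-preserving list of canonical URLs with duplicates removed."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

This sketch treats query-parameter order and fragments as insignificant, which holds for most sites but not all; that trade-off is exactly the kind of per-domain judgment the role involves.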
Required Skills and Qualifications
- 4+ years of Python experience, including 2+ years of building production web crawlers at scale.
- Strong expertise in Scrapy or aiohttp/asyncio and Playwright (or Puppeteer) in production.
- Practical knowledge of proxy management, polite anti-bot tactics, and per-domain rate limiting.
- Hands-on experience with ETag/Last-Modified, retries, backoff, and HTTP caching.
- Confident in CSS/XPath, schema.org/JSON-LD, and HTML parsing.
- APIs: consuming REST/GraphQL (auth, pagination, backoff) and building small internal services (FastAPI or similar).
- Automation/Orchestration: Airflow/Temporal/Celery (or equivalent schedulers/queues) for scheduled runs and monitoring.
- PDF handling (requests/HEAD, hashing, size limits) and file integrity checks.
- Queues (Redis/Kafka), Docker, Linux basics; comfort with logs/metrics.
- Clear, pragmatic communication and strong ownership.
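As an illustration of the per-domain rate limiting and backoff skills listed above, the following is a rough sketch rather than our actual implementation. The names `DomainThrottle` and `backoff_delay`, the floor of one concurrent request, and the 60-second cap are all assumptions made for the example; the starting limit of 4 reflects the upper end of the 2–4 range named in the job overview.

```python
import random
from dataclasses import dataclass


@dataclass
class DomainThrottle:
    """Tracks the allowed number of concurrent requests for one domain."""

    limit: int = 4  # start at the upper end of the 2-4 range
    floor: int = 1  # assumed lower bound; not specified in the posting

    def record(self, status: int) -> int:
        """Lower the limit by one on 429/503 and return the current limit."""
        if status in (429, 503):
            self.limit = max(self.floor, self.limit - 1)
        return self.limit


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng=random.random) -> float:
    """Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return rng() * min(cap, base * (2 ** attempt))
```

Recovery (raising the limit again after a run of successful responses) is deliberately left out of this sketch; how quickly to recover is a policy decision that usually differs per site.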
Benefits
- Opportunity to work on a cutting-edge data ingestion pipeline.
- Collaborative environment with experienced professionals.
- Professional growth and development opportunities.
Others
- Go or Node.js experience for high-performance crawlers.
- Cloud: AWS/GCP, S3, ECS/Kubernetes; IaC basics.
- Workflow engines: Airflow/Temporal/Argo/Celery.
- Document extraction: Textract/Tika/Camelot/Tabula.
- Search/analytics: Elasticsearch/OpenSearch; warehousing (Snowflake/Postgres).
- LLM-assisted selector generation with deterministic verification (optional).
-
Senior Data Developer
2 weeks ago
Thāne, Maharashtra, India beBeeData Full time ₹ 1,00,00,000 - ₹ 2,00,00,000
As a senior software developer, you will be part of a dynamic team that collaborates with various departments to create innovative products and features for our platform. Your expertise in Python and SQL/Database skills will play a crucial role in shaping the future of our data-driven solutions. Key Responsibilities: • Collaborate with analysts to understand...
-
Senior Data Extraction Specialist
2 weeks ago
Thāne, Maharashtra, India beBeeData Full time ₹ 8,00,000 - ₹ 12,00,000
We are seeking an experienced software engineer to design and optimize data extraction solutions using web scraping and OCR techniques. Key Responsibilities: Develop and maintain scalable Python scripts for web scraping from structured and unstructured sources. Implement OCR solutions to extract text/data from scanned images, PDFs, and other documents. Optimize...
-
Data Science Engineer
1 week ago
Thāne, Maharashtra, India beBeeDataMining Full time ₹ 9,00,000 - ₹ 12,00,000
Job Description: We are seeking a skilled Data Mining Analyst with expertise in automating data extraction processes from web platforms. The ideal candidate will be experienced in Python, Selenium, Pandas, SQL, and APIs, with the ability to design and implement efficient and scalable data scraping systems. Key responsibilities include designing, developing,...
-
Senior Web Professional
2 weeks ago
Thāne, Maharashtra, India beBeeDevelopment Full time ₹ 8,00,000 - ₹ 12,00,000
Web Development Opportunity: We are seeking a skilled professional to fill the role of Web Developer. This is a unique opportunity for an individual with experience in WordPress, Elementor, and PHP development to join our organization. The ideal candidate should have hands-on expertise in building and managing multiple websites, ensuring smooth hosting...
-
Key Data Insights Specialist
2 weeks ago
Thāne, Maharashtra, India beBeeDataAnalyst Full time ₹ 8,00,000 - ₹ 15,00,000
Job Description: We are seeking a skilled professional to fill the role of Data Analyst. The ideal candidate will possess strong Python expertise and hands-on experience in handling large datasets, data cleaning, analysis, and visualization. The successful applicant will be responsible for building data pipelines, performing web scraping, and generating...
-
Data Extraction Specialist
2 weeks ago
Thāne, Maharashtra, India beBeeData Full time ₹ 8,00,000 - ₹ 10,00,000
Job Summary: We are seeking a skilled data extraction specialist with expertise in automating processes from web platforms. Key Responsibilities: Design, develop, and maintain robust data extraction solutions to extract structured and unstructured data from various websites and APIs. Use tools like Python, Selenium, BeautifulSoup, Scrapy, and Pandas for data...
-
Web Application Developer
1 week ago
Thāne, Maharashtra, India beBeeDeveloper Full time ₹ 5,00,000 - ₹ 10,00,000
Full Stack Web Development Opportunity: We are seeking a talented and driven individual to contribute to the design and development of web applications. The successful candidate will be responsible for working on both front-end and back-end tasks, collaborating with our team to implement features like authentication, APIs, and responsive UI. This is an...
-
Senior Web Development Expert
1 week ago
Thāne, Maharashtra, India beBeeDeveloper Full time ₹ 10,00,000 - ₹ 15,00,000
Role Overview: Rus Education is a leading provider of overseas education solutions in India, specializing in facilitating Indian students' careers in medicine. The highly subsidized fees, high-quality medical education, and recognition from prestigious organizations make Russia a preferred destination for Indian students. As a seasoned Senior PHP Developer, you...
-
Senior Web Developer
1 week ago
Thāne, Maharashtra, India beBeeBackend Full time ₹ 15,00,000 - ₹ 25,00,000
Web Development Professional Opportunity: This role involves the creation and maintenance of scalable web applications using ReactJS and FastAPI. Main Responsibilities: The successful candidate will translate product ideas into prototypes and production-ready solutions. They will collaborate with cross-functional teams to drive project success. Requirements: A...
-
Chief Data Infrastructure Architect
2 weeks ago
Thāne, Maharashtra, India beBeeDataEngineering Full time ₹ 12,00,000 - ₹ 17,00,000
Job Description: We are seeking a seasoned Data Engineer with expertise in Python to join our team. This role involves designing, developing, and maintaining large-scale data systems. The ideal candidate will have hands-on experience with data collection, transformation, analysis, and visualization. They will work closely with cross-functional teams to...