
Web Data Architect
2 weeks ago
We are seeking a skilled Senior Web Scraping Engineer to join our team. As a key member of our data ingestion pipeline, you will be responsible for designing and implementing a high-throughput product data ingestion system.
This role spans crawling (discovering & fetching pages via sitemaps/robots) and scraping (extracting structured specs, images, and PDFs into our schema). You will own the crawling/extraction layer end-to-end: HTTP-first crawling with a Playwright fallback, per-domain learned selectors, and reliable PDF handling (datasheets/specs).
Key Responsibilities:
- Design an HTTP-first crawler (Scrapy or aiohttp) with Playwright fallback only for JS-heavy pages.
- Implement sitemap diffing and conditional GETs (ETag/Last-Modified) for incremental runs.
- Build a lightweight 'needs JS?' classifier (HTML length, JSON-LD presence, data-product markers) to auto-route HTTP vs Playwright.
- Enforce per-domain throttles/backoff (2–4 concurrent/domain; auto-lower on 429/503).
- Add URL normalization/canonicalization and de-dup (respect rel=canonical; hash PDFs).
- Handle PDF discovery & download (HEAD first to dedupe; size/concurrency caps; SHA-256 keys).
- Apply resource budgets to Playwright browser automation (block images/fonts/analytics; kill outliers by size/CPU/time).
- Integrate third-party APIs (REST/GraphQL) as first-class sources: handle auth (API keys/OAuth2), pagination, and rate limits; unify API + crawl outputs.
- Own automation & orchestration for scheduled runs (Airflow/Temporal/Celery or cron), idempotent retries, and alerting.
- Create per-domain selectors (YAML) with verification on hold-outs; re-learn only when health drops.
- Ship observability: per-site field coverage, error rates, retries, avg page time, and PDF success.
- Maintain allow/deny paths; adhere to robots.txt and Terms of Service.
- Containerize workers; provide runbooks/CI; collaborate with data team on schemas/normalization.
Requirements:
- 4+ years Python, including 2+ years building production web crawlers at scale.
- Strong with Scrapy or aiohttp/asyncio and Playwright (or Puppeteer) in production.
- Practical proxy management, polite anti-bot tactics, and per-domain rate limiting.
- Hands-on with ETag/Last-Modified, retries, backoff, and HTTP caching.
- Confident with CSS/XPath, schema.org/JSON-LD, and HTML parsing.
- APIs: consuming REST/GraphQL (auth, pagination, backoff) and building small internal services (FastAPI or similar).
- Automation/Orchestration: Airflow/Temporal/Celery (or equivalent schedulers/queues) for scheduled runs and monitoring.
- PDF handling (requests/HEAD, hashing, size limits) and file integrity checks.
- Queues (Redis/Kafka), Docker, Linux basics; comfort with logs/metrics.
- Clear, pragmatic communication and strong ownership.
Benefits:
- Competitive compensation package.
- Collaborative work environment.
Nice to Have:
- Go or Node.js experience for high-performance crawlers.
- Cloud: AWS/GCP, S3, ECS/Kubernetes; IaC basics.
- Workflow engines: Airflow/Temporal/Argo/Celery.
- Document extraction: Textract/Tika/Camelot/Tabula.
- Search/analytics: Elasticsearch/OpenSearch; warehousing (Snowflake/Postgres).
- LLM-assisted selector generation with deterministic verification (optional).
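
To give candidates a feel for two of the responsibilities listed above, here is a minimal sketch of the 'needs JS?' routing check and the per-domain concurrency auto-lowering. It is an illustration under stated assumptions, not our production code: the names `needs_js`, `DomainThrottle`, and `MIN_HTML_CHARS`, the 5,000-character threshold, the regexes, and the floor of 1 are all hypothetical.

```python
import re

# Hypothetical threshold; a real value would be tuned per domain.
MIN_HTML_CHARS = 5_000
# Assumed signals that product data is already present in the static HTML.
JSON_LD_RE = re.compile(r'<script[^>]+type=["\']application/ld\+json["\']', re.I)
DATA_PRODUCT_RE = re.compile(r'\bdata-product[\w-]*\s*=', re.I)


def needs_js(html: str) -> bool:
    """Return True when a plain HTTP response looks like an unrendered JS shell."""
    has_json_ld = bool(JSON_LD_RE.search(html))
    has_markers = bool(DATA_PRODUCT_RE.search(html))
    if has_json_ld or has_markers:
        return False  # structured product data is already in the static HTML
    return len(html) < MIN_HTML_CHARS  # short page with no product signals


class DomainThrottle:
    """Tracks the allowed concurrency for one domain (the posting suggests 2-4)."""

    def __init__(self, start: int = 4, floor: int = 1) -> None:
        self.limit = start
        self.floor = floor  # hypothetical lower bound, not stated in the posting

    def record(self, status: int) -> int:
        """Lower the limit by one on 429/503, never below the floor."""
        if status in (429, 503):
            self.limit = max(self.floor, self.limit - 1)
        return self.limit


if __name__ == "__main__":
    shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
    print(needs_js(shell))  # short page, no product signals
    throttle = DomainThrottle()
    print(throttle.record(429))  # limit after one rate-limit response
```

A page routed to Playwright by a check like this would still be subject to the resource budgets described above; the heuristic only decides which fetcher to try first.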