Data Pipeline Engineer

2 days ago


Multiple Locations, India | Forage AI | Full time | ₹ 6,00,000 - ₹ 18,00,000 per year

Description : Data Pipeline Engineer: Web Services, Web Crawling, ETL, NLP (spaCy/LLM), AWS. Experience Level : 5-7 years of relevant experience in data engineering.

About Forage AI :

Forage AI is a pioneering AI-powered data extraction and automation company that transforms complex, unstructured web and document data into clean, structured intelligence. Our platform combines web crawling, NLP, LLMs, and agentic AI to deliver highly accurate firmographic and enterprise insights across numerous domains. Trusted by global clients in finance, real estate, and healthcare, Forage AI enables businesses to automate workflows, reduce manual rework, and access high-quality data at scale.

About the Role :

We are seeking a Data Pipeline Engineer to develop, optimize, and maintain production-grade data pipelines focused on web data extraction and ETL workflows. This is a hands-on role requiring strong experience with Python (as the primary programming language), spaCy, LLMs, web crawling, and cloud deployment in containerized environments.

You'll have opportunities to propose, experiment with, and implement GenAI-driven approaches, innovative automations, and new strategies as part of our product and pipeline evolution. Candidates should have 5-8 years of relevant experience in data engineering, software engineering, or related fields.

Key Responsibilities :

- Design, build, and manage scalable pipelines for ingesting, processing, and storing web and API data.

- Develop robust web crawlers and scrapers in Python (Scrapy, lxml, Playwright) for structured and unstructured data.

- Create and monitor ETL workflows for data cleansing, transformation, and loading into PostgreSQL and MongoDB.

- Apply spaCy for NLP tasks and integrate/fine-tune modern LLMs for analytics.

- Drive GenAI-based innovation and automation in core data workflows.

- Develop and deploy secure REST APIs and web services for data access and interoperability.

- Integrate RabbitMQ, Kafka, and SQS (for distributed queueing) and Redis (for caching) into data workflows; work with distributed task queue tools such as Celery and TaskIQ.

- Containerize and deploy solutions using Docker on AWS (EC2, ECS, Lambda).

- Collaborate with data teams, maintain pipeline documentation, and enforce data quality standards.

- Maintain and enhance legacy in-house applications as required.

Technical Skills & Requirements :

- Primary programming language is Python; must have experience writing independent Python packages.

- Experience with multithreading and asynchronous programming in Python.

- Advanced Python skills, including web crawling (Scrapy, lxml, Playwright) and strong SQL/data handling abilities.

- Experience with PostgreSQL (SQL) and MongoDB (NoSQL).

- Proficient with workflow orchestration tools such as Airflow.

- Hands-on experience with RabbitMQ, Kafka, and SQS (for queueing/distributed processing) and Redis (for caching).

- Practical experience with spaCy for NLP and integration of at least one LLM platform (OpenAI, HuggingFace, etc.).

- Experience with GenAI/LLMs, prompt engineering, or integrating GenAI features into data products.

- Proficiency with Docker and AWS services (EC2, ECS, Lambda).

- Experienced in developing secure, scalable REST APIs using FastAPI and/or Flask.

- Familiarity with third-party API integration, including authentication, data handling, and rate limiting.

- Proficient in using Git for version control and collaboration.

- Strong analytical, problem-solving, and documentation skills.

- Bachelor's or Master's degree in Computer Science or a related field.

What We Offer :

- High ownership and autonomy in shaping technical solutions and system architecture.

- Opportunities to learn modern technologies and propose technical initiatives including GenAI-based approaches.

- Collaborative, supportive, and growth-oriented engineering culture.

- Exposure to a broad set of business and technical problems.

- Structured onboarding and domain training.

- Work-from-Home Infrastructure.

Requirements :

- Business-grade computer (modern processor such as i7 or i9, 16 GB RAM) with no performance bottlenecks.

- Reliable high-speed internet for video calls and remote work.

- Quality headphones & camera for clear audio and video.

- Stable power supply and backup options in case of outages.

