Data Pipeline Engineer

2 days ago


Multiple Locations, India · Forage AI · Full time · ₹6,00,000 - ₹18,00,000 per year

Description : Data Pipeline Engineer - Web Services, Web Crawling, ETL, NLP (spaCy/LLM), AWS. Experience Level : 5-7 years of relevant experience in data engineering.

About Forage AI :

Forage AI is a pioneering AI-powered data extraction and automation company that transforms complex, unstructured web and document data into clean, structured intelligence. Our platform combines web crawling, NLP, LLMs, and agentic AI to deliver highly accurate firmographic and enterprise insights across numerous domains. Trusted by global clients in finance, real estate, and healthcare, Forage AI enables businesses to automate workflows, reduce manual rework, and access high-quality data at scale.

About the Role :

We are seeking a Data Pipeline Engineer to develop, optimize, and maintain production-grade data pipelines focused on web data extraction and ETL workflows. This is a hands-on role requiring strong experience with Python (as the primary programming language), spaCy, LLMs, web crawling, and cloud deployment in containerized environments.

You'll have opportunities to propose, experiment with, and implement GenAI-driven approaches, innovative automations, and new strategies as part of our product and pipeline evolution. Candidates should have 5-8 years of relevant experience in data engineering, software engineering, or related fields.

Key Responsibilities :

- Design, build, and manage scalable pipelines for ingesting, processing, and storing web and API data.

- Develop robust web crawlers and scrapers in Python (Scrapy, lxml, Playwright) for structured and unstructured data (an illustrative crawler sketch follows this list).

- Create and monitor ETL workflows for data cleansing, transformation, and loading into PostgreSQL and MongoDB.

- Apply spaCy for NLP tasks and integrate/fine-tune modern LLMs for analytics.

- Drive GenAI-based innovation and automation in core data workflows.

- Develop and deploy secure REST APIs and web services for data access and interoperability.

- Integrate RabbitMQ, Kafka, and SQS (for distributed queueing) and Redis (for caching) into data workflows, using distributed task-queue tools such as Celery and TaskIQ.

- Containerize and deploy solutions using Docker on AWS (EC2, ECS, Lambda).

- Collaborate with data teams, maintain pipeline documentation, and enforce data quality standards.

- Maintain and enhance legacy in-house applications as required.
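
The crawler work referenced above is illustrated by the minimal Scrapy sketch below. The start URL, CSS selectors, and field names are hypothetical placeholders for this posting, not an existing Forage AI pipeline; it simply shows the shape of a spider that yields structured items for a downstream ETL step.

```python
# Minimal Scrapy spider sketch: crawls hypothetical company-profile pages
# and yields structured items for downstream cleansing and loading.
# The start URL, selectors, and field names are placeholders.
import scrapy


class CompanyProfileSpider(scrapy.Spider):
    name = "company_profiles"
    start_urls = ["https://example.com/companies"]  # placeholder URL

    def parse(self, response):
        # One record per profile card; selectors depend on the real site.
        for card in response.css("div.profile-card"):
            yield {
                "name": card.css("h2::text").get(),
                "website": card.css("a::attr(href)").get(),
                "summary": card.css("p.summary::text").get(),
            }
        # Follow pagination; Scrapy de-duplicates already-seen requests.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```

A spider like this would typically run via `scrapy runspider` or inside a Scrapy project whose item pipeline loads records into PostgreSQL or MongoDB.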

Technical Skills & Requirements :

- Python as the primary programming language; must have experience writing standalone Python packages.

- Experience with multithreading and asynchronous programming in Python.

- Advanced Python skills, including web crawling (Scrapy, lxml, Playwright) and strong SQL/data handling abilities.

- Experience with PostgreSQL (SQL) and MongoDB (NoSQL).

- Proficient with workflow orchestration tools such as Airflow.

- Hands-on experience with RabbitMQ, Kafka, and SQS (for queueing/distributed processing), and Redis (for caching).

- Practical experience with spaCy for NLP and integration of at least one LLM platform (OpenAI, HuggingFace, etc.); a brief spaCy sketch follows this list.

- Experience with GenAI/LLMs, prompt engineering, or integrating GenAI features into data products.

- Proficiency with Docker and AWS services (EC2, ECS, Lambda).

- Experienced in developing secure, scalable REST APIs using FastAPI and/or Flask (a minimal FastAPI sketch also follows this list).

- Familiarity with third-party API integration, including authentication, data handling, and rate limiting.

- Proficient in using Git for version control and collaboration.

- Strong analytical, problem-solving, and documentation skills.

- Bachelor's or Master's degree in Computer Science or a related field.
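
As a small, hedged illustration of the spaCy requirement above, the sketch below extracts organisation names from scraped text; the model choice and the `extract_orgs` helper are assumptions for this example, not part of an existing codebase.

```python
# Minimal spaCy sketch: pull organisation entities out of scraped text.
# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")


def extract_orgs(text: str) -> list[str]:
    """Return organisation entities found in a block of text."""
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ == "ORG"]


if __name__ == "__main__":
    sample = "Forage AI delivers firmographic data to clients in finance and healthcare."
    print(extract_orgs(sample))
```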
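Similarly, here is a hedged sketch of the secure REST API requirement, using FastAPI with an API-key header; the key check, record model, and stubbed response are placeholders (a real service would authenticate against a secret store and query PostgreSQL/MongoDB).

```python
# Minimal FastAPI sketch: one read endpoint guarded by an API-key header.
# The key value, model fields, and stubbed record are placeholders.
from fastapi import Depends, FastAPI, HTTPException, Security
from fastapi.security import APIKeyHeader
from pydantic import BaseModel

app = FastAPI(title="Data access API (sketch)")
api_key_header = APIKeyHeader(name="X-API-Key")


class CompanyRecord(BaseModel):
    company_id: int
    name: str
    website: str | None = None


def require_api_key(api_key: str = Security(api_key_header)) -> str:
    # In production this check would hit a secret store or auth service.
    if api_key != "local-dev-key":
        raise HTTPException(status_code=401, detail="Invalid API key")
    return api_key


@app.get("/companies/{company_id}", response_model=CompanyRecord)
def get_company(company_id: int, _: str = Depends(require_api_key)) -> CompanyRecord:
    # A real implementation would query PostgreSQL or MongoDB here.
    return CompanyRecord(company_id=company_id, name="Example Co", website="https://example.com")
```

Runnable locally with `uvicorn app_module:app`, where `app_module` stands in for whatever file holds the sketch.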

What We Offer :

- High ownership and autonomy in shaping technical solutions and system architecture.

- Opportunities to learn modern technologies and propose technical initiatives including GenAI-based approaches.

- Collaborative, supportive, and growth-oriented engineering culture.

- Exposure to a broad set of business and technical problems.

- Structured onboarding and domain training.

- Work-from-Home Infrastructure.

Requirements :

- Business-grade computer (modern i7/i9-class processor, 16 GB RAM) with no performance bottlenecks.

- Reliable high-speed internet for video calls and remote work.

- Quality headphones & camera for clear audio and video.

- Stable power supply and backup options in case of outages.

