Data Platform Engineer

2 days ago


Mumbai, India BharatGen Full time

Job Summary:
BharatGen is on a mission to create AI that truly represents the diversity, culture, and unique context of India. At the heart of this mission lies the need for robust, scalable infrastructure to build multilingual and multimodal datasets that power foundational AI models. We're seeking a skilled Data Platform Engineer to build scalable tools, platforms, and pipelines tailored for processing large-scale, multilingual, multimodal datasets critical for foundational AI models.

In this role, you will build scalable data pipelines to ingest, transform, and prepare data from diverse sources (text, speech, images, and video), making it ready for Generative AI model training. Your work will involve developing and managing the underlying platform while addressing challenges like governance, security, observability, lineage, and scalability. The outcomes of your work will include efficient tools for data processing, a reliable data platform, and high-quality datasets tailored to the evolving needs of large-scale AI and LLM training.

Collaborating closely with researchers and ML engineers, you will play a pivotal role in enabling BharatGen to deliver state-of-the-art AI models, contributing to the advancement of India's AI ecosystem through innovative data engineering solutions.

Key Responsibilities:
- Design and Build Scalable Platforms: Develop distributed infrastructure for ingesting, processing, and transforming diverse datasets (text, speech, images, video) at terabyte to petabyte scale.
- Develop Robust Data Pipelines: Create reliable, scalable pipelines to prepare datasets for Generative AI and LLM training.
- Implement Governance and Observability: Build frameworks for data lineage, monitoring, and access control to ensure data quality and operational reliability.
- Optimize Performance and Cost: Enhance platform performance and resource utilization using cost-effective strategies, including GPU-accelerated preprocessing.
- Collaborate and Innovate: Work closely with researchers and ML engineers to adapt platforms and data pipelines to evolving LLM requirements, addressing various data challenges.
- Drive Innovation: Stay updated on emerging tools, frameworks, and best practices to implement cutting-edge solutions for large-scale dataset creation.

Minimum Qualifications and Experience:
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field with 3+ years of industry experience.

Required Skills:
- Proficiency in distributed systems and frameworks (e.g., Kafka, Ray, PySpark) for scalable data workflows.
- Exposure to end-to-end data lifecycle management, including DataOps.
- Strong programming skills in Python, Scala, or Go, with a focus on high-performance pipeline development.
- Experience with building and optimizing data pipelines, including ETL processes, data modeling, and integration into scalable workflows.
- Expertise in data scraping, crawling frameworks, and modern dataset development techniques such as synthetic data generation.
- Experience with cloud platforms (AWS, GCP, Azure) and container orchestration (Docker, Kubernetes).
- Deep understanding of data platform design, including data architecture, metadata tracking, data lineage, observability, monitoring, and scalability best practices.
- Familiarity with Infrastructure-as-Code tools (e.g., Terraform, CloudFormation), CI/CD pipelines, relational/NoSQL databases, and GPU-accelerated workflows.
- Familiarity with visualization and monitoring tools for lifecycle management and pipeline performance tracking.
- Expertise in managing unstructured data (text, speech, or multimodal datasets) for high-performance use cases, ideally in the context of LLM/AI datasets.
- Understanding of challenges in scalable data engineering, including ingestion, transformation, and storage optimization for large-scale accelerated workflows.



  • Mumbai, India BharatGen Full time

    Job Summary: BharatGen is on a mission to create AI that truly represents the diversity, culture, and unique context of India. At the heart of this mission lies the need for robust, scalable infrastructure to build multilingual and multimodal datasets that power foundational AI models. We’re seeking a skilled Data Platform Engineer to build scalable tools,...



  • Data Platform Engineer

    23 hours ago


    Mumbai, India BharatGen Full time

    Job Summary: BharatGen is on a mission to create AI that truly represents the diversity, culture, and unique context of India. At the heart of this mission lies the need for robust, scalable infrastructure to build multilingual and multimodal datasets that power foundational AI models. We're seeking a skilled Data Platform Engineer to build scalable tools,...


  • Mumbai, Maharashtra, India Growel Softech Pvt. Ltd. Full time ₹ 5,00,000 - ₹ 15,00,000 per year

    Primary skills: Platform Engineer, Databricks, Azure data platform, infra support. Three main competencies needed: Platform Engineer, Databricks, Azure data platform.

  • Data Platform Engineer

    13 hours ago


    Mumbai, Maharashtra, India Xoriant Full time ₹ 8,00,000 - ₹ 24,00,000 per year

    Role & responsibilities: As a Big Data Platform Engineer, you will be responsible for the technical delivery of our Data Platform's core functionality and strategic solutions. This includes the development of reusable tooling/APIs, applications, data stores, and software stack to accelerate our relational data warehousing, big data analytics and data...


  • Mumbai, India NTT DATA Full time

    Job Description: Make an impact with NTT DATA. Join a company that is pushing the boundaries of what is possible. We are renowned for our technical excellence and leading innovations, and for making a difference to our clients and society. Our workplace embraces diversity and inclusion – it’s a place where you can grow, belong and thrive. Your day at NTT...


  • Mumbai Metropolitan Region, India BharatGen Full time

    Job Summary: BharatGen is on a mission to create AI that truly represents the diversity, culture, and unique context of India. At the heart of this mission lies the need for robust, scalable infrastructure to build multilingual and multimodal datasets that power foundational AI models. We’re seeking a skilled Data Platform Engineer to build scalable tools,...


  • Mumbai, India Russell Investments Full time

    Business Unit: Global Technology
    Reporting To: Senior Manager, Application Development
    Shift: EMEA (1:30 pm - 10:30 pm IST) (India)
    About Russell Investments, Mumbai: Russell Investments is a leading outsourced financial partner and global investment solutions firm providing a wide range of investment capabilities to institutional investors, financial...