Data Platform Engineer
3 days ago
Job Summary:
BharatGen is on a mission to create AI that truly represents the diversity, culture, and unique context of India. At the heart of this mission lies the need for robust, scalable infrastructure to build multilingual and multimodal datasets that power foundational AI models. We’re seeking a skilled Data Platform Engineer to build scalable tools, platforms, and pipelines tailored for processing large-scale, multilingual, multimodal datasets critical for foundational AI models.

In this role, you will build scalable data pipelines to ingest, transform, and prepare data from diverse sources (text, speech, images, and video), making it ready for Generative AI model training. Your work will involve developing and managing the underlying platform while addressing challenges like governance, security, observability, lineage, and scalability. The outcomes of your work will include efficient tools for data processing, a reliable data platform, and high-quality datasets tailored to the evolving needs of large-scale AI and LLM training.

Collaborating closely with researchers and ML engineers, you will play a pivotal role in enabling BharatGen to deliver state-of-the-art AI models, contributing to the advancement of India’s AI ecosystem through innovative data engineering solutions.

Key Responsibilities:
- Design and Build Scalable Platforms: Develop distributed infrastructure for ingesting, processing, and transforming diverse datasets (text, speech, images, video) at terabyte to petabyte scale.
- Develop Robust Data Pipelines: Create reliable, scalable pipelines to prepare datasets for Generative AI and LLM training.
- Implement Governance and Observability: Build frameworks for data lineage, monitoring, and access control to ensure data quality and operational reliability.
- Optimize Performance and Cost: Enhance platform performance and resource utilization using cost-effective strategies, including GPU-accelerated preprocessing.
- Collaborate and Innovate: Work closely with researchers and ML engineers to adapt platforms and data pipelines to evolving LLM requirements, addressing various data challenges.
- Drive Innovation: Stay updated on emerging tools, frameworks, and best practices to implement cutting-edge solutions for large-scale dataset creation.

Minimum Qualifications and Experience:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field with 3+ years of industry experience.

Required Skills:
- Proficiency in distributed systems and frameworks (e.g., Kafka, Ray, PySpark) for scalable data workflows.
- Exposure to end-to-end data lifecycle management, including DataOps.
- Strong programming skills in Python, Scala, or Go, with a focus on high-performance pipeline development.
- Experience with building and optimizing data pipelines, including ETL processes, data modeling, and integration into scalable workflows.
- Expertise in data scraping, crawling frameworks, and modern dataset development techniques such as synthetic data generation.
- Experience with cloud platforms (AWS, GCP, Azure) and container orchestration (Docker, Kubernetes).
- Deep understanding of data platform design, including data architecture, metadata tracking, data lineage, observability, monitoring, and scalability best practices.
- Familiarity with Infrastructure-as-Code tools (e.g., Terraform, CloudFormation), CI/CD pipelines, relational/NoSQL databases, and GPU-accelerated workflows.
- Familiarity with visualization and monitoring tools for lifecycle management and pipeline performance tracking.
- Expertise in managing unstructured data (text, speech, or multimodal datasets) for high-performance use cases, ideally in the context of LLM/AI datasets.
- Understanding of challenges in scalable data engineering, including ingestion, transformation, and storage optimization for large-scale accelerated workflows.
-
Data Platform Engineer
2 days ago
Delhi, India BharatGen Full time
Job Summary: BharatGen is on a mission to create AI that truly represents the diversity, culture, and unique context of India. At the heart of this mission lies the need for robust, scalable infrastructure to build multilingual and multimodal datasets that power foundational AI models. We’re seeking a skilled Data Platform Engineer to build scalable tools,...
-
Data Platform Engineer
1 day ago
Delhi, India ARA Resources Pvt. Ltd. Full time
Data Platform Engineer
Project Role Description: Assists with the data platform blueprint and design, encompassing the relevant data platform components. Collaborates with the Integration Architects and Data Architects to ensure cohesive integration between systems and data models.
Must have skills: Databricks Unified Data Analytics Platform
AI Experience...
-
Data Platform Engineer
2 days ago
New Delhi, India BharatGen Full time
Job Summary: BharatGen is on a mission to create AI that truly represents the diversity, culture, and unique context of India. At the heart of this mission lies the need for robust, scalable infrastructure to build multilingual and multimodal datasets that power foundational AI models. We're seeking a skilled Data Platform Engineer to build scalable tools,...
-
Platform Data Engineer
4 weeks ago
New Delhi, India Whatjobs IN C2 Full time
Key Responsibilities:
- Administer and optimize Azure Databricks, including clusters, jobs, and user access.
- Manage Snowflake environments (warehouses, roles/permissions, performance tuning, cost optimization).
- Design, implement, and monitor Azure Data Factory (ADF) pipelines for batch and streaming workloads.
- Enable data movement and replication across...
-
IT Data Platform Engineer
4 weeks ago
New Delhi, India Palo Alto Networks Full time
Our Mission
At Palo Alto Networks® everything starts and ends with our mission: Being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we’re looking for...
-
Senior Data Governance Analyst
2 weeks ago
Delhi, India Tide Platform Full time
ABOUT TIDE
At Tide, we are building a business management platform designed to save small businesses time and money. We provide our members with business accounts and related banking services, but also a comprehensive set of connected administrative solutions, from invoicing to accounting. Launched in 2017, Tide is now used by over 1 million small businesses...
-
Platform Engineer
3 weeks ago
Delhi, India NTT DATA, Inc. Full time
Job Description:
- Experience in EDR - CrowdStrike
- Experience in any one NG (NextGen) SIEM tool (CrowdStrike, QRadar, ArcSight, Splunk, etc.)
- Hands-on experience in Security Automation tools - SOAR Platform
- Experience in Vulnerability Management Solutions - Qualys (Intermediate to Proficient)
- Experience in IT & Integration Security Tools - Cribl...
-
Platform Engineer
3 weeks ago
Delhi, India NTT DATA, Inc. Full time
Job Description:
- Experience in EDR - CrowdStrike
- Experience in any one NG (NextGen) SIEM tool (CrowdStrike, QRadar, ArcSight, Splunk, etc.)
- Hands-on experience in Security Automation tools - SOAR Platform
- Experience in Vulnerability Management Solutions - Qualys (Intermediate to Proficient)
- Experience in IT & Integration Security Tools - Cribl (Intermediate to...
-
Platform engineer
3 weeks ago
Delhi, India NTT DATA, Inc. Full time
Job Description:
- Experience in EDR - CrowdStrike
- Experience in any one NG (NextGen) SIEM tool (CrowdStrike, QRadar, ArcSight, Splunk, etc.)
- Hands-on experience in Security Automation tools - SOAR Platform
- Experience in Vulnerability Management Solutions - Qualys (Intermediate to Proficient)
- Experience in IT & Integration Security Tools - Cribl (Intermediate...
-
Data Analytics Lead
2 days ago
Delhi, India NTT DATA Full time
Build, manage, and foster a high-functioning team of data engineers and data analysts. Collaborate with business and technical teams to capture and prioritize platform ingestion requirements. Experience working with the manufacturing industry in building a centralized data platform for self-service reporting. Lead the data analytics team members, providing...