Cloud AI/ML Infrastructure Architect

2 weeks ago


India Restored Cloud Full time
Job Overview:

Restored Cloud is seeking a skilled Distributed Systems Engineer to design and optimize distributed infrastructure for large-scale AI/ML model training and inference. As a key member of our team, you will address challenges like minimizing checkpointing delays, enabling seamless fault recovery, and maximizing resource utilization for models exceeding 1B parameters.

Key Responsibilities:
  • Develop and scale distributed systems tailored for high-performance AI/ML workloads, focusing on eliminating delays caused by traditional checkpointing.
  • Design fault-tolerant and high-availability systems that ensure seamless operation and rapid recovery, even during infrastructure failures.
  • Implement advanced data partitioning, synchronization, and parallel computation techniques to handle terabytes of data and optimize memory usage across multi-node setups.
  • Collaborate with ML and infrastructure engineers to design innovative solutions for distributed training and inference of large-scale models.
  • Identify and resolve performance bottlenecks, particularly those arising from storage, memory, or network constraints in AI workflows.
  • Stay at the forefront of emerging distributed computing trends, such as zero-copy memory sharing, efficient in-memory data storage, and distributed model execution, to ensure your solutions remain cutting-edge.
Requirements:
  • Bachelor's degree in Computer Science, Distributed Systems, Computer Engineering, or a related field.
  • 5+ years of experience in designing and implementing distributed systems.
  • Proficiency in programming languages such as Python, C++, or Java.
  • Strong understanding of distributed computing principles, including fault tolerance, synchronization, and parallel computation.
  • Experience with distributed training frameworks such as PyTorch Distributed, TensorFlow Distributed, or DeepSpeed.
  • Familiarity with cloud platforms (AWS, GCP, or Azure) and managing multi-node infrastructure.
  • Demonstrated ability to troubleshoot performance bottlenecks in distributed systems.
Preferred Qualifications:
  • Master's or Ph.D. in Computer Science, Distributed Systems, Computer Engineering, or a related field.
  • 7+ years of hands-on experience with large-scale distributed systems for AI/ML workloads.
  • Expertise in advanced distributed systems concepts, such as zero-copy memory sharing, RDMA, and NVMe-based storage.
  • Experience working at NVIDIA, AMD, AWS, or a similar organization focused on distributed systems.
  • Proven track record of optimizing distributed systems for AI/ML models with 1B+ parameters.
  • Strong knowledge of network optimization techniques for high-performance computing.


  • Anywhere in India/Multiple Locations Finanshels Full time

    We are seeking a highly experienced AI/ML Infrastructure Architect Lead to design and develop cutting-edge solutions that drive innovation at Finanshels. This is an exceptional opportunity for a skilled technologist to lead the execution of large-scale AI/ML projects, ensuring alignment with business objectives and driving solutions for high-impact...


  • Anywhere in India/Multiple Locations Finanshels Full time

    Job Overview: Finanshels is seeking an experienced Senior AI/ML Architect to lead the development and deployment of innovative AI and machine learning solutions. The ideal candidate will have a strong background in designing and implementing large-scale distributed systems, with expertise in cloud platforms (AWS, GCP, Azure) and serverless architectures. About...


  • India Proximity Works Full time

    We are looking for a highly experienced AI/ML Architect and Technical Lead to join our team at Proximity Works. This role offers a unique opportunity to shape the future of AI/ML infrastructure and drive innovation in cutting-edge tech solutions. As a key technical leader, you will be responsible for designing and developing large-scale AI/ML systems, ensuring...

  • AI/ML Architect

    3 weeks ago


    India Programmers.io Full time

    Job Overview: We are seeking an exceptional AI/ML Architect to lead our cutting-edge artificial intelligence and machine learning initiatives. The ideal candidate will be a visionary technologist capable of designing, implementing, and optimizing complex AI solutions, with a strong focus on generative AI technologies. Key Responsibilities: • Design...

  • AI/ML Architect

    4 weeks ago


    India Programmers.io Full time

    Job Overview: We are seeking an exceptional AI/ML Architect to lead our cutting-edge artificial intelligence and machine learning initiatives. The ideal candidate will be a visionary technologist capable of designing, implementing, and optimizing complex AI solutions, with a strong focus on generative AI technologies. Key Responsibilities: • Design...


  • India Restored Cloud Full time

    At Restored Cloud, we are seeking an experienced Cloud Infrastructure Machine Learning Architect to design and build cutting-edge tools, frameworks, and systems for efficient machine learning model training, deployment, and scaling. The ideal candidate will have a strong background in cloud infrastructure, machine learning, and software development....


  • India Synaptyx AI Full time

    About SynaptyX AI: We're a forward-thinking company dedicated to delivering real impact with AI. Our mission is to help large and mid-sized enterprises unlock the power of Generative AI by providing tailored, scalable solutions that make a tangible difference. Our team combines decades of tech expertise with a startup-like mindset to solve complex challenges...


  • India Odin AI Full time

    We are seeking an experienced Cloud Infrastructure Architect and Deployment Engineer to join our dynamic team at Odin AI. This is a senior role that requires 4-6+ years of hands-on experience in managing cloud infrastructure and deployment pipelines. The ideal candidate should have a deep understanding of AWS, Google Cloud Platform (GCP), and Microsoft Azure,...


  • India Glacien AI Full time

    At Glacien.ai, we're a technology innovator focused on delivering transformative AI and cloud solutions across industries. Our platform combines sophisticated artificial intelligence with scalable cloud architecture to help businesses unlock new possibilities and drive innovation in their domains. The Role: We're seeking an experienced Senior Full Stack...


  • Senior Technical Lead

    2 months ago


    India Proximity Works Full time

    We are looking for a Senior Technical Lead / Senior Solutions Architect to design, develop, and scale innovative AI/ML-driven solutions. You will be responsible for architecting highly scalable, low-latency distributed systems optimized for AI/ML workloads. As a key technical leader, you will solve complex challenges, influence next-generation AI/ML...

  • AI/ML Engineers

    2 weeks ago


    India JUTEQ Inc Full time

    JUTEQ, a leading technology solutions provider specializing in cloud, Kubernetes, DevOps, and cutting-edge AI/ML solutions, is seeking talented AI/ML Engineers to join our team. We focus on empowering enterprises across industries such as financial services, telecommunications, and healthcare with innovative software products, platform integrations, and...

  • Senior Technical Lead

    3 months ago


    India Proximity Works Full time

    We are looking for a Senior Technical Lead / Senior Solutions Architect to design, develop, and scale innovative AI/ML-driven solutions. You will be responsible for architecting highly scalable, low-latency distributed systems optimized for AI/ML workloads. As a key technical leader, you will solve complex challenges, influence next-generation AI/ML...
