Distributed Systems Engineer
3 weeks ago
Key Responsibilities:
Develop and scale distributed systems tailored for high-performance AI/ML workloads, focusing on eliminating delays caused by traditional checkpointing.
Design fault-tolerant and high-availability systems that ensure seamless operation and rapid recovery, even during infrastructure failures.
Implement advanced data partitioning, synchronization, and parallel computation techniques to handle terabytes of data and optimize memory usage across multi-node setups.
Collaborate with ML and infrastructure engineers to design innovative solutions for distributed training and inference of large-scale models.
Identify and resolve performance bottlenecks, particularly those arising from storage, memory, or network constraints in AI workflows.
Stay at the forefront of emerging distributed computing trends, such as zero-copy memory sharing, efficient in-memory data storage, and distributed model execution, to ensure your solutions remain cutting-edge.
Adapt to new technologies and take on new responsibilities and roles in a fast-paced, growing company.
Minimum Qualifications:
Bachelor's degree in Computer Science, Distributed Systems, Computer Engineering, or a related field.
5+ years of experience in designing and implementing distributed systems.
Proficiency in programming languages such as Python, C++, or Java.
Strong understanding of distributed computing principles, including fault tolerance, synchronization, and parallel computation.
Experience with distributed training frameworks such as PyTorch Distributed, TensorFlow Distributed, or DeepSpeed.
Familiarity with cloud platforms (AWS, GCP, or Azure) and managing multi-node infrastructure.
Demonstrated ability to troubleshoot performance bottlenecks in distributed systems.
Preferred Qualifications:
Master’s or Ph.D. in Computer Science, Distributed Systems, Computer Engineering, or a related field.
7+ years of hands-on experience with large-scale distributed systems for AI/ML workloads.
Expertise in advanced distributed systems concepts, such as zero-copy memory sharing, RDMA, and NVMe-based storage.
Experience working at Nvidia, AMD, AWS, or a similar distributed systems-focused organization.
Proven track record of optimizing distributed systems for AI/ML models with 1B+ parameters.
Strong knowledge of network optimization techniques for high-performance computing.
Familiarity with cutting-edge AI/ML trends and the ability to integrate them into distributed architectures.
-
Delhi, India Persistent Systems Full time
About Position: We are on the lookout for a seasoned Cloud Database Administrator with a specialized focus on distributed database systems and a minimum of 5 years of experience. The successful candidate will be instrumental in managing, scaling, and ensuring the reliability of our distributed databases deployed on cloud infrastructure, with a particular...
-
Distributed Systems Developer
3 weeks ago
Delhi, Delhi, India CIEL HR Full time
Distributed Systems Engineer Role: CIEL HR is looking for an experienced Distributed Systems Engineer with expertise in Erlang to join our IT product development team working on logistics and supply chain solutions. This role involves designing, developing, and maintaining scalable, distributed systems using Erlang. Responsibilities: Implement and maintain our...
-
Delhi, Delhi, India HiroJet Full time
Job Summary: HiroJet seeks a highly experienced Distributed Systems Engineer with expertise in Kafka to join our team. This is a senior-level position that requires a deep understanding of distributed systems, backend development, and technical leadership. About the Role: We empower the people who power modern, digital business by enabling customers to deliver...
-
Distributed Systems Engineer
3 weeks ago
Delhi, Delhi, India CIEL HR Full time
Job Overview: CIEL HR is a rapidly growing company seeking a talented Senior Erlang Developer to lead the development of our next-generation logistics platform. As a key member of our engineering team, you will be responsible for designing, implementing, and maintaining scalable, reliable, and performant backend services and applications using...
-
Senior Distributed Systems Engineer
3 weeks ago
Delhi, Delhi, India ICICIDirect Full time
Company Overview: ICICIDirect is a leading financial services company that provides a range of innovative solutions to its customers. Salary: The estimated salary for this position is ₹45,00,000 per annum, commensurate with experience and qualifications. Job Description: As a Senior Distributed Systems Engineer at ICICIDirect, you will play a critical role in...
-
Data Processing Engineer for Distributed Systems
2 weeks ago
Delhi, Delhi, India Radioactive Technologies Full time
Job Description: Radioactive Technologies is seeking a skilled Data Processing Engineer to join our team. In this role, you will be responsible for designing and implementing data processing pipelines using Apache Spark and Scala. About the Role: The successful candidate will have a strong background in functional programming and experience with distributed...
-
Distributed Systems Architect
1 month ago
Delhi, Delhi, India StarTree Full time
About StarTree: StarTree is a pioneering cloud-based software company that empowers businesses to extract profound insights from real-time and historical data. With its innovative solutions, StarTree enables organizations to make informed decisions by leveraging the power of big data. Founding Story: StarTree was founded by the core software engineering team...
-
Software Engineering Expert
2 weeks ago
Delhi, Delhi, India Continuous Full time
Job Description: We are seeking a skilled Software Engineering Expert to join our team at Continuous Technologies. As a Senior/Staff Platform Software Engineer, you will be responsible for designing and developing back-end SaaS applications, object-oriented programming (OOP), distributed systems, and data modeling. The ideal candidate will have 8-12 years of...
-
Technical Leadership
3 weeks ago
Delhi, Delhi, India LinkedIn Full time
Job Summary: We are seeking an experienced Technical Leadership & Distributed Systems Architect to join our team at LinkedIn. As a key member of our software engineering organization, you will be responsible for designing and architecting scalable, high-performance distributed systems that enable our products to operate 24/7. In this role, you will work closely...
-
Backend Software Engineer
3 weeks ago
Delhi, Delhi, India CIEL HR Full time
Job Description: CIEL HR is a leading provider of innovative IT solutions, and we are currently seeking an experienced Backend Software Engineer - Distributed Systems to join our dynamic team. As a key member of our development team, you will play a crucial role in designing, developing, and maintaining our core backend systems, ensuring high performance and...
-
Senior Distributed Systems Developer
3 weeks ago
Delhi, Delhi, India AiDASH Full time
About AiDash: AiDash is making critical infrastructure industries climate-resilient and sustainable with satellites and AI. Using our full-stack SaaS solutions, customers in electric, gas, and water utilities, transportation, and construction are transforming asset inspection and maintenance - and complying with biodiversity net gain mandates and carbon...
-
Senior Distributed Systems Architect
4 weeks ago
Delhi, Delhi, India LinkedIn Full time
Overview: LinkedIn is the world's largest professional network, built to create economic opportunity for every member of the global workforce. We're committed to providing transformational opportunities for our employees by investing in their growth and creating a culture that's built on trust, care, inclusion, and fun, where everyone can succeed. Job...
-
Delhi, Delhi, India Continuous Full time
Company Overview: Continuous Technologies is the pioneering solution designed to help businesses launch and grow usage consumption pricing models on the Salesforce platform. Our Quote to Consumption solution seamlessly integrates with Salesforce, eliminating the need for costly, risky integrations to standalone billing systems or ongoing maintenance of...
-
Delhi, Delhi, India LinkedIn Full time
About the Role: We are seeking a highly skilled Senior Distributed Systems Engineer to join our team as an Information Retrieval Expert. In this role, you will be responsible for designing and building high-performance distributed database systems for information retrieval applications. Company Overview: At LinkedIn, we believe in creating economic opportunities...
-
Delhi, Delhi, India LinkedIn Full time
Lead Software Engineer - Distributed Systems: We are seeking an experienced software engineering leader to join our team in Bangalore, India. This role offers a hybrid work option, allowing you to work from home or commute to a LinkedIn office. You will be part of our world-class software engineering team, building the next-generation infrastructure and...
-
Senior Distributed Systems Architect, Bangalore
3 weeks ago
Delhi, Delhi, India LinkedIn Full time
About LinkedIn: At LinkedIn, we're dedicated to creating economic opportunities for every member of the global workforce. Our products empower professionals to discover new opportunities, develop their skills, and build meaningful connections. We're looking for a talented Distributed Systems Architect to join our team in Bangalore. As a key member of our...
-
Delhi, Delhi, India LinkedIn Full time
We're seeking a skilled Senior Software Architect to lead the development of our next-generation retrieval systems at LinkedIn. This role offers a unique opportunity to design and build high-performance, distributed platforms that power our search functionality. As a key member of our software engineering team, you will be responsible for architecting,...
-
Cloud Engineer for Distributed Systems
4 weeks ago
Delhi, Delhi, India Intuitive Full time
At Intuitive, we are one of the fastest-growing Cloud & SDx solution and services companies, supporting enterprise customers on a global scale. Our team delivers measurable value and key business outcomes. We pride ourselves on partnering with leading enterprises and serving 200+ customers across various industry verticals. Our achievements include being...
-
Software Development Expert
4 weeks ago
Delhi, Delhi, India Health Catalyst Full time
About Us: Health Catalyst is a leading player in the dynamic healthcare software development space. Our mission is to improve healthcare performance, cost, and quality by developing innovative tools. The Role: We are seeking a skilled Java Software Engineer with experience in distributed systems to join our team. As a mid- to senior-level developer, you will be...