
High-Performance Infrastructure Specialist
1 week ago
Senior Site Reliability Engineer
Overview:
The successful candidate will be responsible for designing and implementing large-scale distributed systems with a focus on performance at scale, real-time monitoring, logging, and alerting. The ideal candidate will have a deep understanding of GPU computing and AI infrastructure.
Responsibilities:
- Design and implement state-of-the-art GPU compute clusters.
- Optimize cluster operations for maximum reliability, efficiency, and performance.
- Drive foundational improvements and automation to enhance researcher productivity.
- Troubleshoot, diagnose, and root-cause system failures, isolating the failing components and failure scenarios while working with internal and external partners.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response and blameless postmortems, and be part of an on-call rotation to support production systems.
- Write and review code, develop documentation and capacity plans, and debug the hardest problems live on some of the largest and most complex systems in the world.
Requirements:
- Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience, with at least 5 years of experience designing and operating large-scale compute infrastructure.
- Proven experience in site reliability engineering for high-performance computing environments, including operational experience with a cluster of at least 2,000 GPUs.
- Deep understanding of GPU computing and AI infrastructure.
- Passion for solving complex technical challenges and optimizing system performance.
- Experience with advanced AI/HPC job schedulers, ideally including Slurm.
- Working knowledge of cluster configuration management tools such as BCM or Ansible, and of infrastructure-level applications such as Kubernetes, Terraform, and MySQL.
- In-depth understanding of container technologies such as Docker and Enroot.
- Experience programming in Python and Bash scripting.
Benefits:
- Opportunity to work on cutting-edge technology and contribute to groundbreaking projects.
- Collaborative and dynamic work environment with talented professionals.
- Professional development and growth opportunities.
- Competitive salary and benefits package.
-
Gurgaon, Haryana, India beBeeInfrastructure Full time ₹ 15,00,000 - ₹ 28,00,000
Job Description: We are seeking an experienced professional to fill the role of High-Performance Computing Engineer. The successful candidate will provide operational support for enterprise-level customers, planning and performing maintenance activities, assessing customer environments for performance and design issues, and collaborating with technical teams...
-
Gurgaon, Haryana, India beBeeNetwork Full time ₹ 15,00,000 - ₹ 28,00,000
Expert network professionals are sought for the role of High-Performance Computing Network Engineer. This position requires a highly skilled individual with extensive experience in managing network infrastructure in high-performance computing environments. The ideal candidate will have expertise in configuring, maintaining, and troubleshooting Nvidia/Mellanox...
-
High-Performance Messaging Systems Specialist
2 weeks ago
Gurgaon, Haryana, India beBeeKafka Full time ₹ 15,00,000 - ₹ 25,00,000
Job Title: High-Performance Messaging Systems Specialist
We are seeking an experienced Kafka Administrator to manage, maintain, and optimize our distributed, multi-cluster Kafka infrastructure deployed in an on-premise environment. This role requires deep knowledge of Kafka internals, Zookeeper administration, performance tuning, and operational excellence...
-
High-Performance Computing Specialist
1 week ago
Gurgaon, Haryana, India beBeeInfrastructure Full time ₹ 15,00,000 - ₹ 28,00,000
Job Overview: We are seeking a talented HPC Infrastructure Specialist to join our team. In this role, you will provide expert-level operational support to customers for incident, problem, and change management activities.
Key Responsibilities: Provide enterprise-level operational support to customers for incident, problem, and change management activities; Plan...
-
Gurgaon, Haryana, India beBeeMarketing Full time ₹ 12,00,000 - ₹ 15,00,000
Job Title: A high-performing Performance Marketing Specialist is required to plan and execute paid advertising campaigns on social media platforms.
Key Responsibilities: Developing and implementing paid advertising strategies across multiple social media channels; Optimizing campaign performance through A/B testing and ROI analysis; Managing programmatic...
-
System Infrastructure Specialist
1 week ago
Gurgaon, Haryana, India beBeeinfrastructure Full time ₹ 1,50,000 - ₹ 28,00,000
We are seeking an experienced System Infrastructure Specialist to join our team. As a key member of our infrastructure team, you will be responsible for the management and maintenance of high availability infrastructure.
-
Cloud Infrastructure Specialist
2 weeks ago
Gurgaon, Haryana, India beBeeDevOps Full time ₹ 20,00,000 - ₹ 25,00,000
AWS DevOps Engineer - Cloud Infrastructure Specialist
We are seeking a seasoned cloud infrastructure specialist with robust experience in designing, deploying, and maintaining secure, scalable, and high-availability AWS environments. Design and manage AWS infrastructure, focusing on middleware services such as API Gateway, Lambda, SQS, SNS, ECS, and...
-
High-Performance System Specialist
7 days ago
Gurgaon, Haryana, India beBeeReliability Full time ₹ 2,00,00,000 - ₹ 2,50,00,000
Job Overview: We are seeking an experienced Senior Reliability Engineer to ensure the reliability, availability, scalability, and performance of our Azure-based platforms and applications.
Service Reliability & SLOs: Define and maintain Service Level Objectives (SLOs) for the systems you own.
Automation & Scalability: Develop automation to scale systems...
-
Cloud Infrastructure Specialist
1 week ago
Gurgaon, Haryana, India beBeeCloudInfrastructure Full time ₹ 20,09,917 - ₹ 25,12,756
Job Title: Cloud Infrastructure Specialist
About the Role: This is an exciting opportunity to join our team as a Cloud Infrastructure Specialist. In this role, you will be responsible for designing, building, testing, and deploying cloud application solutions that integrate cloud and non-cloud infrastructure. Your primary focus will be on collaborating with...
-
High-Performance Database Specialist
3 days ago
Gurgaon, Haryana, India beBeeDatabaseAdministrator Full time ₹ 15,00,000 - ₹ 25,00,000
Job Summary: We are seeking a seasoned Database Administrator to join our team. As a key member of the database administration team, you will be responsible for designing, implementing, and maintaining high-performance database systems.
Responsibilities include: Monitoring and troubleshooting database instances; Performing root-cause analysis in response to...