
Reliable High-Performance Computing Specialist
2 days ago
NVIDIA is driving AI and high-performance computing forward. DGX Cloud aims to deliver a fully managed AI platform on major cloud providers, optimizing AI workloads using high-performance NVIDIA infrastructure.
Key Responsibilities:
- Build and implement operational and reliability aspects of large-scale Kubernetes clusters, with a focus on performance at scale, real-time monitoring, logging, and alerting.
- Define service level objectives and service level indicators, monitor error budgets, and streamline reporting.
- Support services before they launch through system design consulting, development of software tools, platforms, and frameworks, capacity management, and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Operate and optimize GPU workloads across AWS, GCP, Azure, OCI, and private clouds.
- Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
- Lead triage and root-cause analysis of high-severity incidents.
- Practice balanced incident response and blameless postmortems.
- Participate in on-call rotation to support production services.
Requirements:
- Strong understanding of Kubernetes and container orchestration.
- Experience with high-performance computing and AI workloads.
- Knowledge of real-time monitoring, logging, and alerting.
- Ability to define and implement service level objectives and indicators.
- Capacity management and launch review experience.
- Strong communication and collaboration skills.
- Ability to lead triage and root-cause analysis of high-severity incidents.
Benefits:
- Opportunity to work on cutting-edge AI and high-performance computing projects.
- Collaborative and dynamic work environment.
- Professional development opportunities.
- Competitive compensation package.
Additional Information:
- Must have strong problem-solving and analytical skills.
- Ability to adapt to changing priorities and deadlines.
- Strong team player with excellent communication skills.
-
High-Performance System Architect
2 days ago
Remote, India | beBeeReliability | Full time | US$ 120,000 - US$ 160,000
Site Reliability Expert
We are seeking a highly skilled and experienced Site Reliability Engineer to join our team. As a key member of the Professional Services Center of Excellence, you will play a crucial role in shaping Observability Engineering for our customers.
Job Description
Key Responsibilities:
- Implement Observability solutions for customers
- Design and...
-
High Performance Policy Developer
2 days ago
India | beBeeSoftwareEngineer | Full time | ₹ 1,80,00,000 - ₹ 2,70,00,000
Job Overview
We're building a high-performance policy computation and storage layer to support pay policies across multiple businesses and regions. This ambitious project uses cutting-edge AWS technologies and continuously pushes the boundaries of innovation. The team's charter is to develop a world-class product that meets attendance and pay...
-
India | beBeeSoftwareEngineer | Full time | ₹ 2,00,00,000 - ₹ 2,50,00,000
Job Opportunity
This is an exciting chance to be part of a leading-edge organization at the forefront of AI and data storage innovation. The company powers many demanding AI data centers in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.
Main Responsibilities
Create...
-
High-Performance System Specialist
2 days ago
India | beBeesite | Full time | US$ 90,000 - US$ 120,000
Job Description
We are seeking a highly skilled and experienced Site Reliability Engineer to join our team. As a Senior Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems. You will work closely with development and operations teams to build and maintain infrastructure,...
-
Site Reliability Engineer
2 days ago
Remote, India | Rackspace Technology | Full time
Job Description
Site Reliability Engineer / Observability Engineer
Public Cloud - Offerings and Delivery - Workforce Mgmt & Delivery Ops / Full-Time / Remote
Rackspace is building up its Professional Services Center of Excellence on Application Performance Monitoring Suites.
If you enjoy solving complex business problems and can contribute to building next...
-
High-Performance Network Specialist
1 hour ago
India | beBeeNetwork | Full time | ₹ 10,00,000 - ₹ 20,00,000
Network Infrastructure Expert
We are seeking a skilled Network Infrastructure Expert to design, implement, and maintain high-performance networks.
- Implementation experience: 3+ years
- Expertise in network architecture, protocols, and security
The ideal candidate will have a strong understanding of network architecture, protocols, and security. They will be responsible...
-
High-Performance Software Engineer
17 hours ago
India | beBeeBackend | Full time | US$ 80,000 - US$ 120,000
Expertise Amplified
We are building a cutting-edge platform that leverages artificial domain intelligence to transform expertise. Our mission is to empower experts to harness AI without complexity, allowing them to focus on their core strengths.
Role Overview:
As a Backend Engineer, you will play a pivotal role in designing, developing, and maintaining...
-
High-Performance ASIC Design Expert
2 days ago
India | beBeePhysicalDesign | Full time | ₹ 25,80,000 - ₹ 30,55,000
Senior ASIC Design Engineer
We are seeking a skilled professional to fill the role of Senior ASIC Design Engineer. The ideal candidate will have extensive experience in physical design and a strong background in developing high-performance designs. The successful candidate will be responsible for leading the physical design efforts for next-generation ASICs,...
-
High-Performance Infrastructure Specialist
6 hours ago
India | beBeeInfrastructure | Full time | ₹ 20,00,000 - ₹ 30,00,000
We are looking for an experienced Infrastructure Engineer with a strong background in Kubernetes (K8s), GPU-based workloads, and scaling large distributed systems. The ideal candidate will have hands-on experience designing, building, and optimizing infrastructure to support large-scale, GPU-accelerated workloads. The successful candidate will be responsible...
-
India | beBeeQuality | Full time | ₹ 1,50,00,000 - ₹ 2,00,00,000
Senior Quality Assurance Specialist
We are seeking a highly experienced professional to lead quality assurance activities for trading platforms. This role will focus on ensuring the quality, stability, and performance of trading systems, with a primary emphasis on the Endur tool, which is used for energy and commodities trading.
Key Responsibilities
Lead and execute...