Reliable High-Performance Computing Specialist

2 days ago


Remote India beBeeReliability Full time ₹ 1,80,00,000 - ₹ 2,00,00,000
High-Performance Computing Engineer

NVIDIA is driving AI and high-performance computing forward. DGX Cloud aims to deliver a fully managed AI platform on major cloud providers, optimizing AI workloads using high-performance NVIDIA infrastructure.

Key Responsibilities:
  • Build and implement the operational and reliability aspects of large-scale Kubernetes clusters, with a focus on performance at scale, real-time monitoring, logging, and alerting.
  • Define service level objectives and service level indicators, monitor error budgets, and streamline reporting.
  • Support services before they launch through system design consulting, capacity management, launch reviews, and the development of software tools, platforms, and frameworks.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Operate and optimize GPU workloads across AWS, GCP, Azure, OCI, and private clouds.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Lead triage and root-cause analysis of high-severity incidents.
  • Practice balanced incident response and blameless postmortems.
  • Participate in the on-call rotation to support production services.
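
For candidates less familiar with the terminology, the error-budget arithmetic behind the SLO responsibility above can be sketched roughly as follows. This is only an illustrative sketch, not NVIDIA's tooling: the function name `error_budget`, its parameters, and the request-based availability SLI are all assumptions made for the example.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget usage for a request-based availability SLO.

    Hypothetical helper for illustration only.
    slo_target is a fraction such as 0.999 (meaning 99.9% of requests succeed).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests < 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    # SLI: fraction of requests that succeeded in the window.
    availability = 1.0 if total_requests == 0 else (
        (total_requests - failed_requests) / total_requests
    )
    # Error budget: number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of the budget already spent (can exceed 1.0 when the SLO is missed).
    consumed = failed_requests / allowed_failures if allowed_failures > 0 else 0.0

    return {
        "availability": availability,
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,
        "slo_met": availability >= slo_target,
    }


if __name__ == "__main__":
    # Example window: 1,000,000 requests, 250 failures, 99.9% target.
    print(error_budget(0.999, 1_000_000, 250))
```

In this sketch, a `budget_consumed` value approaching 1.0 would be the signal to slow releases and prioritize reliability work.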

Requirements:

  • Strong understanding of Kubernetes and container orchestration.
  • Experience with high-performance computing and AI workloads.
  • Knowledge of real-time monitoring, logging, and alerting.
  • Ability to define and implement service level objectives and indicators.
  • Capacity management and launch review experience.
  • Strong communication and collaboration skills.
  • Ability to lead triage and root-cause analysis of high-severity incidents.

Benefits:

  • Opportunity to work on cutting-edge AI and high-performance computing projects.
  • Collaborative and dynamic work environment.
  • Professional development opportunities.
  • Competitive compensation package.

Additional Information:

  • Must have strong problem-solving and analytical skills.
  • Ability to adapt to changing priorities and deadlines.
  • Strong team player with excellent communication skills.


  • Remote, India beBeeReliability Full time US$ 1,20,000 - US$ 1,60,000

    Site Reliability Expert: We are seeking a highly skilled and experienced Site Reliability Engineer to join our team. As a key member of the Professional Services Center of Excellence, you will play a crucial role in shaping Observability Engineering for our customers. Job Description: Key Responsibilities: Implement Observability solutions for customers. Design and...


  • India beBeeSoftwareEngineer Full time ₹ 1,80,00,000 - ₹ 2,70,00,000

    Job Overview: We're building a high-performance policy computation and storage layer to support pay policies across multiple businesses and regions. This ambitious project utilizes cutting-edge technologies from AWS and continuously pushes the boundaries of innovation. The team's charter is to develop a world-class product that meets attendance and pay...


  • India beBeeSoftwareEngineer Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

    Job Opportunity: This is an exciting chance to be part of a leading-edge organization at the forefront of AI and data storage innovation. The company powers many demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing. Main Responsibilities: Create...


  • India beBeesite Full time US$ 90,000 - US$ 1,20,000

    Job Description: We are seeking a highly skilled and experienced Site Reliability Engineer to join our team. As a Senior Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems. You will work closely with development and operations teams to build and maintain infrastructure,...


  • Remote, India Rackspace Technology Full time

    Job Description: Site Reliability Engineer / Observability Engineer. Public Cloud - Offerings and Delivery - Workforce Mgmt & Delivery Ops / Full-Time / Remote. Rackspace is building up its Professional Services Center of Excellence on Application Performance Monitoring Suites. If you enjoy solving complex business problems and can contribute to building next...


  • India beBeeNetwork Full time ₹ 10,00,000 - ₹ 20,00,000

    Network Infrastructure Expert: We are seeking a skilled Network Infrastructure Expert to design, implement, and maintain high-performance networks. Implementation experience: 3+. Expertise in network architecture, protocols, and security. The ideal candidate will have a strong understanding of network architecture, protocols, and security. They will be responsible...


  • India beBeeBackend Full time US$ 80,000 - US$ 1,20,000

    Expertise Amplified: We are building a cutting-edge platform that leverages artificial domain intelligence to transform expertise. Our mission is to empower experts to harness AI without complexity, allowing them to focus on their core strengths. Role Overview: As a Backend Engineer, you will play a pivotal role in designing, developing, and maintaining...


  • India beBeePhysicalDesign Full time ₹ 25,80,000 - ₹ 30,55,000

    Senior ASIC Design Engineer: We are seeking a skilled professional to fill the role of Senior ASIC Design Engineer. The ideal candidate will have extensive experience in physical design and a strong background in developing high-performance designs. The successful candidate will be responsible for leading the physical design efforts for next-generation ASICs,...


  • India beBeeInfrastructure Full time ₹ 20,00,000 - ₹ 30,00,000

    We are looking for an experienced Infrastructure Engineer with a strong background in Kubernetes (K8s), GPU-based workloads, and scaling large distributed systems. The ideal candidate will have hands-on experience designing, building, and optimizing infrastructure to support large-scale, GPU-accelerated workloads. The successful candidate will be responsible...


  • India beBeeQuality Full time ₹ 1,50,00,000 - ₹ 2,00,00,000

    Senior Quality Assurance Specialist: We are seeking a highly experienced professional to lead quality assurance activities for trading platforms. This role will focus on ensuring the quality, stability, and performance of trading systems, with a primary emphasis on the Endur tool used for energy and commodities trading. Key Responsibilities: Lead and execute...