GPU Compute Cluster SRE
1 week ago
We are seeking a highly skilled GPU Compute Cluster SRE to join our team at NVIDIA. As a key member of our infrastructure team, you will be responsible for designing, implementing, and supporting large-scale production systems with high efficiency and availability.
The Farm GPU Compute Cluster SRE maintains large-scale production systems with high efficiency and availability by combining software and systems engineering practices. This is a highly specialized discipline that demands knowledge across different systems, Slurm/LSF, Unix administration, scripting, capacity management, and open-source technologies. The Farm GPU SRE is responsible for developing solutions around our large compute cluster so that it runs efficiently, and for improving the experience of both customers and the engineers who support the cluster.
Key Responsibilities:
- Design, implement, and support large-scale infrastructure with monitoring, logging, and alerting that meets promised uptime.
- Engage in and improve the whole lifecycle of services—from inception and design through deployment, operation, and refinement.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, and capacity management.
- Maintain infrastructure and services once they are live by measuring and monitoring availability, latency, and overall system health.
- Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response and blameless postmortems.
Requirements:
- BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics) or equivalent.
- 3+ years of hands-on industry experience in the above-mentioned areas.
- Must have experience with Linux system administration (Ubuntu, CentOS/Red Hat).
- Must have experience setting up and administering HPC cluster schedulers such as Slurm and LSF.
- Experience in one or more of the following: Python, Perl, Bash.
- Good understanding of open-source IT automation tools such as Ansible.
- Interest in crafting, analyzing, and fixing large-scale distributed systems.
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
Preferred Qualifications:
- Experience with Bright Cluster Manager (BCM).
- Understanding of InfiniBand or Ethernet concepts.
- Experience with high-speed storage solutions such as Lustre or GPFS.
- Experience with MPI and PyTorch.
-
GPU Compute Cluster SRE
2 weeks ago
Bengaluru, Karnataka, India NVIDIA Full time
Job Summary: We are seeking a highly skilled GPU Compute Cluster SRE to join our team at NVIDIA. As a key member of our infrastructure team, you will be responsible for designing, implementing, and supporting large-scale infrastructure with monitoring, logging, and alerting with promised uptime. Key Responsibilities: Design and implement large-scale...
-
GPU Compute Cluster SRE Engineer
6 days ago
Bengaluru, Karnataka, India NVIDIA Full time
Job Summary: NVIDIA is seeking a highly skilled GPU Compute Cluster SRE Engineer to join our team. As a key member of our infrastructure team, you will be responsible for designing, implementing, and supporting large-scale production systems with high efficiency and availability. Key Responsibilities: Design and implement large-scale infrastructure with...
-
GPU Compute Cluster Engineer
2 weeks ago
Bengaluru, Karnataka, India NVIDIA Full time
Job Title: GPU Compute Cluster Engineer. NVIDIA is seeking a highly skilled GPU Compute Cluster Engineer to join our team. As a key member of our infrastructure team, you will be responsible for designing, implementing, and supporting large-scale infrastructure with monitoring, logging, and alerting with promised uptime. Key Responsibilities: Design and...
-
GPU Architect
2 weeks ago
Bengaluru, Karnataka, India Synopsys Inc Full time
Job Overview: The GPU Architect will lead the design and optimization of GPU-based high-performance computing architectures tailored for OPC software solutions. They will define HPC solutions that effectively utilize current hardware but prepare us for cutting-edge GPU architectures on the horizon that will impact our industry. Key Responsibilities: Design and...
-
GPU Architect
2 weeks ago
Bengaluru, Karnataka, India NVIDIA Full time
NVIDIA's Innovation in GPU Technology: NVIDIA's pioneering work in the GPU has revolutionized modern computer graphics and parallel computing. As a key player in the AI computing landscape, we're seeking a talented individual to join our team and contribute to the advancement of GPU architecture. Key Responsibilities: Collaborate with cross-functional teams to...
-
GPU Architect
2 weeks ago
Bengaluru, Karnataka, India NVIDIA Full time
NVIDIA is at the forefront of technological advancement, driving innovation in the field of artificial intelligence and computer graphics. As a GPU Architect, you will play a crucial role in shaping the future of computing. Key Responsibilities: Contribute to the development and enhancement of GPU architecture and simulators, testing infrastructure, metrics,...
-
GPU Architect
1 week ago
Bengaluru, Karnataka, India NVIDIA Full time
NVIDIA is a leader in the field of computer graphics and parallel computing. We are seeking a highly skilled GPU Architect to join our team and contribute to the development of our GPU architecture. Key Responsibilities: Contribute to advancing GPU Architecture and Simulators, GPU testing infrastructure, metrics, and/or compilers. Develop and enhance various...
-
GPU Architect
6 days ago
Bengaluru, Karnataka, India NVIDIA Full time
NVIDIA is a leader in the field of computer graphics and parallel computing. We're looking for a talented individual to join our team as a GPU Architect. In this role, you'll be responsible for developing and enhancing various features in the GPU architecture. Key Responsibilities: Contribute to advancing GPU Architecture and Simulators, GPU testing...
-
GPU Software Engineer
2 weeks ago
Bengaluru, Karnataka, India Synopsys Inc Full time
Job Overview: We are seeking a highly skilled GPU Staff/Senior Staff role to optimize and implement GPU-accelerated algorithms for OPC software in the EDA industry. This position emphasizes performance improvements and integration with existing EDA tools, requiring close peer and partner collaborations to deliver solutions at the right time that address the...
-
GPU Architect Lead
3 days ago
Bengaluru, Karnataka, India NVIDIA Full time
NVIDIA is seeking a highly skilled GPU Architect Lead to contribute to the advancement of our GPU architecture and simulators. As a key member of our team, you will be responsible for developing and enhancing various features in the GPU architecture. Your expertise in C++, parallel processing, and compiler development will be invaluable in helping us push...
-
GPU Technical Lead
4 hours ago
Bengaluru, Karnataka, India NVIDIA Full time
NVIDIA is seeking an experienced GPU expert to drive innovation in our GPU architecture. As a GPU Technical Lead, you will be responsible for advancing various features in the GPU architecture. Key Responsibilities: Contribute to the development and enhancement of GPU architecture and simulators. Design and implement features for future graphics and parallel...
-
GPU Verification Engineer
2 weeks ago
Bengaluru, Karnataka, India NVIDIA Full time
NVIDIA GPU Verification Engineer Job Description: NVIDIA is seeking a skilled Verification Engineer to verify the design and implementation of the next generation of GPUs. This role offers the opportunity to have a real impact in a technology-focused company that is pushing the frontiers of what is possible today and defining the platform for the future of...
-
GPU Verification Engineer
4 hours ago
Bengaluru, Karnataka, India NVIDIA Full time
NVIDIA is seeking a skilled Verification Engineer to verify the design and implementation of cutting-edge GPUs. This role offers the opportunity to make a real impact in a technology-driven company with a global presence. Our team is passionate about parallel and visual computing, and we're united in our quest to transform the way graphics are used to solve...
-
Senior Cluster Networking Engineer
4 hours ago
Bengaluru, Karnataka, India Oracle Full time
As a Senior Cluster Networking Engineer at Oracle, you will play a key role in designing and operating the network infrastructure required to run distributed AI workloads across a cluster of thousands of GPUs. Our team is responsible for provisioning, securing, and scaling the network stack to meet the needs of our customers. We are looking for adaptable...
-
GPU Engineer
2 weeks ago
Bengaluru, Karnataka, India Synopsys Inc Full time
Job Overview: We are seeking a highly skilled GPU Staff/Senior Staff role to focus on optimizing and implementing GPU-accelerated algorithms for OPC software in the EDA industry. This position emphasizes performance improvements and integration with existing EDA tools, requiring close peer and partner collaborations to ensure timely delivery of solutions...
-
GPU Development Engineer
2 weeks ago
Bengaluru, Karnataka, India Intel Full time
Job Title: GPU Development Engineer. Job Summary: We are seeking a talented GPU Development Engineer to join our team at Intel. As a key member of our software team, you will be responsible for developing innovative software solutions to accelerate media and video processing on Intel's graphics architecture. Key Responsibilities: Developing new software solutions...
-
GPU Performance Architect
2 weeks ago
Bengaluru, Karnataka, India NVIDIA Full time
About NVIDIA: NVIDIA is a pioneering technology company that has revolutionized the world of computing. With a rich history of innovation, we have consistently pushed the boundaries of what is possible. Our journey began with the invention of the GPU in 1999, which sparked the growth of the PC gaming market and redefined modern computer graphics. Today, we are...
-
GPU Verification Engineer
2 weeks ago
Bengaluru, Karnataka, India NVIDIA Full time
GPU Verification Engineer: NVIDIA is seeking a talented Verification Engineer to join our team and contribute to the design and implementation of cutting-edge GPUs. As a key member of our ASIC Verification team, you will play a crucial role in verifying the correctness of our industry-leading GPUs. Key Responsibilities: Verify the design and implementation of...
-
GPU Performance Architect
2 weeks ago
Bengaluru, Karnataka, India NVIDIA Full time
GPU Performance Analysis Expert: NVIDIA is a pioneer in the field of visual processing, high-performance computing, and artificial intelligence. We are seeking a highly motivated and creative engineer to join our HW architecture team, where you will work on projects that will shape the future of visual computing, automotive, and GPU systems. Key...
-
GPU Verification Engineer
1 week ago
Bengaluru, Karnataka, India NVIDIA Full time
NVIDIA is seeking a skilled Verification Engineer to verify the design and implementation of the next generation of industry-leading GPUs. This position offers the opportunity to have a real impact in a multifaceted, technology-focused company impacting product lines ranging from consumer graphics to self-driving cars and the growing field of artificial...