Site Reliability Engineering Manager

3 weeks ago


India | CloudBees | Full time

Job Title - Manager, Site Reliability Engineer

Location - Bangalore and Chennai

Years of Experience - 10+ years


About CloudBees

CloudBees is the leading software delivery platform that enables enterprises to deliver scalable, compliant, and secure software, empowering developers to do their best work.


Seamlessly integrating into any hybrid and heterogeneous environment, CloudBees is more than a tool—it's a strategic partner in your cloud transformation journey, ensuring security, compliance, and operational efficiency while enhancing the developer experience across your entire software development lifecycle. It allows developers to bring and execute their code anywhere, providing greater flexibility and freedom through fast, self-serve, and secure workflows.


CloudBees supports organizations at every step of their DevSecOps journey, whether using Jenkins on-premises or transitioning software delivery to the cloud and wanting to accelerate their cloud transformation by years. CloudBees is helping customers build the future, today.


About the Role

As an SRE Manager at CloudBees, you will be an essential contributor to the development of our industry-leading software products. You'll work within the SaaS Platform team to manage, design, develop, and deliver high-quality solutions that ensure the high availability and performance of our systems.


What You'll Do

  • Lead efforts to design, implement, and manage highly available, scalable, and fault-tolerant systems and services.
  • Drive the automation of processes, deployments, monitoring, and incident response to improve efficiency and reliability.
  • Collaborate with development teams to ensure the architecture and applications are designed with scalability, reliability, and cost in mind.
  • Develop and maintain monitoring, alerting, and logging solutions to proactively identify and address performance issues and outages.
  • Participate in follow-the-sun on-call rotations, responding to incidents, conducting post-incident reviews, and contributing to incident response improvements.
  • Analyze system performance data, identify bottlenecks, and recommend solutions to optimize performance and resource utilization.
  • Contribute to the design and implementation of disaster recovery strategies and backup solutions.
  • Mentor and provide guidance to junior SREs and other team members, fostering a culture of continuous learning and improvement.
  • Stay current with industry trends, emerging technologies, and best practices to drive innovation and improvements in system reliability.

Requirements

  • Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent practical experience).
  • 10+ years of experience, including at least two years in a leadership role in Site Reliability Engineering or a similar function, with a proven track record of managing complex systems in a production environment.
  • Proficiency in programming/scripting languages such as Go, Python, or similar.
  • Strong experience with cloud platforms (e.g., AWS, Google Cloud, Azure) and infrastructure-as-code tools (e.g., Terraform, CloudFormation).
  • Solid knowledge of networking concepts, including load balancing, DNS, routing, and security.
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog).
  • Strong problem-solving skills and the ability to troubleshoot complex issues under pressure.
  • Excellent communication and collaboration skills to work effectively across teams.
  • Experience with CI/CD pipelines (e.g., Jenkins, GitHub Actions) and version control systems.
  • Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Professional DevOps Engineer) are a plus.
  • Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
  • A passion for reliability, demonstrated through participation in architectural design.
  • Proven ability to lead and guide technical projects and initiatives.



  • India | First American (India) | Full time

    The Role: An SRE Manager is ultimately responsible for system reliability, developer productivity, and reducing time to market by striving to reduce the technical debt of the services your SRE team supports. We seek managers who are passionate about site reliability to influence and drive the strategic SRE mission. As a Site Reliability Engineering Manager...


  • India | Cricbuzz.com | Full time

    Site Reliability Engineer: We are looking for a highly skilled and motivated Web Server Site Reliability Engineer to join our team. As a Web Server Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our web server infrastructure and CDN services. Experience: 3-5 years. Responsibilities: ●...


  • India | Greenway Health | Full time

    Job Summary: The Manager is responsible for implementing the development process and site reliability engineering practices to resolve issues and identify opportunity areas. This role will lead development and site reliability engineering teams and establish and implement best practices and standards related to engineering...


  • India | SID Global Solutions | Full time

    Dear Candidates, we are looking for a talented Site Reliability Engineer-Manager (immediate joiners, 8 to 9 years of experience, Hyderabad location) to join our dynamic team and contribute to the development of our cutting-edge web applications. If you're passionate about the role and have experience in SRE, GCP, and Kubernetes, send me your updated CV: Please...


  • India | Quiktrak, LLC | Full time

    Job Title: Azure Site Reliability Engineer (SRE) / DevOps Engineer Job Description: Summary: As an Azure Site Reliability Engineer (SRE) / DevOps Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure on the Azure platform. This role involves managing deployments, implementing continuous...


  • India | Korn Ferry | Full time

    Role: Site Reliability Engineer. Experience: 5+ years required. Location: Hyderabad (Work from Office, Hybrid). Shift Timings: 5 AM - 1 PM IST. We are looking for a Site Reliability Engineer with a strong development background to join our team. In this role, you will be responsible for ensuring the reliability and performance of our systems. You will work closely...


  • India | Thoucentric | Full time

    Job Description: We are seeking a skilled and dedicated Site Reliability Engineer (SRE) to join our team. The SRE will be responsible for ensuring the reliability, performance, and scalability of our systems and applications. This role combines software development and systems engineering to build and run large-scale, distributed,...


  • India | System Soft Technologies | Full time

    Title: Site Reliability Engineer (100% REMOTE). The Site Reliability Engineer (SRE) is a technician who utilizes an array of skills to enhance reliability in critical customer-facing digital assets. The SRE is responsible for maintaining the availability and performance of relevant systems through supporting, building, and enhancing applications, tools and...


  • India | ViewSonic | Full time

    Job Requirements: Bachelor’s degree in Computer Science, Engineering, or a related field. 3+ years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role. Proficient in AWS solutions including but not limited to EC2, S3, CloudWatch, Lambda, and RDS. Strong understanding of Platform Engineering concepts and principles. Experience...


  • India | WaferWire Cloud Technologies | Full time

    Role: SRE (Site Reliability Engineer). Experience: 4+ years. About WaferWire Cloud Technologies: WaferWire Cloud Technologies is a leading provider of innovative cloud solutions aimed at transforming businesses and driving digital growth. With a focus on cutting-edge technology and customer-centric approaches, we empower organizations to thrive in the...


  • India | HCLSoftware | Full time

    The Role: HCL BigFix is looking for a Site Reliability Engineer to work on infrastructure for a new product that will help keep our customers’ end points secure. You will be a part of a team that leverages modern technological solutions to drive growth and efficiency. Your daily responsibilities will be centered on HCL BigFix’s cloud infrastructure,...


  • India | Encora Inc. | Full time

    Description: Sr. Software Engineer (Site Reliability Engineer). Important Information: Location: Ahmedabad. Experience: 5+ years. Job Mode: Full-time. Work Mode: Remote. Job Summary: Working in DevOps/SRE, with good experience in Site Reliability Engineering. Responsibilities and Duties: Design, implement, and maintain highly...


  • India | STAFIDE | Full time

    Job Description About us: Stafide is the premier destination for tech talent consulting, providing comprehensive employment services throughout Europe. Our mission is straightforward: to effortlessly connect job seekers with employers, focusing on the rapidly changing technology sector. Boasting unparalleled expertise and a steadfast commitment, we...


  • India | UBS | Full time

    Your role: We're looking for a Site Reliability Engineer to: • work as part of an agile pod (team) • determine the reliability of our digital products, technology services, and the infrastructure that underpins them • minimize the risk and impact of failures by engineering operational improvements, such as predictive monitoring, auto scaling or...


  • India | RapidBraiins | Full time

    Job Description : We are seeking a highly skilled and experienced Senior DevOps Site Reliability Engineer to join our dynamic team. The ideal candidate will have a proven track record of success in DevOps, Site Reliability Engineering (SRE), or development roles within SaaS-based or enterprise applications. As a Senior DevOps SRE Engineer, you will play a...


  • India | Coforge | Full time

    Qualifications: Experience in a DevOps / Site Reliability Engineer (SRE) position, dedicated to ensuring the high availability, reliability, and scalability of live systems. Proficient in observability tools like Prometheus, ELK stack, Grafana, and Azure Monitor, capable of fully managing the suite for optimal system oversight. Skilled in operating APM...


  • India | Hansen Technologies | Full time

    About The Role: If you are an experienced Site Reliability Engineer, join our team in Pune to become a driving force in ensuring the reliability, performance, and scalability of our systems. As an SRE, you'll be more than just a technical expert; you’ll be a creative problem solver with exceptional customer relationship skills. Your primary...


  • India | System Soft Technologies | Full time

    Job Summary: The Site Reliability Engineer (SRE) is a technician who utilizes an array of skills to enhance reliability in critical customer-facing digital assets. The SRE is responsible for maintaining the availability and performance of relevant systems through supporting, building, and enhancing applications, tools and engaging with infrastructure teams....

