Senior Cloud Site Reliability Engineer

4 days ago

Bengaluru, India Athenahealth Full time

Join us as we work to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all. athenahealth is a progressive and innovative U.S. health-tech leader, delivering cloud-based solutions that improve clinical and financial performance across the care continuum. Our modern, open ecosystem connects care teams and delivers actionable insights that drive better outcomes. We were acquired by Bain Capital in a $17B deal. We foster a values-driven culture focused on flexibility, collaboration, and work-life balance. Headquartered in Boston, we have offices in Atlanta, Austin, Belfast, Burlington, and in India: Bangalore, Chennai and Pune.

Position Summary: We are looking for a Senior Site Reliability Engineer to join our Cloud Infrastructure Engineering division in Bangalore. This team ensures the continuous availability of the technologies and systems that form the foundation of athenahealth’s services. We are directly responsible for thousands of servers, petabytes of storage, and handling thousands of web requests per second, all while supporting rapid growth. Our mission is to enable an operating system for the medical office that abstracts away administrative complexity, allowing doctors to focus on practicing medicine.

About You: You’re a seasoned engineer with a passion for solving reliability and scalability challenges. You’re curious, collaborative, and driven to improve systems. You enjoy uncovering inefficiencies, automating solutions, and striving for operational excellence. You’re a fast learner, an excellent communicator, and a champion of engineering best practices. The ideal candidate will have strong expertise in AWS and Kubernetes, along with hands-on experience in Terraform, CI/CD pipelines, and scripting (e.g., Python, Bash, Go). Experience with AI tools such as Windsurf, GitHub Copilot, or similar will be considered a plus.
The Team: We are a team of Site Reliability Engineers passionate about reliability, automation, and scalability. We follow an agile framework to prioritize high-impact work. Supporting both private and public cloud environments, we make data-driven decisions to choose the best fit for the business. We relentlessly automate manual tasks to focus on strategic initiatives.

Key Responsibilities

Reliability & Availability: Define, measure, and maintain SLOs and SLIs for cloud services and infrastructure. Lead efforts to improve system availability, fault tolerance, and disaster recovery. Ensure proactive incident detection, root cause analysis, and timely resolution. Participate in a 24x7 on-call rotation.

Automation & Infrastructure as Code (IaC): Drive automation to reduce manual intervention in cloud infrastructure management. Implement IaC using tools like Terraform, AWS CloudFormation, and Ansible. Automate deployment, scaling, and monitoring processes.

Monitoring, Observability & Performance: Design and implement monitoring, logging, and alerting solutions. Use observability tools (e.g., Prometheus, Grafana, CloudWatch) for performance insights. Identify and resolve performance bottlenecks.

Security & Compliance: Build cloud infrastructure with security best practices and compliance in mind. Collaborate with security teams to implement controls and mitigate risks. Conduct regular audits for vulnerabilities and compliance gaps.

Collaboration & Leadership: Partner with development, DevOps, and operations teams to align infrastructure with business needs. Mentor junior engineers and promote a culture of operational excellence. Serve as a technical point of contact for infrastructure-related issues.

Incident Management & Post-Mortems: Lead incident response for cloud infrastructure issues. Conduct post-incident reviews and implement preventive measures. Continuously improve incident management processes.
Qualifications

5–9 years of hands-on experience with cloud automation and configuration tools (e.g., Terraform, CloudFormation, Ansible) in a hybrid cloud setup.
4+ years in SRE, Infrastructure Engineering, or DevOps roles.
Deep expertise in AWS services (e.g., EC2, S3, Lambda) and Kubernetes.
Proficiency in scripting/programming (e.g., Python, Go, Bash).
Experience with observability tools (e.g., Prometheus, Grafana, Datadog, ELK).
Familiarity with CI/CD pipelines and cloud-native development practices.
Strong experience managing production environments in AWS, GCP, or Azure.
Knowledge of cloud-native architectures, microservices, and containerization (Kubernetes, Docker).
Proven ability to build scalable, fault-tolerant systems.
Solid understanding of cloud networking, storage, compute, and security best practices.
Bonus: Experience with AI tools such as Windsurf, GitHub Copilot, or similar.

About athenahealth

Our vision: In an industry that becomes more complex by the day, we stand for simplicity. We offer IT solutions and expert services that eliminate the daily hurdles preventing healthcare providers from focusing entirely on their patients, powered by our vision to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.

Our company culture: Our talented employees (or athenistas, as we call ourselves) spark the innovation and passion needed to accomplish our vision. We are a diverse group of dreamers and doers with unique knowledge, expertise, backgrounds, and perspectives. We unite as mission-driven problem-solvers with a deep desire to achieve our vision and make our time here count. Our award-winning culture is built around shared values of inclusiveness, accountability, and support.

Our DEI commitment: Our vision of accessible, high-quality, and sustainable healthcare for all requires addressing the inequities that stand in the way. That's one reason we prioritize diversity, equity, and inclusion in every aspect of our business, from attracting and sustaining a diverse workforce to maintaining an inclusive environment for athenistas, our partners, customers and the communities where we work and serve.

What we can do for you: Along with health and financial benefits, athenistas enjoy perks specific to each location, including commuter support, employee assistance programs, tuition assistance, employee resource groups, and collaborative workspaces; some offices even welcome dogs. We also encourage a better work-life balance for athenistas with our flexibility. While we know in-office collaboration is critical to our vision, we recognize that not all work needs to be done within an office environment, full-time. With consistent communication and digital collaboration tools, athenahealth enables employees to find a balance that feels fulfilling and productive for each individual situation. In addition to our traditional benefits and perks, we sponsor events throughout the year, including book clubs, external speakers, and hackathons. We provide athenistas with a company culture based on learning, the support of an engaged team, and an inclusive environment where all employees are valued. Learn more about our culture and benefits here: athenahealth.com/careers

Site Reliability Engineer

3 weeks ago

Bengaluru, India WhiteLotus Talent Partners Full time

We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...
Senior Site Reliability Engineer

2 days ago

Bengaluru, Karnataka, India Quantaleap Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Job Title: Senior Site Reliability Engineer. Location: Remote (occasional travel for team meetings). Experience Required: 5+ Years. Domain: Release Engineering / SRE / DevOps. Role Overview: We are seeking a Senior Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our systems. The role requires strong expertise in...
Site Reliability Engineer

2 weeks ago

Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time ₹ 9,00,000 - ₹ 12,00,000 per year

We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system availability,...
Senior Site Reliability Engineer

6 days ago

Bengaluru, Karnataka, India Zorba Consulting Full time ₹ 8,00,000 - ₹ 16,00,000 per year

Description: Senior Site Reliability Engineer (SRE). Location: Bangalore, India. Experience: 6+ Years. Domain: DevOps/Cloud Infrastructure. About the Role: We are looking for a Senior Site Reliability Engineer (SRE) to join our core engineering team. You will be instrumental in ensuring the reliability, scalability, and performance of our global microservices...
Senior Site Reliability Engineer

3 weeks ago

Bengaluru, India Tata Consultancy Services Full time

Role: Senior Site Reliability Engineer (SRE). Required Technical Skill Set: Senior Site Reliability Engineer (SRE). Desired Experience Range: 7 - 10 yrs. Notice Period: Immediate to 90 days only. Location of Requirement: Bangalore. We are currently planning to do a Virtual Interview. Job Description: Key Responsibilities: Infrastructure & Application Support -...
Senior Site Reliability Engineer

3 weeks ago

Bengaluru, India Whatjobs IN C2 Full time

Role: Senior Site Reliability Engineer (SRE). Required Technical Skill Set: Senior Site Reliability Engineer (SRE). Desired Experience Range: 7 - 10 yrs. Notice Period: Immediate to 90 days only. Location of Requirement: Bangalore. We are currently planning to do a Virtual Interview. Job Description: Key Responsibilities: Infrastructure & Application Support...
Senior Staff Site Reliability Engineer

4 weeks ago

Bengaluru, India Movius Full time

Senior Staff Site Reliability Engineer. Location: Bengaluru, KA. Job Description: We are seeking a highly skilled Senior Staff Site Reliability Engineer with extensive experience in DevOps/SRE roles and large-scale distributed systems. The ideal candidate will have a proven background in cloud operations, automation, and CI/CD, with a preference for...
Senior Cloud Engineer

2 weeks ago

Bengaluru, Karnataka, India Cloud Software Group Full time ₹ 6,00,000 - ₹ 18,00,000 per year

We need you to: Own our SQL database systems and meet critical quality, availability, performance and reliability goals. Design and build the tools, frameworks, systems, and processes related to the operation of our databases and integration with Site Reliability capabilities. Work with the development teams to design scalable, robust systems using cloud...
