Senior Cloud Site Reliability Engineer

6 days ago


Bengaluru India Athenahealth Technology Private Limited Full time

Job Description

Join us as we work to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.

athenahealth is a progressive and innovative U.S. health-tech leader, delivering cloud-based solutions that improve clinical and financial performance across the care continuum. Our modern, open ecosystem connects care teams and delivers actionable insights that drive better outcomes. The company was acquired by Bain Capital in a $17B deal. We foster a values-driven culture focused on flexibility, collaboration, and work-life balance.

Headquartered in Boston, we have offices in Atlanta, Austin, Belfast, Burlington, and in India: Bangalore, Chennai and Pune.

Position Summary: We are looking for a Senior Site Reliability Engineer to join our Cloud Infrastructure Engineering division in Bangalore. This team ensures the continuous availability of the technologies and systems that form the foundation of athenahealth's services.

We are directly responsible for thousands of servers, petabytes of storage, and thousands of web requests per second, all while supporting rapid growth. Our mission is to enable an operating system for the medical office that abstracts away administrative complexity, allowing doctors to focus on practicing medicine.

About You: You're a seasoned engineer with a passion for solving reliability and scalability challenges. You're curious, collaborative, and driven to improve systems. You enjoy uncovering inefficiencies, automating solutions, and striving for operational excellence. You're a fast learner, an excellent communicator, and a champion of engineering best practices.

The ideal candidate will have strong expertise in AWS and Kubernetes, along with hands-on experience in Terraform, CI/CD pipelines, and scripting (e.g., Python, Bash, Go). Experience with AI tools such as Windsurf, GitHub Copilot, or similar will be considered a plus.

The Team: We are a team of Site Reliability Engineers passionate about reliability, automation, and scalability. We follow an agile framework to prioritize high-impact work. Supporting both private and public cloud environments, we make data-driven decisions to choose the best fit for the business. We relentlessly automate manual tasks to focus on strategic initiatives.

Key Responsibilities

Reliability & Availability

- Define, measure, and maintain SLOs and SLIs for cloud services and infrastructure.
- Lead efforts to improve system availability, fault tolerance, and disaster recovery.
- Ensure proactive incident detection, root cause analysis, and timely resolution.
- Participate in a 24x7 on-call rotation.

Automation & Infrastructure as Code (IaC)

- Drive automation to reduce manual intervention in cloud infrastructure management.
- Implement IaC using tools like Terraform, AWS CloudFormation, and Ansible.
- Automate deployment, scaling, and monitoring processes.

Monitoring, Observability & Performance

- Design and implement monitoring, logging, and alerting solutions.
- Use observability tools (e.g., Prometheus, Grafana, CloudWatch) for performance insights.
- Identify and resolve performance bottlenecks.

Security & Compliance

- Build cloud infrastructure with security best practices and compliance in mind.
- Collaborate with security teams to implement controls and mitigate risks.
- Conduct regular audits for vulnerabilities and compliance gaps.

Collaboration & Leadership

- Partner with development, DevOps, and operations teams to align infrastructure with business needs.
- Mentor junior engineers and promote a culture of operational excellence.
- Serve as a technical point of contact for infrastructure-related issues.

Incident Management & Post-Mortems

- Lead incident response for cloud infrastructure issues.
- Conduct post-incident reviews and implement preventive measures.
- Continuously improve incident management processes.

Qualifications

- 5-9 years of hands-on experience with cloud automation and configuration tools (e.g., Terraform, CloudFormation, Ansible) in a hybrid cloud setup.
- 4+ years in SRE, Infrastructure Engineering, or DevOps roles.
- Deep expertise in AWS services (e.g., EC2, S3, Lambda) and Kubernetes.
- Proficiency in scripting/programming (e.g., Python, Go, Bash).
- Experience with observability tools (e.g., Prometheus, Grafana, Datadog, ELK).
- Familiarity with CI/CD pipelines and cloud-native development practices.
- Strong experience managing production environments in AWS, GCP, or Azure.
- Knowledge of cloud-native architectures, microservices, and containerization (Kubernetes, Docker).
- Proven ability to build scalable, fault-tolerant systems.
- Solid understanding of cloud networking, storage, compute, and security best practices.
- Bonus: Experience with AI tools such as Windsurf, GitHub Copilot, or similar.




  • Bengaluru, India Xebia Full time

    Performance & Reliability Engineer (Senior, Lead, Principal & Manager) Hybrid Location: Pune, Chennai, Bangalore & Gurgaon Need immediate joiners only Job description Role: Performance & Reliability Engineer Job Location: Gurgaon, Chennai, Pune, Bangalore Hybrid Job Overview: We are seeking a highly skilled and motivated Performance & Reliability...


  • Bengaluru, India QualityKiosk Technologies Pvt. Ltd. Full time

    Job Description QualityKiosk Technologies is one of the world's largest independent Quality Engineering (QE) providers and digital transformation enablers, helping companies build and manage applications for optimal performance and user experience. QualityKiosk, which offers automated quality assurance solutions for clients across geographies and verticals,...


  • Bengaluru, India Visa Full time

    Job Description Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure...


  • India Pythian Full time

    Job Description Site Reliability Engineer Hyderabad-based | Multiple timezones available | Hybrid | Work from Home and the Office At Pythian, we are experts in strategic database and analytics services, driving digital transformation and operational excellence. Pythian, a multinational company, was founded in 1997 and started by ensuring the reliability and...


  • India Akamai Technologies Full time

    Job Description Do you have the passion to architect and lead the next generation of public cloud infrastructure? Would you like to lead modernization initiatives while building a public cloud platform from scratch? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power...


  • Bengaluru, Karnataka, India ZEN Cloud Systems Private Limited Full time ₹ 15,00,000 - ₹ 25,00,000 per year

    Job Title: Site Reliability Engineer (SRE) Duration: 12 months Location: Bangalore Timings: Full Time (As per company timings) Notice Period: (Immediate Joiner - Only) Experience: 7-8 Years Job Description: We are seeking a skilled and proactive engineer with expertise in Kubernetes, Java-based applications, and cloud platforms (AWS/Azure/GCP), along with...


  • India Akamai Full time ₹ 8,00,000 - ₹ 25,00,000 per year

    Do you have the passion to architect and lead the next generation of public cloud infrastructure? Would you like to lead modernization initiatives while building a public cloud platform from scratch? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform....


  • India Akamai Full time

    Do you have the passion to architect and lead the next generation of public cloud infrastructure? Would you like to lead modernization initiatives while building a public cloud platform from scratch? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud...


  • Bhalki, India Kuzmik Michael D DDS Full time

    Job Description Sr. Azure Site Reliability Engineer Keep Planet-Scale Systems Reliable, Secure, and Fast (On-site only) Location: Ahmedabad, Bangalore/Bengaluru, Hyderabad, India (Onsite) Experience Level: 3-9 Years Employment Type: Full-time...


  • Bengaluru, Karnataka, India LanceSoft, Inc. Full time ₹ 6,00,000 - ₹ 8,00,000 per year

    Role Description This is a full-time on-site role for a Senior Site Reliability Engineer based in Bangalore/Chennai/Pune. The Senior Site Reliability Engineer will be responsible for maintaining and enhancing the reliability and performance of the company's IT infrastructure & Development. Daily tasks include troubleshooting system issues, ensuring system...