
Site Reliability Engineering Professional
7 days ago
Platform Stability and Reliability Lead
- Ensure the platform meets performance, availability, and reliability service level agreements.
- Proactively identify and resolve performance bottlenecks and risks in production environments through root cause analysis and corrective actions.
- Maintain and improve monitoring, logging, and alerting frameworks to detect and prevent incidents by leveraging data analytics and visualization tools.
Incident Management Expert
- Act as the primary responder for critical incidents, ensuring rapid mitigation and resolution through effective communication with stakeholders.
- Conduct thorough post-incident reviews and implement corrective actions to prevent recurrence by identifying and addressing underlying causes.
- Develop and maintain detailed runbooks and playbooks for operational excellence, including standard operating procedures and escalation processes.
Automation and Efficiency Champion
- Build and maintain tools to automate routine tasks, such as deployments, scaling, and failover, using technologies like Ansible and Terraform.
- Contribute to CI/CD pipeline improvements for faster and more reliable software delivery by implementing automated testing, deployment, and rollback processes.
- Write and maintain Infrastructure as Code (IaC) using tools like Pulumi or Terraform to provision and manage resources efficiently.
Collaboration and Mentorship Specialist
- Collaborate with cross-functional teams, including SRE, CI/CD, Developer Experience, and Templates teams, to improve the platform's reliability and usability by sharing knowledge and best practices.
- Mentor junior engineers by providing guidance, feedback, and coaching on SRE and operational excellence principles.
- Partner with developers to integrate observability and reliability into their applications by promoting a culture of collaboration and continuous improvement.
Observability and Metrics Analyst
- Implement and optimize observability tools like Prometheus, Grafana, or New Relic for deep visibility into system performance and behavior.
- Define key metrics and dashboards to track the health and reliability of platform components, including error rates, latency, and throughput.
- Continuously analyze operational data to identify and prioritize areas for improvement by leveraging data science and machine learning techniques.
Requirements and Qualifications
- 8+ years of experience in site reliability engineering, software engineering, or a related field, with 3+ years of experience in AWS.
- Demonstrated expertise in managing and optimizing cloud-based environments, including containerization and orchestration technologies like Kubernetes and Docker.
- Strong programming skills in one or more languages: Python, Java, Node.js, or TypeScript.
- Hands-on experience with CI/CD practices and tools, such as GitLab, Jenkins, or similar.
- Familiarity with monitoring, logging, and alerting tools; experience with Dynatrace is a plus.
Preferred Skills and Qualifications
- Hands-on experience with Kubernetes (K8s) for container orchestration and deployment.
- Familiarity with monitoring and observability tools like Prometheus, Grafana, or similar.
- Exposure to agile development practices and collaborative environments.
- Experience working with other cloud platforms (e.g., Azure or Google Cloud) is a plus.
-
Site Reliability Engineer
2 days ago
Bengaluru, Karnataka, India TRUGlobal Full time ₹ 9,00,000 - ₹ 12,00,000 per year
Job Title: Site Reliability Engineer (SRE) with Python Development Expertise. Position Overview: We are seeking a skilled Site Reliability Engineer (SRE) with strong Python development experience to join our team. The ideal candidate will be responsible for ensuring the reliability, availability, and performance of our services across both on-premises and...
-
Site Reliability Engineer
3 days ago
Bengaluru, Karnataka, India Creencia Technologies Pvt Ltd Full time
We are recruiting an experienced Site Reliability Engineer to join our newly established TechOps division within the Technology department. We maintain the systems that keep our products running smoothly around the world, 24x7, supporting everything from cloud infrastructure and CI/CD pipelines to observability and incident response. How you will contribute...
-
Site Reliability Engineer
7 days ago
Bengaluru, Karnataka, India Enterprise Minds, Inc Full time
We're Hiring | Site Reliability Engineer | 8-10 years
-
Site Reliability Engineer
4 weeks ago
Bengaluru, Karnataka, India NatWest Group Full time
Join us as a Site Reliability Engineer. In this key role, you'll support the improvement of non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services. You'll enjoy significant stakeholder interaction, working in...
-
Site Reliability Engineer
1 week ago
Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time
We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high...
-
Site Reliability Engineer
5 days ago
Bengaluru, Karnataka, India Randstad Full time
Role: Site Reliability Engineer. Summary: The Network Engineer 2 provides technical design, planning, operation, maintenance, and advanced troubleshooting of the Bread Financials' network infrastructure. This position ensures continuity and alignment of the network administration/engineering direction. This position supports Bread Financials' strategies and...
-
Site Reliability Engineer
5 days ago
Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time
We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...
-
Site Reliability Engineer
7 days ago
Bengaluru, Karnataka, India Xebia Full time
We are seeking an experienced AWS DevOps Engineer with strong expertise in Observability and Site Reliability Engineering (SRE) to design, build, and manage scalable, reliable, and secure cloud environments. The role requires hands-on experience with AWS services, Infrastructure as Code (IaC), CI/CD, monitoring & observability frameworks, and incident...
-
Site Reliability Engineer
1 day ago
Bengaluru, Karnataka, India Kyndryl Full time
Who We Are: At Kyndryl, we design, build, manage, and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward, always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers, and our communities. The Role: Join us as...
-
Site Reliability Engineer
1 week ago
Bengaluru, Karnataka, India beBeeReliability Full time ₹ 2,00,00,000 - ₹ 2,50,00,000
Role Overview: As a Site Reliability Engineer, you will play a pivotal role in driving innovation and modernizing complex systems by leveraging cutting-edge technologies and collaboration with cross-functional teams.