Site Reliability Engineering Professional

7 days ago


Bengaluru, Karnataka, India beBeeReliability Full time ₹ 1,50,00,000 - ₹ 2,00,00,000

Platform Stability and Reliability Lead

  • Ensure the platform meets performance, availability, and reliability service level agreements.
  • Proactively identify and resolve performance bottlenecks and risks in production environments through root cause analysis and corrective actions.
  • Maintain and improve monitoring, logging, and alerting frameworks to detect and prevent incidents by leveraging data analytics and visualization tools.

Incident Management Expert

  • Act as the primary responder for critical incidents, ensuring rapid mitigation and resolution through effective communication with stakeholders.
  • Conduct thorough post-incident reviews and implement corrective actions to prevent recurrence by identifying and addressing underlying causes.
  • Develop and maintain detailed runbooks and playbooks for operational excellence, including standard operating procedures and escalation processes.

Automation and Efficiency Champion

  • Build and maintain tools to automate routine tasks, such as deployments, scaling, and failover, using technologies like Ansible and Terraform.
  • Contribute to CI/CD pipeline improvements for faster and more reliable software delivery by implementing automated testing, deployment, and rollback processes.
  • Write and maintain Infrastructure as Code (IaC) using tools like Pulumi or Terraform to provision and manage resources efficiently.

Collaboration and Mentorship Specialist

  • Collaborate with cross-functional teams, including SRE, CI/CD, Developer Experience, and Templates teams, to improve the platform's reliability and usability by sharing knowledge and best practices.
  • Mentor junior engineers by providing guidance, feedback, and coaching on SRE and operational excellence principles.
  • Partner with developers to integrate observability and reliability into their applications by promoting a culture of collaboration and continuous improvement.

Observability and Metrics Analyst

  • Implement and optimize observability tools like Prometheus, Grafana, or New Relic for deep visibility into system performance and behavior.
  • Define key metrics and dashboards to track the health and reliability of platform components, including error rates, latency, and throughput.
  • Continuously analyze operational data to identify and prioritize areas for improvement by leveraging data science and machine learning techniques.

Requirements and Qualifications

  • 8+ years of experience in site reliability engineering, software engineering, or a related field, with 3+ years of experience in AWS.
  • Demonstrated expertise in managing and optimizing cloud-based environments, including containerization and orchestration technologies like Kubernetes and Docker.
  • Strong programming skills in one or more languages: Python, Java, Node.js, or TypeScript.
  • Hands-on experience with CI/CD practices and tools, such as GitLab, Jenkins, or similar.
  • Familiarity with monitoring, logging, and alerting tools; experience with Dynatrace is a plus.

Preferred Skills and Qualifications

  • Hands-on experience with Kubernetes (K8s) for container orchestration and deployment.
  • Familiarity with monitoring and observability tools like Prometheus, Grafana, or similar.
  • Exposure to agile development practices and collaborative environments.
  • Experience working with other cloud platforms (e.g., Azure or Google Cloud) is a plus.

  • Bengaluru, Karnataka, India TRUGlobal Full time ₹ 9,00,000 - ₹ 12,00,000 per year

    Job Title: Site Reliability Engineer (SRE) with Python Development Expertise. Position Overview: We are seeking a skilled Site Reliability Engineer (SRE) with strong Python development experience to join our team. The ideal candidate will be responsible for ensuring the reliability, availability, and performance of our services across both on-premises and...


  • Bengaluru, Karnataka, India Creencia Technologies Pvt Ltd Full time

    We are recruiting an experienced Site Reliability Engineer to join our newly established TechOps division within the Technology department. We maintain the systems that keep our products running smoothly around the world, 24x7, supporting everything from cloud infrastructure and CI/CD pipelines to observability and incident response. How you will contribute...


  • Bengaluru, Karnataka, India Enterprise Minds, Inc Full time

    We're Hiring | Site Reliability Engineer | 8-10 years


  • Bengaluru, Karnataka, India NatWest Group Full time

    Join us as a Site Reliability Engineer. In this key role, you'll support the improvement of non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response and capacity planning of our products and services. You'll enjoy significant stakeholder interaction, working in...


  • Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time

    We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high...


  • Bengaluru, Karnataka, India Randstad Full time

    Role: Site Reliability Engineer. Summary: The Network Engineer 2 provides technical design, planning, operation, maintenance, and advanced troubleshooting of the Bread Financials' network infrastructure. This position ensures continuity and alignment of the network administration/engineering direction. This position supports Bread Financials' strategies and...


  • Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time

    We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...


  • Bengaluru, Karnataka, India Xebia Full time

    We are seeking an experienced AWS DevOps Engineer with strong expertise in Observability and Site Reliability Engineering (SRE) to design, build, and manage scalable, reliable, and secure cloud environments. The role requires hands-on experience with AWS services, Infrastructure as Code (IaC), CI/CD, monitoring & observability frameworks, and incident...


  • Bengaluru, Karnataka, India Kyndryl Full time

    Who We Are: At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward, always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities. The Role: Join us as...


  • Bengaluru, Karnataka, India beBeeReliability Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

    Role Overview: As a Site Reliability Engineer, you will play a pivotal role in driving innovation and modernizing complex systems by leveraging cutting-edge technologies and collaboration with cross-functional teams.