Site reliability engineer
4 weeks ago
Job Description

Exp: 4-10 Years
Location: Chennai
Work Mode: Hybrid (2 days in office)

We are looking for a Site Reliability Engineer (SRE) who will lead the Site Reliability Engineering (SRE) side of each of our products. This position is responsible for making technical decisions, collaborating with development teams and platform engineers, and building and operating highly reliable and scalable products.

The SRE's mission is to design, build, and operate highly reliable systems that support stable business growth. Specifically, SREs quantitatively measure and manage system reliability, achieving an appropriate risk balance through SLIs and SLOs. By automating operations to reduce human error, responding quickly to incidents, conducting root cause analysis, and driving continuous improvement, SREs enhance service resilience. Through these efforts, SREs cultivate a culture within the organization that blends engineering and operational best practices.

Expected Role

In this role, you will act as a leader who identifies technical challenges within development teams, proactively plans solutions, and drives projects to resolution. By closely collaborating with developers and platform engineers, you will promote continuous improvements, ensuring that products remain resilient, scalable, and aligned with business objectives.

Key Responsibilities

1. Service Reliability & Scalability
- Design, build, and maintain highly available and scalable production services
- Define and implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure system reliability and performance
- Analyze and improve system bottlenecks and conduct capacity planning

2. Incident Management
- Lead incident response efforts to mitigate and resolve production issues quickly
- Conduct postmortems and root cause analyses to prevent recurrence
- Continuously improve the incident management process and optimize on-call operations

3. Automation & Operational Efficiency
- Automate operational tasks using Infrastructure as Code (IaC) tools such as Terraform
- Implement self-healing and auto-scaling mechanisms for infrastructure components
- Optimize deployment pipelines and CI/CD workflows to improve release efficiency and rollback capabilities

4. Observability & Monitoring
- Design and implement comprehensive monitoring, logging, and tracing strategies using tools like OpenTelemetry, Grafana, Prometheus, and Datadog
- Optimize alerting mechanisms to reduce noise and improve actionable insights
- Continuously enhance system visibility and root cause analysis capabilities

5. Leading the SRE
- Collaborate with development teams to identify and resolve operational and reliability-related technical challenges
- Define and execute reliability strategies as an SRE
- Act as a technical advisor on SRE methodologies within the organization

6. Collaboration & Knowledge Sharing
- Work closely with other SREs, platform engineers, and developers to optimize infrastructure and improve reliability
- Enable developers to adopt SRE practices
- Develop internal tools and best practices to enhance operational efficiency

Requirements

We are looking for individuals who meet several of the following skills and qualifications:
- A few years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
- Some coding experience is required (it does not need to be web applications; experience with batch processing or small automation scripts only is acceptable). Shell-only experience (e.g. bash) is not acceptable. Experience with a statically typed (e.g. C, C++, Java, Rust, Go, Scala, ...) or dynamically typed (e.g. Perl, Ruby, Python, PHP, JavaScript, ...) language is required.
- Experience collaborating with development teams to enhance system reliability
- Technical leadership experience (mentoring and supporting team members in technical areas)
- Strong problem-solving skills and ability to take ownership of reliability-related challenges
- Proven experience in project management (identifying issues, planning solutions, driving execution, and coordinating stakeholders)
- Experience in several of the following technical areas:
  - Experience operating Kubernetes in a production environment
  - Proficiency in Infrastructure as Code (IaC) tools (e.g., Terraform, Crossplane)
  - Experience with CI/CD automation tools (e.g., Argo CD, CircleCI, GitHub Actions)
  - Hands-on experience with observability tools (e.g., Prometheus, OpenTelemetry, Grafana, Datadog)
  - Familiarity with cloud platforms (AWS or others) and cloud-native architectures
  - Experience in incident management, disaster recovery, and high availability strategies

Preferred Qualifications
- Experience fostering SRE best practices within an organization
- Deep understanding of microservices architecture and its operational challenges
- Proficiency in programming languages such as Go, Python, or Bash for automation and tooling development
- Contributions to CNCF projects or open-source communities

Work Environment
- Opportunity to lead and define reliability strategies in a rapidly growing organization
- Collaboration with global teams in an agile and technically driven environment
- Hands-on experience with large-scale distributed systems and cutting-edge cloud-native technologies
- A culture that values automation, reliability, and continuous improvement
-
Cloud Site Reliability Engineer
2 days ago
Chennai, Tamil Nadu, India Ford Global Career Site Full time ₹ 15,00,000 - ₹ 25,00,000 per year
Be at the Forefront of Mobility's Future: Join Ford as a Site Reliability Engineer
Enterprise Technology is the engine driving the future of transportation, and we're looking for a talented Site Reliability Engineer (SRE) to help us redefine mobility. In this role, you'll leverage cutting-edge technology to enhance customer experiences, improve lives, and...
-
Site Reliability Engineer
6 days ago
Chennai, Tamil Nadu, India Elgebra Full time ₹ 6,00,000 - ₹ 18,00,000 per year
Hiring: Site Reliability Engineer – 7+ Years
Location: Bangalore / Chennai
Payroll: Elgebra
Client: Qincline
Joining: Immediate to 15 Days
Role Overview:
We are looking for an experienced Site Reliability Engineer (SRE) with over 6 years of expertise to join our team. The ideal candidate will have strong technical skills, a problem-solving mindset, and the...
-
Site Reliability Engineer
4 weeks ago
Chennai, India Zyoin Group Full time
Position: Site Reliability Engineer (SRE)
Experience: 4 – 10 Years
Location: Chennai (Hybrid – 2 days in office)
Role Overview:
We are seeking a Site Reliability Engineer (SRE) responsible for leading reliability practices, ensuring scalable systems, and collaborating with development teams to maintain highly available services.
Key Responsibilities
- Design,...
-
Site Reliability Engineer
2 days ago
Chennai, Tamil Nadu, India NatWest Group Full time
Site Reliability Engineer, AVP
Join us as a Site Reliability Engineer
You'll manage the provision of stable, resilient, reliable applications with the end goal of minimising disruption to Customer & Colleague Journeys (CCJ).
We'll look to you to identify and automate manual tasks and implement observability solutions, ensuring a thorough understanding of...
-
Site Reliability Engineer
7 days ago
Bengaluru, Chennai, India Venpa Staffing Full time ₹ 15,00,000 - ₹ 25,00,000 per year
We are looking for an experienced Site Reliability Engineer (SRE) with over 7 years of expertise to join our team. The ideal candidate will have strong technical skills, a problem-solving mindset, and the ability to ensure the reliability, scalability, and performance of large-scale systems.
Key Skills & Experience:
7+ years of experience in Site Reliability...
-
Site Reliability Engineer
3 hours ago
Chennai, Tamil Nadu, India NatWest Group Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Site Reliability Engineer
Join us as a Site Reliability Engineer
In this key role, you'll support the improvement of non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services.
You'll enjoy significant...
-
Site Reliability Engineer
7 days ago
Chennai, Hyderabad, India Glomatriz Technologies Full time ₹ 2,50,000 - ₹ 7,50,000 per year
We are seeking a Site Reliability Engineer with 3+ years of experience to ensure system reliability, monitor performance, and implement scalable solutions. The role involves automation, incident management, and collaboration with development teams.
-
Site Reliability Engineer
2 days ago
Chennai, Tamil Nadu, India Elgebra Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Role Overview: We are seeking a highly experienced and technically proficient Site Reliability Engineer (SRE) to join our team in support of our client, Qincline. The ideal candidate will have 7 or more years of dedicated experience in Site Reliability Engineering or a closely related discipline. This pivotal role requires a strong focus on ensuring the...
-
Senior Site Reliability Engineer
3 weeks ago
Chennai, India Saama Full time
Description
Job Title: Senior Site Reliability Engineer
Job Summary: We are seeking a highly motivated and experienced Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and availability of our systems by leveraging your expertise in DevOps practices and tools....