Site Reliability Engineering Analyst

4 days ago


Hyderabad, India FedEx Full time

A Site Reliability Engineer (SRE) is an advanced DevOps role that combines software engineering and Cloud capabilities to ensure the scalability, performance, and reliability of large-scale, cloud-based applications.

As applications and infrastructure have become complex and cloud-based, a more proactive and software-centric approach is needed to ensure reliability at scale.

By combining software engineering and cloud principles, SREs bring a mindset of automation and reliability to operations. The preferred approach is to tackle operations challenges from a software engineering perspective, leveraging:

  • Coding
  • Automation
  • Engineering principles

By doing so, SREs build resilient, self-healing systems that can scale seamlessly.

So how do we do this? Here's what we expect the SRE to help the IT and Engineering teams mature in:

  • Detect issues.
  • Automatically handle failures.
  • Prepare disaster recovery plans.
  • Keep the system up and reliable.
  • Mitigate broken systems and prevent them from causing future disruptions.

Responsibilities:

  • An SRE bridges the gap between traditional software engineering and operations to create highly scalable and fault-tolerant systems. As a result, SREs ensure the reliable and efficient operation of an organization's systems and services.

Here's an in-depth look into the core responsibilities of site reliability engineers:

Ensure system reliability and availability:

  • Efficient systems are the backbone of every secure and breach-free organization. Organizations continuously update their applications to provide advanced features to users.
  • But sometimes, their systems become unreliable, which results in unavailability. This is where site reliability engineers help.

Here's how SREs ensure systems are reliable:

  • Monitor system issues.
  • Create strategies to detect issues.
  • Address those issues.
  • Design systems to troubleshoot automatically.
  • Write and review post-mortems.

Mitigate operational risks:

  • SREs identify and assess potential risks that could impact the performance of systems and services, and implement measures to eliminate them.

Here is how SREs do it:

  • Collaborate with development teams and other stakeholders to identify potential risks.
  • Once risks are identified, analyze and evaluate their potential impact and likelihood of occurrence.
  • Based on the risk assessment, implement strategies to mitigate operational risks.
  • Once done, continuously monitor and review the effectiveness of those strategies.
  • By doing so, SREs maintain system reliability and ensure a positive user experience.

Monitor system health:

  • Monitoring means measuring a system's health. An SRE uses alerts, tickets, logging mechanisms, and request times to monitor it. This helps keep the system stable and minimizes user disruption. When a bug occurs, the SRE responds immediately to resolve it.

However, doing all of this manually is expensive and time-consuming. So, SREs automate this process for systems that handle large amounts of data. Here is how they do it:

  • Study historical performance trends using metrics visualized in charts and graphs.
  • Trace problems with system monitoring tools.
  • Monitor the log files to manage infrastructures at scale.
  • Doing so eliminates manual collection, storage, and visualization of the data.

Minimize emergency response:

  • Emergency response is the time site reliability engineers take to respond to problems. It is tracked as the Mean Time to Respond (MTTR), which here measures the average time an SRE takes to fix an incident after it happens.
  • Minimizing the MTTR is necessary to reduce downtime and keep systems reliable. As an SRE, you can improve this metric by resolving incidents quickly.
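
As a rough illustration of how MTTR can be tracked, the minimal Python sketch below averages the time between when an incident happens and when it is fixed. The function name, the (occurred, fixed) tuple layout, and the sample timestamps are all hypothetical and are not taken from any specific tool; a real team would typically pull these times from its incident tracker.

```python
from datetime import datetime, timedelta


def mean_time_to_respond(incidents):
    """Average the (fixed - occurred) duration across incidents.

    `incidents` is a list of (occurred, fixed) datetime pairs.
    This layout is an assumption made for the example.
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((fixed - occurred for occurred, fixed in incidents), timedelta())
    return total / len(incidents)


# Hypothetical sample data: one 30-minute and one 90-minute incident.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)),
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 15, 30)),
]

mttr = mean_time_to_respond(incidents)
print(mttr)  # prints 1:00:00
```

Tracking this value over time shows whether incidents are being resolved faster or slower than before.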

Maintain internal tooling:

  • Site reliability engineers maintain internal tools to run complex operations smoothly. These tools help them track severe bugs, maintain CI/CD pipelines, and communicate with other teams.

Some of the most widely used internal tools are:

  • Communication platforms like MS Teams and ServiceNow ePDSM.
  • Bug tracking platforms such as JIRA, Digital Agility, or HP ALM.
  • Deployment tools such as GitHub Actions.
  • Monitoring solutions like Splunk, Grafana.
  • Error logging services such as Kibana, ELK Stack.
  • Documentation tools such as MS SharePoint.

Continuous improvement:

  • Site reliability engineers aim to make systems better every day. To do so, they collaborate with teams like QA, software engineers, and security engineers to ensure all teams are on the same page.

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 3 to 5 years of experience as an SRE, DevOps engineer, or Ops engineer.

