Reliability Engineering Specialist

1 week ago


Hyderabad, Telangana, India FedEx ACC Full time

About FedEx ACC India:

As a strategic technology division, we develop innovative solutions for customers and team members worldwide. Our goal is to enhance productivity, minimize expenses, and update our technology infrastructure to deliver exceptional customer experiences.

A Site Reliability Engineer (SRE) combines software engineering and cloud capabilities to ensure the scalability, performance, and reliability of large-scale applications.

In today's complex cloud-based environment, a proactive and software-centric approach is necessary to guarantee reliability at scale. By combining software engineering and cloud principles, SREs bring a mindset of automation and reliability to operations.

SREs tackle operations challenges from a software engineering perspective by leveraging:

  • Coding
  • Automation
  • Engineering principles

This enables the creation of resilient, self-healing systems that can scale seamlessly.

An SRE bridges the gap between traditional software engineering and operations to create highly scalable and fault-tolerant systems. As a result, they ensure the reliable and efficient operation of an organization's systems and services.

Key Responsibilities:

  • Ensure system reliability and availability

We strive for efficient systems, the backbone of every secure and breach-free organization. Organizations continuously update their applications to give users new features, but these changes can sometimes make systems unreliable and cause outages. This is where site reliability engineers help.

SREs ensure systems are reliable by:

  • Monitoring systems for issues
  • Creating strategies to detect issues
  • Addressing those issues
  • Designing systems that troubleshoot and recover automatically
  • Writing and reviewing post-mortems

Mitigating operational risks is also crucial. SREs identify and assess potential risks to the performance of systems and services, then implement measures to reduce or eliminate them.

To do this, SREs collaborate with development teams and other stakeholders to identify potential risks. They then evaluate each risk's potential impact and likelihood of occurrence and, based on this assessment, implement appropriate mitigation strategies.

Afterward, they continuously monitor and review the effectiveness of these strategies. By doing so, SREs maintain system reliability and ensure a positive user experience.
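The impact-and-likelihood assessment described above can be sketched in a few lines of Python. This is a minimal illustration, not FedEx ACC's actual process: it assumes a common risk-matrix convention in which likelihood and impact are each rated from 1 to 5 and multiplied into a score, and the `Risk` class, the `priority` and `rank_risks` functions, and the priority thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """A hypothetical risk record; the 1-5 scales are an assumed convention."""
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain)
    impact: int      # 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Risk-matrix score: likelihood multiplied by impact, range 1-25.
        return self.likelihood * self.impact


def priority(score: int) -> str:
    # Hypothetical thresholds; a real team would calibrate these to its risk appetite.
    if score >= 15:
        return "high"
    if score >= 8:
        return "medium"
    return "low"


def rank_risks(risks: list[Risk]) -> list[tuple[str, int, str]]:
    # Highest score first, so mitigation effort goes to the largest risks.
    ranked = sorted(risks, key=lambda r: r.score, reverse=True)
    return [(r.name, r.score, priority(r.score)) for r in ranked]


if __name__ == "__main__":
    example = [
        Risk("single database replica", likelihood=3, impact=5),
        Risk("expiring TLS certificate", likelihood=2, impact=4),
        Risk("noisy log volume", likelihood=4, impact=1),
    ]
    for name, score, level in rank_risks(example):
        print(f"{name}: score={score}, priority={level}")
```

Ranking by score makes the trade-off explicit: a likely but harmless risk can end up below a less likely one with severe impact.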

Monitoring system health is essential. SREs use alerts, tickets, logs, and request times to monitor a system's health. This keeps the system stable and minimizes user disruption. When a bug occurs, SREs respond immediately to resolve it.

Automating monitoring eliminates the manual collection, storage, and visualization of data. SREs study historical performance trends by visualizing metrics in charts and graphs. They then trace problems with system monitoring tools and manage infrastructure at scale.
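As an illustration of automating a health check over request times, the sketch below computes a 95th-percentile (p95) latency using the nearest-rank method and raises an alert flag when it exceeds a threshold. The function names, the choice of p95, and the 500 ms threshold are assumptions for the example; in practice a monitoring tool would typically evaluate rules like this over collected metrics.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiplying before dividing keeps whole-number ranks exact, e.g. 95 * 20 / 100 == 19.0.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def check_latency(request_ms: list[float], p95_threshold_ms: float) -> dict:
    """Return the p95 request time and whether it breaches the (assumed) threshold."""
    p95 = percentile(request_ms, 95)
    return {"p95_ms": p95, "alert": p95 > p95_threshold_ms}


if __name__ == "__main__":
    # 20 hypothetical request times in milliseconds, with two slow outliers.
    request_ms = [98, 102, 110, 95, 120, 105, 99, 130, 101, 97,
                  115, 108, 2400, 112, 103, 96, 125, 100, 3100, 118]
    result = check_latency(request_ms, p95_threshold_ms=500)
    print(result)  # p95 is the 19th-smallest value, 2400, so alert is True
```

Using a percentile rather than the mean keeps a few very slow requests from being averaged away, which is why latency alerts are often defined on p95 or p99.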

Minimizing emergency response time is vital. Mean Time to Resolve (MTTR) measures the average time it takes to resolve an incident after it begins. Keeping MTTR low reduces downtime, and SREs improve this metric by detecting, diagnosing, and resolving incidents quickly.
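MTTR can be computed directly from incident records. The sketch below assumes each incident is represented as a hypothetical `(started, resolved)` pair of timestamps and averages the time to resolution; real incident tools may define the starting point differently (for example, detection or acknowledgement time), so treat this as one possible definition.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average (resolved - started) duration over a list of incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    for started, resolved in incidents:
        if resolved < started:
            raise ValueError("an incident cannot be resolved before it starts")
    # sum() needs a timedelta start value because it defaults to the integer 0.
    total = sum((resolved - started for started, resolved in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)
    incidents = [
        (start, start + timedelta(minutes=30)),
        (start, start + timedelta(minutes=90)),
        (start, start + timedelta(minutes=60)),
    ]
    print(mean_time_to_resolve(incidents))  # 1:00:00
```

Lowering MTTR then means shrinking each of these durations, for example through faster detection or automated remediation.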

Maintaining internal tooling is also crucial. Site reliability engineers maintain internal tools to run complex operations smoothly. These tools help them track severe bugs, maintain CI/CD pipelines, and communicate with other teams.

Some widely used internal tools include:

  • Communication and service management: MS Teams and ServiceNow – ePDSM
  • Bug tracking: JIRA and Digital Agility, or HP ALM
  • CI/CD and deployment: GitHub Actions
  • Monitoring: Splunk and Grafana
  • Error logging and log analysis: Kibana and the ELK Stack
  • Documentation: MS SharePoint

SREs also drive continuous improvement by collaborating with QA, software engineers, and security engineers.

Qualifications:

  • Bachelor's degree in computer science, engineering, or a related field
  • 3 to 5 years of experience as an SRE, DevOps engineer, or operations engineer

Estimated salary: $120,000 - $180,000 per annum



  • Hyderabad, Telangana, India Talent500 Full time

    About the Role: We are seeking an experienced Cloud Reliability Engineering Specialist to join our team at FedEx ACC. As a Cloud Reliability Engineer, you will play a critical role in ensuring the scalability, performance, and reliability of our cloud-based applications.


  • Hyderabad, Telangana, India Oracle Full time

    Job Description: We are seeking an experienced Reliability Engineering Specialist to join our team at Oracle. About the Role: This is a key position that will play a crucial role in defining and developing software for tasks associated with the development, design, and debugging of software applications or operating systems. You will be responsible for managing...


  • Hyderabad, Telangana, India FedEx ACC Full time

    About FedEx ACC: We are a leading company in the logistics industry, known for our reliability and efficiency. Salary Range: $120,000 - $180,000 per year. Job Description: A Cloud Systems Reliability Specialist is responsible for ensuring the scalability, performance, and reliability of large-scale cloud-based applications. They combine software engineering...


  • Hyderabad, Telangana, India GMR Group Full time

    Job Overview: The GMR Group is seeking a highly skilled Maintenance Reliability Specialist to join our team. This key role will be responsible for developing and implementing maintenance programs to improve equipment reliability and minimize downtime.


  • Hyderabad, Telangana, India F5 Full time

    F5 is a leading provider of digital transformation solutions. Our teams empower organizations to create, secure, and run applications that enhance the digital experience. We are passionate about cybersecurity, from protecting consumers to enabling companies to focus on innovation. Our culture centers around people, prioritizing diversity and individual...


  • Hyderabad, Telangana, India Tata Consultancy Services Full time

    Are you passionate about data recovery and eager to work with a leading company in the industry? Tata Consultancy Services is currently seeking a skilled Reliable Data Recovery Specialist to join our team. Job Overview: We are looking for an experienced professional who can provide reliable data recovery services for our clients. As a Reliable Data Recovery...


  • Hyderabad, Telangana, India Tanla Platforms Limited Full time

    About the Role: As a Site Reliability Engineer, you will play a pivotal role in ensuring the availability, scalability, and reliability of our platforms and applications. Your expertise will be instrumental in maintaining optimal system uptime and preventing performance issues. Key Responsibilities: Build and Maintain Scalable Deployments: Design, implement, and...


  • Hyderabad, Telangana, India Thomson Reuters Full time

    About the Role: In this opportunity as Systems Reliability Engineer, you will: Work with application teams to manage and support applications in production environments. Develop and maintain a continuous improvement strategy for ongoing support models, including release and change management for maintaining strategic environments (production, non-production,...


  • Hyderabad, Telangana, India Arcesium Full time

    Company Overview: Arcesium is a global financial technology firm that solves data-driven challenges faced by sophisticated financial institutions. Our platform and capabilities continuously innovate to meet tomorrow's challenges, anticipate risks, and design advanced solutions for transformational business outcomes. We value intellectual curiosity, proactive...


  • Hyderabad, Telangana, India Live Connections Full time

    We are looking for a highly skilled Site Reliability Engineering Lead to join our team at Live Connections in Hyderabad. As a key member of our organization, you will be responsible for leading and managing a team of engineers to ensure the reliability, scalability, and performance of our systems. Estimated Salary: ₹25,00,000 - ₹35,00,000 per...


  • Hyderabad, Telangana, India Truetech Full time

    Job Summary: As a Cloud Reliability Engineer at TrueTech, you will lead and manage a team of Site Reliability Engineers, providing mentorship, guidance, and support to ensure the team's success. You will also develop and implement strategies for improving system reliability, scalability, and performance. Establish and enforce SRE best practices, including...


  • Hyderabad, Telangana, India Ideagen Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team as a Monitoring & Observability Lead. As a key member of our infrastructure monitoring team, you will play a critical role in ensuring the optimal performance and reliability of our SaaS infrastructure across a multi-cloud environment. As a Monitoring &...


  • Hyderabad, Telangana, India RiskInsight Consulting Pvt Ltd Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at RiskInsight Consulting Pvt Ltd. As a Site Reliability Engineer, you will be responsible for ensuring the smooth operation of our banking applications and infrastructure. Key Responsibilities: Manage a 24/7 production support team in the Banking...


  • Hyderabad, Telangana, India SID Global Solutions Full time

    At SID Global Solutions, we are seeking a highly motivated and detail-oriented Site Reliability Engineer to join our team. As an ideal candidate, you will have a strong passion for system reliability, automation, and incident response. About the Role: This is an entry-level position that offers a unique opportunity for professional growth and development. You...

  • Reliability Engineer

    4 weeks ago


    Hyderabad, Telangana, India Unison Consulting Pte Ltd Full time

    Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Unison Consulting Pte Ltd. As a key member of our infrastructure team, you will be responsible for ensuring the high availability and performance of our applications. About the Role: The ideal candidate will have a minimum of 5-7 years' experience as a Site Reliability...


  • Hyderabad, Telangana, India Ideagen Full time

    About Us: Ideagen is a global leader in software solutions, empowering organizations to achieve their safety and quality goals. Our innovative products and services help ensure the reliability and performance of mission-critical systems. As a Monitoring and Observability Lead, you will play a critical role in shaping our SaaS infrastructure to meet the evolving...


  • Hyderabad, Telangana, India Tech Mahindra Full time

    Job Title: Cloud Data Engineer Specialist. We are seeking an experienced Cloud Data Engineer Specialist to join our team at Tech Mahindra. This role involves designing, building, and maintaining large-scale data processing systems on cloud platforms like Azure. About the Role: As a Cloud Data Engineer Specialist, you will be responsible for developing and...


  • Hyderabad, Telangana, India Live Connections Full time

    About Live Connections: We're a cutting-edge technology firm dedicated to delivering innovative solutions. Our team is passionate about crafting exceptional products that drive business success. Job Description: System Reliability Engineer Manager. This role offers an exciting opportunity to lead our site reliability engineering team, driving strategies for...


  • Hyderabad, Telangana, India Live Connections Full time

    We are seeking an experienced Site Reliability Engineering Team Lead to join our team at Live Connections in Hyderabad. About the Role: This is a leadership position that requires a strong technical background in site reliability engineering and experience in managing teams. The ideal candidate will have a proven track record of driving projects to successful...


  • Hyderabad, Telangana, India Capgemini Engineering Full time

    Job Title: Embedded Linux Kernel/Device Drivers Specialist. About the Role: We are seeking an experienced Embedded Linux Kernel/Device Drivers specialist to join our team at Capgemini Engineering in Hyderabad. The successful candidate will be responsible for developing and porting embedded software on Linux and ARM platforms. Responsibilities: Develop and...