Reliability Engineering Specialist
1 week ago
About FedEx ACC India:
As a strategic technology division, we develop innovative solutions for customers and team members worldwide. Our goal is to enhance productivity, minimize expenses, and update our technology infrastructure to deliver exceptional customer experiences.
A Site Reliability Engineer (SRE) combines software engineering and cloud skills to ensure the scalability, performance, and reliability of large-scale applications.
In today's complex cloud-based environment, a proactive, software-centric approach is necessary to ensure reliability at scale. By applying software engineering and cloud principles, SREs bring a mindset of automation and reliability to operations.
SREs tackle operations challenges from a software engineering perspective by leveraging:
- Coding
- Automation
- Engineering principles
Together, these enable the creation of resilient, self-healing systems that scale seamlessly.
An SRE bridges the gap between traditional software engineering and operations to create highly scalable and fault-tolerant systems. As a result, they ensure the reliable and efficient operation of an organization's systems and services.
Key Responsibilities:
- Ensure system reliability and availability
Efficient, reliable systems are the backbone of a secure organization. Organizations continuously update their applications to offer users advanced features, but these changes can sometimes make systems unreliable and cause outages. This is where site reliability engineers help.
SREs ensure systems are reliable by:
- Monitoring systems for issues
- Creating strategies to detect issues
- Resolving those issues
- Designing systems that troubleshoot themselves automatically
- Writing and reviewing post-mortems
Mitigating operational risks is also crucial. SREs identify and assess potential risks that could affect the performance of systems and services, then implement measures to reduce or eliminate them.
To do this, SREs collaborate with development teams and other stakeholders to identify potential risks. Once risks are identified, they evaluate each one's potential impact and likelihood of occurrence. Based on this assessment, they apply appropriate mitigation strategies.
Once done, they continuously monitor and review the effectiveness of their risk strategies. By doing so, SREs maintain system reliability and ensure a positive user experience.
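Weighing impact against likelihood, as described above, is commonly done with a risk matrix. The Python below is a minimal sketch of that idea, not FedEx ACC's actual process: the 1-5 scales, the threshold of 12, and the example risks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """A hypothetical operational risk rated on assumed 1-5 scales."""
    name: str
    likelihood: int  # 1 = rare, 5 = almost certain (assumed scale)
    impact: int      # 1 = negligible, 5 = severe (assumed scale)

    @property
    def score(self) -> int:
        # Classic risk-matrix score: likelihood multiplied by impact.
        return self.likelihood * self.impact


def prioritize(risks: list[Risk], threshold: int = 12) -> list[Risk]:
    """Return risks scoring at or above the (assumed) threshold, highest first."""
    flagged = [r for r in risks if r.score >= threshold]
    return sorted(flagged, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Made-up example risks for illustration only.
    risks = [
        Risk("single database replica", likelihood=3, impact=5),
        Risk("expiring TLS certificate", likelihood=4, impact=4),
        Risk("slow log rotation", likelihood=2, impact=2),
    ]
    for r in prioritize(risks):
        print(f"{r.score:>2}  {r.name}")
```

Run as a script, it lists the certificate risk (score 16) ahead of the replica risk (score 15) and drops the low-scoring log-rotation risk; the mitigation effort then goes to the top of that list first.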
Monitoring system health is essential. An SRE uses alerts, tickets, logs, and request times to monitor a system's health. This keeps the system stable and minimizes disruption for users. When a bug occurs, the SRE responds immediately to resolve it.
Automating this process eliminates the manual collection, storage, and visualization of data. SREs study historical performance trends through charts and graphs of metrics, trace problems with system monitoring tools, and manage infrastructure at scale.
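As a minimal sketch of turning request times into automated alerts, the Python below computes a nearest-rank 95th-percentile latency and an error rate, and flags either one when it crosses a limit. The function names and the 300 ms and 1% thresholds are hypothetical; in practice such rules usually live in a monitoring tool such as Splunk or Grafana rather than in hand-written code.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_health(latencies_ms: list[float], errors: int, total: int,
                 p95_limit_ms: float = 300.0,
                 max_error_rate: float = 0.01) -> list[str]:
    """Return alert messages when the assumed latency or error-rate limits are exceeded."""
    alerts = []
    if latencies_ms:
        p95 = percentile(latencies_ms, 95)
        if p95 > p95_limit_ms:
            alerts.append(f"p95 latency {p95:.0f} ms exceeds {p95_limit_ms:.0f} ms")
    # Guard against division by zero when no requests were recorded.
    error_rate = errors / total if total else 0.0
    if error_rate > max_error_rate:
        alerts.append(f"error rate {error_rate:.1%} exceeds {max_error_rate:.1%}")
    return alerts
```

Using a percentile rather than an average keeps a handful of very slow requests from hiding behind many fast ones, which is why latency alerts are often defined on p95 or p99.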
Minimizing emergency response time is vital. Mean Time to Respond (MTTR; also commonly expanded as Mean Time to Recover or Mean Time to Resolve) measures the average time an SRE takes to resolve an incident after it occurs. Minimizing MTTR is necessary to reduce downtime and keep systems reliable. As an SRE, you can improve this metric by resolving incidents quickly.
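The metric itself is simple arithmetic: MTTR is the total time from incident start to resolution, divided by the number of incidents. For example, incidents lasting 30, 45, and 75 minutes give (30 + 45 + 75) / 3 = 50 minutes. A minimal Python sketch, using made-up timestamps and a hypothetical function name:

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """MTTR = sum of (resolved - started) durations / number of incidents."""
    if not incidents:
        raise ValueError("MTTR is undefined without at least one incident")
    # sum() needs a timedelta start value because the durations are timedeltas.
    total = sum((resolved - started for started, resolved in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)  # hypothetical incident start time
    incidents = [
        (t0, t0 + timedelta(minutes=30)),
        (t0 + timedelta(days=1), t0 + timedelta(days=1, minutes=45)),
        (t0 + timedelta(days=2), t0 + timedelta(days=2, minutes=75)),
    ]
    print(mean_time_to_resolve(incidents))  # 0:50:00
```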
Maintaining internal tooling is also crucial. Site reliability engineers maintain internal tools to run complex operations smoothly. These tools help them track severe bugs, maintain CI/CD pipelines, and communicate with other teams.
Some widely used internal tools include:
- Communication platforms: MS Teams and ServiceNow (ePDSM)
- Bug tracking platforms: JIRA, Digital Agility, and HP ALM
- CI/CD and deployment: GitHub Actions
- Monitoring: Splunk and Grafana
- Error logging and log analysis: Kibana and the ELK Stack
- Documentation: MS SharePoint
SREs also drive continuous improvement through collaboration with QA, software engineers, and security engineers.
Qualifications:
- Bachelor's degree in computer science, engineering, or a related field
- 3 to 5 years of experience as an SRE, DevOps engineer, or operations engineer
Estimated salary: $120,000 - $180,000 per annum
-
Cloud Reliability Engineering Specialist
1 week ago
Hyderabad, Telangana, India | Talent500 | Full time
About the Role: We are seeking an experienced Cloud Reliability Engineering Specialist to join our team at FedEx ACC. As a Cloud Reliability Engineer, you will play a critical role in ensuring the scalability, performance, and reliability of our cloud-based applications.
-
Reliability Engineering Specialist
3 weeks ago
Hyderabad, Telangana, India | Oracle | Full time
Job Description: We are seeking an experienced Reliability Engineering Specialist to join our team at Oracle. About the Role: This is a key position that will play a crucial role in defining and developing software for tasks associated with the development, design, and debugging of software applications or operating systems. You will be responsible for managing...
-
Cloud Systems Reliability Specialist
2 weeks ago
Hyderabad, Telangana, India | FedEx ACC | Full time
About FedEx ACC: We are a leading company in the logistics industry, known for our reliability and efficiency. Salary Range: $120,000 - $180,000 per year. Job Description: A Cloud Systems Reliability Specialist is responsible for ensuring the scalability, performance, and reliability of large-scale cloud-based applications. They combine software engineering...
-
Maintenance Reliability Specialist
4 weeks ago
Hyderabad, Telangana, India | GMR Group | Full time
Job Overview: The GMR Group is seeking a highly skilled Maintenance Reliability Specialist to join our team. This key role will be responsible for developing and implementing maintenance programs to improve equipment reliability and minimize downtime.
-
Reliability Engineering Specialist
3 weeks ago
Hyderabad, Telangana, India | F5 | Full time
F5 is a leading provider of digital transformation solutions. Our teams empower organizations to create, secure, and run applications that enhance the digital experience. We are passionate about cybersecurity, from protecting consumers to enabling companies to focus on innovation. Our culture centers around people, prioritizing diversity and individual...
-
Reliable Data Recovery Specialist
6 days ago
Hyderabad, Telangana, India | Tata Consultancy Services | Full time
Are you passionate about data recovery and eager to work with a leading company in the industry? Tata Consultancy Services is currently seeking a skilled Reliable Data Recovery Specialist to join our team. Job Overview: We are looking for an experienced professional who can provide reliable data recovery services for our clients. As a Reliable Data Recovery...
-
Reliability Engineer
1 week ago
Hyderabad, Telangana, India | Tanla Platforms Limited | Full time
About the Role: As a Site Reliability Engineer, you will play a pivotal role in ensuring the availability, scalability, and reliability of our platforms and applications. Your expertise will be instrumental in maintaining optimal system uptime and preventing performance issues. Key Responsibilities: Build and Maintain Scalable Deployments: Design, implement, and...
-
Reliability Engineering Specialist
2 weeks ago
Hyderabad, Telangana, India | Thomson Reuters | Full time
About the Role: In this opportunity as a Systems Reliability Engineer, you will: Work with application teams to manage and support applications in production environments. Develop and maintain a continuous improvement strategy for ongoing support models, including release and change management for maintaining strategic environments (production, non-production,...
-
Senior Reliability Engineer
4 weeks ago
Hyderabad, Telangana, India | Arcesium | Full time
Company Overview: Arcesium is a global financial technology firm that solves data-driven challenges faced by sophisticated financial institutions. Our platform and capabilities continuously innovate to meet tomorrow's challenges, anticipate risks, and design advanced solutions for transformational business outcomes. We value intellectual curiosity, proactive...
-
Site Reliability Engineering Lead
6 days ago
Hyderabad, Telangana, India | Live Connections | Full time
We are looking for a highly skilled Site Reliability Engineering Lead to join our team at Live Connections in Hyderabad. As a key member of our organization, you will be responsible for leading and managing a team of engineers to ensure the reliability, scalability, and performance of our systems. Estimated Salary: ₹25,00,000 - ₹35,00,000 per...
-
TrueTech - Cloud Reliability Engineer
4 weeks ago
Hyderabad, Telangana, India | Truetech | Full time
Job Summary: As a Cloud Reliability Engineer at TrueTech, you will lead and manage a team of Site Reliability Engineers, providing mentorship, guidance, and support to ensure the team's success. You will also develop and implement strategies for improving system reliability, scalability, and performance. Establish and enforce SRE best practices, including...
-
Site Reliability Engineering Team Lead
3 weeks ago
Hyderabad, Telangana, India | Ideagen | Full time
About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team as a Monitoring & Observability Lead. As a key member of our infrastructure monitoring team, you will play a critical role in ensuring the optimal performance and reliability of our SaaS infrastructure across a multi-cloud environment. As a Monitoring &...
-
Site Reliability Engineer
2 months ago
Hyderabad, Telangana, India | RiskInsight Consulting Pvt Ltd | Full time
Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at RiskInsight Consulting Pvt Ltd. As a Site Reliability Engineer, you will be responsible for ensuring the smooth operation of our banking applications and infrastructure. Key Responsibilities: Manage a 24/7 production support team in the Banking...
-
Site Reliability Engineer
4 weeks ago
Hyderabad, Telangana, India | SID Global Solutions | Full time
At SID Global Solutions, we are seeking a highly motivated and detail-oriented Site Reliability Engineer to join our team. As an ideal candidate, you will have a strong passion for system reliability, automation, and incident response. About the Role: This is an entry-level position that offers a unique opportunity for professional growth and development. You...
-
Reliability Engineer
4 weeks ago
Hyderabad, Telangana, India | Unison Consulting Pte Ltd | Full time
Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Unison Consulting Pte Ltd. As a key member of our infrastructure team, you will be responsible for ensuring the high availability and performance of our applications. About the Role: The ideal candidate will have a minimum of 5-7 years' experience as a Site Reliability...
-
Site Reliability Engineer
1 month ago
Hyderabad, Telangana, India | Ideagen | Full time
About Us: Ideagen is a global leader in software solutions, empowering organizations to achieve their safety and quality goals. Our innovative products and services help ensure the reliability and performance of mission-critical systems. As a Monitoring and Observability Lead, you will play a critical role in shaping our SaaS infrastructure to meet the evolving...
-
Cloud Data Engineer Specialist
6 days ago
Hyderabad, Telangana, India | Tech Mahindra | Full time
Job Title: Cloud Data Engineer Specialist. We are seeking an experienced Cloud Data Engineer Specialist to join our team at Tech Mahindra. This role involves designing, building, and maintaining large-scale data processing systems on cloud platforms like Azure. About the Role: As a Cloud Data Engineer Specialist, you will be responsible for developing and...
-
Chief Site Reliability Engineering Lead
1 week ago
Hyderabad, Telangana, India | Live Connections | Full time
About Live Connections: We're a cutting-edge technology firm dedicated to delivering innovative solutions. Our team is passionate about crafting exceptional products that drive business success. Job Description: System Reliability Engineer Manager. This role offers an exciting opportunity to lead our site reliability engineering team, driving strategies for...
-
Site Reliability Engineering Team Lead
6 days ago
Hyderabad, Telangana, India | Live Connections | Full time
We are seeking an experienced Site Reliability Engineering Team Lead to join our team at Live Connections in Hyderabad. About the Role: This is a leadership position that requires a strong technical background in site reliability engineering and experience in managing teams. The ideal candidate will have a proven track record of driving projects to successful...
-
Hyderabad, Telangana, India | Capgemini Engineering | Full time
Job Title: Embedded Linux Kernel/Device Drivers Specialist. About the Role: We are seeking an experienced Embedded Linux Kernel/Device Drivers specialist to join our team at Capgemini Engineering in Hyderabad. The successful candidate will be responsible for developing and porting embedded software on Linux and ARM platforms. Responsibilities: Develop and...