Reliability Engineer

1 month ago


Hyderabad, India Arcesium Full time

We are looking for an experienced Principal Engineer to implement a new monitoring tool for the firm. The ideal candidate will have a strong background in SRE principles and practices, and strong knowledge of, and experience in, maintaining monitoring frameworks for large-scale organizations. The engineer will be responsible for evaluating monitoring tools, understanding the scale of Arcesium, proposing a cost-effective and reliable monitoring framework, and managing the system end to end. The SRE team is responsible for monitoring the stability and availability of mission-critical production systems, managing incidents for quicker resolution, and establishing business-as-usual (BAU) operations. The team also builds tools and infrastructure that all development teams use for monitoring and troubleshooting.
This position is based in Hyderabad (HYD) or Bengaluru (BLR).

What You'll Do
Design, develop, and implement scalable, reliable monitoring solutions for large distributed systems.
Define and implement monitoring requirements in collaboration with cross-functional teams.
Lead the development of monitoring architectures and strategies.
Integrate monitoring tools into existing infrastructure.
Maintain and support monitoring systems.
Demonstrate strong technical breadth and depth by driving innovation, evaluating new technologies, and articulating the technical vision for engineering teams.
Make key contributions to technical design and architecture decisions: weigh the trade-offs of each option, manage risk, decide independently where appropriate, and present reasoned options when others make the decision.
Lead the way by writing exemplary code, documentation, and RFCs.
Identify, propose, develop, deploy, and own R&D projects aligned with the team's technical vision and needs, turning problem statements into solutions and operating independently as needed.

What You'll Need
10+ years of experience in SRE or a related field.
Proven experience in designing, developing, and implementing monitoring solutions.
Deep understanding of monitoring technologies and tools, including Prometheus, Grafana, Loki, and Tempo.
Experience with cloud-based monitoring systems, such as New Relic, Datadog, and Grafana Cloud.
Experience with log analysis tools, such as Splunk, Logstash, Fluent, and Sumo Logic.
Experience implementing distributed tracing with OpenTelemetry and Jaeger.
Strong understanding of SRE principles and practices.
Experience with incident response and management.
Reliability: exposure to chaos engineering and other reliability practices, including disaster recovery, is good to have.
Experience with cloud computing platforms such as AWS.
Experience with Kubernetes.
Experience with Agile practices (Scrum).
Excellent analytical, problem-solving, and troubleshooting skills.
Excellent communication and presentation skills.
Experience managing and mentoring engineers.
Ability to work independently and as part of a team.

The Company offers excellent benefits, an informal and collegial working environment, and an attractive compensation package.

Members of the Arcesium Company Group do not discriminate in employment matters on the basis of sex, race, color, caste, creed, religion, pregnancy, national origin, age, military service eligibility, veteran status, sexual orientation, marital status, disability, or any other protected class.

