Systems Reliability Specialist

5 days ago


Hyderabad / Secunderabad, Telangana, India beBeeInfrastructure Full time
Reliability Engineer Position

We are seeking a dedicated and skilled Operations Engineer to join our team. This role is pivotal in ensuring the reliability, performance, and availability of our systems while facilitating smooth integration and delivery processes.

The ideal candidate will have a strong background in site reliability engineering (SRE) and DevOps practices. You will collaborate with product owners, developers, architects, vendors, and other professionals to monitor, operate, support, audit, and improve our digital solutions, their related processes, and controls.

You will demonstrate and maintain high standards while fostering a proactive, efficient, and service-oriented work environment. Communication and professionalism are paramount, as you will represent our team when engaging with technical and business leadership and with external providers of digital services.

Operational Quality & Compliance:

  • Ensure high standards of operational quality across all systems.
  • Review and update procedures to ensure compliance with audit controls, and support internal and external audits of the platform's development and operation.

Metrics and Monitoring:

  • Develop and maintain comprehensive monitoring solutions, including alerts and dashboards, to track system performance, health, and reliability.

Incident Response:

  • Provide first-level support for production incidents, ensuring quick resolution and minimal downtime.
  • Identify problems, escalate them, and support their resolution.

Reliability Improvements:

  • Implement strategies to enhance system reliability and performance.
  • Identify, analyze, and resolve patterns in operational issues, implementing solutions to prevent recurrence.

Required Skills and Qualifications

Technical Skills

  • Proficiency in monitoring tools (e.g., AppInsights, Grafana)
  • Experience with cloud platforms (e.g., Azure, GCP)
  • Strong scripting and automation skills (e.g., PowerShell, Python)
  • Familiarity with incident management processes
  • Understanding of containerization technologies (e.g., Kubernetes)
  • Troubleshooting of complex distributed environments

Collaboration Skills

  • Work closely with product and project teams to integrate reliability best practices.
  • Collaborate to streamline development and operational processes, enhancing overall efficiency.

Education/Certifications

Preferred: Bachelor's degree in Computer Science, Software Engineering, or Information Systems; equivalent work experience; or current progress toward a degree.

Strong focus on systems engineering, reliability, and performance.

Experience in development operations, automation, and troubleshooting.

Experience

  • Strong knowledge of IT infrastructure services required
  • 5+ years - IaC technologies leveraging Terraform (e.g., ADO, Pipelines, Git, YAML)
  • 5+ years - Orchestration and containerization using Kubernetes
  • 5+ years - API Integration of infrastructure systems such as Azure, ServiceNow, Active Directory
  • 4+ years - Azure Public Cloud Solutions
  • Experience with high-availability, globally delivered solutions, and strong troubleshooting skills.
  • Familiarity with incident management processes.
  • Microsoft Cloud Infrastructure Certification, SRE Certification
  • Proficient in scripting and automation, with a solid understanding of infrastructure as code practices.

Leadership/Soft Skills

  • Strong Verbal and Written Communication: Candidates must demonstrate exceptional verbal and written communication skills to effectively convey information and collaborate with team members.
  • Effective Communicator: The ideal candidate will be an effective communicator who can articulate ideas clearly and concisely to diverse audiences.
  • Adaptable Communication Style: We value candidates who can adjust their communication style based on the audience and context, ensuring clarity and understanding.

Benefits

  • Competitive benefits and compensation package for all our people.
  • Flexibility in your schedule, empowering you to balance life's demands, while also maintaining your ability to serve clients.

