Senior System Reliability Engineer

5 days ago


Pune, Maharashtra, India Fulcrum Digital Full time

Job Summary :

We are seeking a highly motivated and experienced Senior System Reliability Engineer (SRE) to join our dynamic technology team in Pune.

As a Senior SRE, you will play a critical role in ensuring the reliability, performance, and scalability of our production systems and infrastructure.

You will be responsible for proactively identifying and mitigating risks, automating operational tasks, and driving continuous improvement in our systems.

You will collaborate closely with development, operations, and other engineering teams to build and maintain resilient and efficient systems that meet the needs of our growing business.

Responsibilities :

Reliability & Availability :

- Design, implement, and maintain highly available, scalable, and resilient systems.

- Proactively identify potential points of failure and implement strategies to prevent outages.

- Develop and implement monitoring and alerting systems to ensure system health and performance.

- Participate in incident management, root cause analysis, and post-mortem processes to prevent recurrence.

- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure system reliability.

Automation & Tooling :

- Drive automation of repetitive operational tasks, including deployment, monitoring, scaling, and recovery processes.

- Develop and maintain infrastructure-as-code (IaC) using tools like Terraform, CloudFormation, or similar.

- Build and maintain CI/CD pipelines to ensure smooth and reliable software deployments.

- Evaluate and implement new tools and technologies to improve system reliability and efficiency.

Performance Engineering :

- Conduct performance testing and analysis to identify bottlenecks and optimize system performance.

- Collaborate with development teams to ensure applications are designed for performance and scalability.

- Implement capacity planning strategies to ensure systems can handle future growth.

Security & Compliance :

- Integrate security best practices into system design and operations.

- Ensure systems comply with relevant security and compliance standards.

- Participate in security audits and vulnerability assessments.

Collaboration & Communication :

- Work closely with development teams throughout the software development lifecycle to ensure reliability is built in from the beginning.

- Collaborate with operations teams to ensure smooth deployment and operation of systems.

- Communicate effectively with technical and non-technical stakeholders regarding system status, incidents, and improvements.

- Mentor junior SREs and contribute to the growth of the team.

- Participate in on-call rotations to ensure system availability.

Problem Solving & Troubleshooting :

- Troubleshoot complex issues across the entire stack (application, infrastructure, network).

- Develop and maintain comprehensive documentation for systems and processes.

- Contribute to the development of runbooks and standard operating procedures (SOPs).

Required Skills & Experience :

- Bachelor's degree in Computer Science, Engineering, or a related field.

- 5-7 years of experience in a System Reliability Engineering, DevOps, or similar role.

- Strong understanding of Linux/Unix operating systems.

- Proficiency in at least one scripting language (e.g., Python, Bash, Go).

- Experience with cloud platforms (e.g., AWS, Azure, GCP) and their services.

- Experience with containerization technologies like Docker and orchestration tools like Kubernetes.

- Experience with infrastructure-as-code (IaC) tools like Terraform or CloudFormation.

- Experience with CI/CD tools like Jenkins, GitLab CI, CircleCI, or similar.

- Strong understanding of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack, Datadog).

- Experience with database systems (SQL and NoSQL).

- Solid understanding of networking concepts (TCP/IP, DNS, load balancing).

- Excellent problem-solving and troubleshooting skills.

- Strong communication and collaboration skills.

(ref:hirist.tech)

  • Pune, Maharashtra, India beBee Careers Full time

    **Job Overview** We are seeking a highly skilled Senior Reliability Engineer to join our team. As a key member of our infrastructure group, you will be responsible for designing and implementing systems that ensure the reliability and efficiency of our services. **Responsibilities** - Design and implement systems focused on reliability, systems operations, and...


  • Pune, Maharashtra, India Fulcrum Digital Full time

    Job Summary: We are seeking a highly motivated and experienced Senior System Reliability Engineer (SRE) to contribute to the dynamic technology team at Fulcrum Digital in Pune. Key Responsibilities: Ensure the reliability, performance, and scalability of our production systems and infrastructure. Proactively identify and mitigate risks, automate operational tasks,...


  • Pune, Maharashtra, India beBee Careers Full time

    We are seeking an experienced Senior Site Reliability Engineer to join our team. In this role, you will be responsible for ensuring the reliability and performance of our applications and infrastructure. The ideal candidate will have a strong understanding of distributed systems, cloud platforms (AWS, Azure or GCP), and microservices architecture. You will...


  • Pune, Maharashtra, India beBee Careers Full time

    Senior Site Reliability Engineer Opportunity: We are seeking a highly skilled and experienced Senior Site Reliability Engineer to play a critical role in ensuring the reliability, scalability, and performance of systems and applications. As an SRE, you will be responsible for monitoring and observability using tools like Splunk, AppD, Prometheus, Fluentd,...


  • Pune, Maharashtra, India beBee Careers Full time

    System Reliability Engineer: We are looking for a talented System Reliability Engineer to ensure the performance, scalability, and reliability of our medium to complex software applications and systems. This role involves designing, developing, executing, and analyzing performance tests, as well as implementing chaos engineering practices to improve system...


  • Pune, Maharashtra, India beBee Careers Full time

    Job Summary: This role is perfect for a skilled Senior Site Reliability Engineer who can ensure our platform's stability and responsiveness 24/7. As a key member of the CRE team, you will enhance current software solutions to make them more stable.


  • Pune, Maharashtra, India beBee Careers Full time

    Job Description: We are seeking a highly skilled Senior Site Reliability Engineer to join our team. In this role, you will be responsible for ensuring the reliability and performance of our systems. About the Role: This is an excellent opportunity for a seasoned SRE to take on a leadership role and drive the implementation of high availability architectures,...


  • Pune, Maharashtra, India beBee Careers Full time

    Senior Site Reliability Engineer: We are seeking an experienced Sr. SRE to join our team. As a key member of our infrastructure group, you will be responsible for ensuring the reliability and scalability of our systems. In this role, you will work closely with cross-functional teams to design, implement, and operate scalable and highly available systems. You...


  • Pune, Maharashtra, India beBee Careers Full time

    Job Description: We have an immediate opportunity for a seasoned Site Reliability Engineer with 5 to 9 years of experience. The ideal candidate will be responsible for ensuring the reliability and performance of our applications and infrastructure. About the Role: This is a customer-facing role that requires strong communication and business...


  • Pune, Maharashtra, India beBee Careers Full time

    Responsibilities :
    - Design and implement chaos engineering experiments to identify weaknesses in systems and applications.
    - Develop and execute strategies to improve system resilience and reliability.
    - Analyze experiment results, provide actionable insights, and drive remediation efforts.
    - Collaborate with cross-functional teams to integrate chaos...