Senior System Reliability Engineer
2 weeks ago
Job Summary:
We are seeking a highly motivated and experienced Senior System Reliability Engineer (SRE) to join our dynamic technology team in Pune. As a Senior SRE, you will play a critical role in ensuring the reliability, performance, and scalability of our production systems and infrastructure. You will be responsible for proactively identifying and mitigating risks, automating operational tasks, and driving continuous improvement in our systems. You will collaborate closely with development, operations, and other engineering teams to build and maintain resilient and efficient systems that meet the needs of our growing business.

Responsibilities:

Reliability & Availability:
- Design, implement, and maintain highly available, scalable, and resilient systems.
- Proactively identify potential points of failure and implement strategies to prevent outages.
- Develop and implement monitoring and alerting systems to ensure system health and performance.
- Participate in incident management, root cause analysis, and post-mortem processes to prevent recurrence.
- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure system reliability.

Automation & Tooling:
- Drive automation of repetitive operational tasks, including deployment, monitoring, scaling, and recovery processes.
- Develop and maintain infrastructure-as-code (IaC) using tools like Terraform, CloudFormation, or similar.
- Build and maintain CI/CD pipelines to ensure smooth and reliable software deployments.
- Evaluate and implement new tools and technologies to improve system reliability and efficiency.

Performance Engineering:
- Conduct performance testing and analysis to identify bottlenecks and optimize system performance.
- Collaborate with development teams to ensure applications are designed for performance and scalability.
- Implement capacity planning strategies to ensure systems can handle future growth.

Security & Compliance:
- Integrate security best practices into system design and operations.
- Ensure systems comply with relevant security and compliance standards.
- Participate in security audits and vulnerability assessments.

Collaboration & Communication:
- Work closely with development teams throughout the software development lifecycle to ensure reliability is built in from the beginning.
- Collaborate with operations teams to ensure smooth deployment and operation of systems.
- Communicate effectively with technical and non-technical stakeholders regarding system status, incidents, and improvements.
- Mentor junior SREs and contribute to the growth of the team.
- Participate in on-call rotations to ensure system availability.

Problem Solving & Troubleshooting:
- Troubleshoot complex issues across the entire stack (application, infrastructure, network).
- Develop and maintain comprehensive documentation for systems and processes.
- Contribute to the development of runbooks and standard operating procedures (SOPs).

Required Skills & Experience:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 5-7 years of experience in a System Reliability Engineering, DevOps, or similar role.
- Strong understanding of Linux/Unix operating systems.
- Proficiency in at least one scripting language (e.g., Python, Bash, Go).
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and their services.
- Experience with containerization technologies like Docker and orchestration tools like Kubernetes.
- Experience with infrastructure-as-code (IaC) tools like Terraform or CloudFormation.
- Experience with CI/CD tools like Jenkins, GitLab CI, CircleCI, or similar.
- Strong understanding of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack, Datadog).
- Experience with database systems (SQL and NoSQL).
- Solid understanding of networking concepts (TCP/IP, DNS, load balancing).
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration skills.
(ref:hirist.tech)
-
Senior System Reliability Engineer
2 weeks ago
Pune, Maharashtra, India Fulcrum digital Full time
Job Title: Senior System Reliability Engineer. We are seeking a highly motivated and experienced Senior System Reliability Engineer (SRE) to play a critical role in ensuring the reliability, performance, and scalability of our production systems and infrastructure. The ideal candidate will be responsible for proactively identifying and mitigating risks,...
-
Senior System Reliability Engineer
2 weeks ago
Pune, India Fulcrum digital Full time
Job Summary: We are seeking a highly motivated and experienced Senior System Reliability Engineer (SRE) to join our dynamic technology team in Pune. As a Senior SRE, you will play a critical role in ensuring the reliability, performance, and scalability of our production systems and infrastructure. You will be responsible for proactively identifying and...
-
Senior System Reliability Engineer
1 week ago
Pune, Maharashtra, India Fulcrum digital Full time
Job Summary: We are seeking a highly motivated and experienced Senior System Reliability Engineer (SRE) to join our dynamic technology team in Pune. As a Senior SRE, you will play a critical role in ensuring the reliability, performance, and scalability of our production systems and infrastructure. You will be responsible for proactively identifying and...
-
Reliable Systems Engineer
2 weeks ago
Pune, Maharashtra, India beBee Careers Full time
**Job Overview**
We are seeking a highly skilled Senior Reliability Engineer to join our team. As a key member of our infrastructure group, you will be responsible for designing and implementing systems that ensure the reliability and efficiency of our services.
**Responsibilities**
- Design and implement systems focused on reliability, systems operations, and...
-
Senior System Reliability Engineer
2 weeks ago
Pune, Maharashtra, India Fulcrum digital Full time
Job Summary: We are seeking a highly motivated and experienced Senior System Reliability Engineer (SRE) to contribute to the dynamic technology team at Fulcrum Digital in Pune.
Key Responsibilities: Ensure the reliability, performance, and scalability of our production systems and infrastructure. Proactively identify and mitigate risks, automate operational tasks,...
-
Reliable Systems Engineer
1 day ago
Pune, Maharashtra, India beBee Careers Full time
System Reliability Engineer Position Overview: We are seeking a highly skilled System Reliability Engineer with 7 years of experience to join our dynamic team. The ideal candidate will have extensive expertise in production support, Python/Shell scripting, Kubernetes, Docker, and SRE monitoring tools such as Datadog, Prometheus, and Dynatrace. This role focuses...
-
System Reliability Engineer
1 week ago
Pune, Maharashtra, India beBee Careers Full time
System Reliability Engineer: We are looking for a talented System Reliability Engineer to ensure the performance, scalability, and reliability of our medium to complex software applications and systems. This role involves designing, developing, executing, and analyzing performance tests, as well as implementing chaos engineering practices to improve system...
-
Reliable Systems Engineer
1 day ago
Pune, Maharashtra, India beBee Careers Full time
**Job Title:** Site Reliability Engineer
Synopsis: We are seeking a highly skilled and motivated Site Reliability Engineer to join our team. The successful candidate will be responsible for ensuring the reliability, scalability, and performance of our systems.
-
Reliable System Engineer
2 weeks ago
Pune, Maharashtra, India beBee Careers Full time
Job Summary: This role is perfect for a skilled Senior Site Reliability Engineer who can ensure our platform's stability and responsiveness 24/7. As a key member of the CRE team, you will enhance current software solutions to make them more stable.