Senior System Reliability Engineer
3 weeks ago
Job Summary :
We are seeking a highly motivated and experienced Senior System Reliability Engineer (SRE) to join our dynamic technology team in Pune.
As a Senior SRE, you will play a critical role in ensuring the reliability, performance, and scalability of our production systems and infrastructure.
You will be responsible for proactively identifying and mitigating risks, automating operational tasks, and driving continuous improvement in our systems.
You will collaborate closely with development, operations, and other engineering teams to build and maintain resilient and efficient systems that meet the needs of our growing business.
Responsibilities :
Reliability & Availability :
- Design, implement, and maintain highly available, scalable, and resilient systems.
- Proactively identify potential points of failure and implement strategies to prevent outages.
- Develop and implement monitoring and alerting systems to ensure system health and performance.
- Participate in incident management, root cause analysis, and post-mortem processes to prevent recurrence.
- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure system reliability.
Automation & Tooling :
- Drive automation of repetitive operational tasks, including deployment, monitoring, scaling, and recovery processes.
- Develop and maintain infrastructure-as-code (IaC) using tools like Terraform, CloudFormation, or similar.
- Build and maintain CI/CD pipelines to ensure smooth and reliable software deployments.
- Evaluate and implement new tools and technologies to improve system reliability and efficiency.
Performance Engineering :
- Conduct performance testing and analysis to identify bottlenecks and optimize system performance.
- Collaborate with development teams to ensure applications are designed for performance and scalability.
- Implement capacity planning strategies to ensure systems can handle future growth.
Security & Compliance :
- Integrate security best practices into system design and operations.
- Ensure systems comply with relevant security and compliance standards.
- Participate in security audits and vulnerability assessments.
Collaboration & Communication :
- Work closely with development teams throughout the software development lifecycle to ensure reliability is built in from the beginning.
- Collaborate with operations teams to ensure smooth deployment and operation of systems.
- Communicate effectively with technical and non-technical stakeholders regarding system status, incidents, and improvements.
- Mentor junior SREs and contribute to the growth of the team.
- Participate in on-call rotations to ensure system availability.
Problem Solving & Troubleshooting :
- Troubleshoot complex issues across the entire stack (application, infrastructure, network).
- Develop and maintain comprehensive documentation for systems and processes.
- Contribute to the development of runbooks and standard operating procedures (SOPs).
Required Skills & Experience :
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 5-7 years of experience in a System Reliability Engineering, DevOps, or similar role.
- Strong understanding of Linux/Unix operating systems.
- Proficiency in at least one scripting language (e.g., Python, Bash, Go).
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and their services.
- Experience with containerization technologies like Docker and orchestration tools like Kubernetes.
- Experience with infrastructure-as-code (IaC) tools like Terraform or CloudFormation.
- Experience with CI/CD tools like Jenkins, GitLab CI, CircleCI, or similar.
- Strong understanding of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack, Datadog).
- Experience with database systems (SQL and NoSQL).
- Solid understanding of networking concepts (TCP/IP, DNS, load balancing).
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration skills.
-
Reliable System Engineer
2 weeks ago
Pune, Maharashtra, India beBee Careers Full time
Job Description: We are seeking an experienced Senior Site Reliability Engineer to join our team. This role will be worked on a hybrid basis. Responsibilities: Collaborate with cross-functional teams to ensure system reliability, availability, and performance. Implement and maintain automation tools for monitoring, deploying, and scaling applications. Troubleshoot...
-
Senior Reliability Engineer
14 hours ago
Pune, Maharashtra, India beBee Careers Full time
Job Description: We have an immediate opportunity for a Senior Site Reliability Engineer to join our team. As a key member of our engineering group, you will play a critical role in ensuring the reliability and performance of our systems. Responsibilities: Design and implement scalable and reliable systems architecture. Develop and maintain monitoring and...
-
System Reliability Engineer Leader
6 days ago
Pune, Maharashtra, India beBee Careers Full time
Job Description: We are seeking a highly skilled and experienced Senior System Reliability Engineer to join our dynamic technology team. About the Role: The Senior System Reliability Engineer will play a critical role in ensuring the reliability, performance, and scalability of our production systems and infrastructure. Main Responsibilities: Design, implement,...
-
Reliability Engineer for Scalable Systems
4 days ago
Pune, Maharashtra, India beBee Careers Full time
We're looking for a highly experienced SRE (Senior Site Reliability Engineer) to lead our team in implementing reliable and scalable systems. The ideal candidate will have hands-on experience working on RFP/proposals, excellent communication and business presentation skills, and must-have skills in system reliability, chaos engineering, SLI/SLO/SLA concepts,...
-
Reliable Systems Engineer
6 days ago
Pune, Maharashtra, India beBee Careers Full time
Job Description: An exciting opportunity exists for a Senior DevSecOps Engineer to join our team, defining CI/CD pipelines and monitoring applications. The ideal candidate will solve technical challenges by designing, deploying, and troubleshooting containerized platforms and infrastructure, ensuring reliability, scalability, resilience, security, and...
-
Reliable Software Systems Engineer Wanted
5 days ago
Pune, Maharashtra, India beBee Careers Full time
About the Job: We are looking for an experienced Senior Site Reliability Engineer to join our team and help us ensure the reliability and performance of our software systems. Responsibilities: Designing and implementing system reliability and chaos engineering practices. Collaborating with cross-functional teams to develop and implement high availability...
-
System Reliability Engineer
6 days ago
Pune, Maharashtra, India beBee Careers Full time
Job Title: System Reliability Engineer. Job Description: We are seeking a skilled System Reliability Engineer to join our team. As a key contributor to our efforts in building and maintaining highly reliable and resilient systems, you will be responsible for proactively identifying weaknesses in our systems, developing and executing chaos experiments, and...
-
Reliability Engineer
1 week ago
Pune, Maharashtra, India beBee Careers Full time
Job Description: We are seeking an experienced Senior Site Reliability Engineer to join our team. This is an exciting opportunity for a seasoned professional who can drive performance and reliability in our systems. About the Role: The Senior Site Reliability Engineer will be responsible for ensuring the high availability and scalability of our applications and...
-
Technical Manager for System Reliability
5 days ago
Pune, Maharashtra, India beBee Careers Full time
Job Summary: Senior Systems Engineer - Team Lead. We are seeking an experienced senior systems engineer to join our team as a team lead. The successful candidate will have a strong background in systems engineering and software engineering, with a focus on designing and implementing reliable and scalable systems. The ideal candidate will have a deep understanding...
-
Senior System Reliability Engineer
3 weeks ago
Pune, Maharashtra, India Fulcrum Digital Full time
About Fulcrum Digital: Fulcrum Digital is a leading platform and digital solutions engineering company, founded in 1999, with offices in the US, Europe, LATAM, and India. We specialize in technology consulting, enterprise application development, platform integration, Generative AI solutions, and full-scale implementation across various industries. Role...