Senior Site Reliability Engineer
2 weeks ago
Senior Site Reliability Engineer (SRE) - Mission-Critical SaaS Cloud Products
Key Responsibilities
Reliability and Performance Management
- Design, implement, and maintain highly available, scalable, and resilient cloud-native architectures for mission-critical SaaS products.
- Develop and implement SLOs, SLIs, and SLAs to measure and improve service reliability.
- Continuously optimize system performance and resource utilization across multiple cloud platforms.
- Fine-tune and optimize application performance by analyzing code, traces, and database queries.
Incident Management and Troubleshooting
- Lead incident response efforts, effectively troubleshooting complex issues to minimize downtime and impact.
- Reduce Mean Time to Recover (MTTR) through proactive monitoring, automated alerting, and efficient problem-solving techniques.
- Conduct thorough Root Cause Analysis (RCA) for all major incidents and implement preventive measures.
Observability and Monitoring
- Design and implement end-to-end observability solutions across distributed systems.
- Develop and maintain comprehensive monitoring strategies using tools like ELK Stack, Prometheus, and Grafana.
- Create and optimize product status dashboards to provide real-time visibility into system health and performance.
Automation and Infrastructure as Code (IaC)
- Implement Infrastructure as Code practices using tools like Terraform.
- Develop and maintain automated deployment pipelines and CI/CD workflows.
- Create self-healing systems and automate routine operational tasks to reduce manual intervention.
Cloud-Agnostic Architecture
- Design and implement cloud-agnostic solutions that can operate efficiently across multiple cloud providers.
- Develop expertise in event-driven architectures and related technologies (e.g., Apache Kafka, EventHub, Redis, MongoDB Atlas, IoT Hub).
- Implement and manage containerized applications using Kubernetes across different cloud environments.
Continuous Improvement
- Regularly review and refine operational practices to enhance efficiency and reliability.
- Stay updated with the latest industry trends and technologies in SRE, cloud computing, and DevOps.
- Contribute to the development of internal tools and frameworks to support SRE practices.
Requirements
- Must live in Hyderabad.
- Must be able to work on-site 2-3 days a week.
- Strong knowledge of cloud platforms such as Azure and their associated services.
- Expertise in observability tools (e.g., ELK Stack, Dynatrace, Prometheus).
- Proficiency in containerization technologies such as Docker and Kubernetes.
- Understanding of event-driven architecture and database technologies (e.g., MongoDB Atlas, Azure SQL, PostgreSQL).
- Proficiency in IaC and CI/CD tools such as Terraform and GitHub Actions.
- Strong programming skills in one or more languages (e.g., Python, .NET, Java).
- Solid understanding of networking concepts, load balancing, and security best practices.
-
Senior Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India ValueLabs Full time
Dear Aspirants, we at ValueLabs have an opening for a Senior Site Reliability Engineer role. Below is the JD for the same.
Role: Senior Site Reliability Engineer
Overall Experience: 7+ years
Preferred: immediate to 15-day joiners
Key Responsibilities: We are seeking an experienced Site Reliability Engineer (SRE) to join our team, responsible for ensuring the...
-
Senior Engineering Manager
3 days ago
Hyderabad, Telangana, India beBee Careers Full time
As a Senior Engineering Manager, Site Reliability Engineering (SRE), you will be responsible for the reliability, scalability, and performance of mission-critical cloud and on-prem services that support millions of customers globally. This role requires expertise in leading cross-functional teams to develop and execute SRE strategies aligned with business...
-
Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India Tata Consultancy Services Full time
Job Title: Site Reliability Engineer (SRE) (Hadoop Administration)
Location: Hyderabad / Pune / Indore
Experience Range: 3 – 10 years
Job Description
Desired Competencies (Technical/Behavioral Competency)
Must-Have
About The Job
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed,...
-
Site Reliability Engineer
2 days ago
Hyderabad, Telangana, India Tata Consultancy Services Full time
Job Title: Site Reliability Engineer (SRE) (Hadoop Administration)
Location: Hyderabad / Pune / Indore
Experience Range: 3 – 10 years
Job Description
Desired Competencies (Technical/Behavioral Competency)
Must-Have
About The Job
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed,...
-
Site Reliability Engineer Position
7 hours ago
Hyderabad, Telangana, India beBee Careers Full time
We are seeking a Site Reliability Engineer to join our team. As a Senior Site Reliability Engineer, you will play a critical role in ensuring the reliability of our critical services.
Your key responsibilities will include:
- Providing end-to-end observability to enable teams to build, run, and own reliable services.
- Identifying areas for improvement and...
-
Senior Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India Microsoft Full time
Overview
Do you have a passion for high-scale services and working with some of Microsoft's most critical cloud capabilities? We're looking for a Senior Site Reliability Engineer with the right mix of software development, cloud experience, and passion for quality to envision, design, and deliver solutions for Microsoft's cloud infrastructure. ...
-
Senior Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India STATS PERFORM Full time
Job Description
DAZN Group is looking for a Senior Site Reliability Engineer to join our dynamic team and embark on a rewarding career journey.
1. Analyzing customer needs to determine appropriate solutions for complex technical issues
2. Creating technical diagrams, flowcharts, formulas, and other written documentation to support projects
3. Providing guidance to...
-
Senior Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India Dazn Software Private Limited Full time
About the Role
At DAZN, we're redefining the sports broadcasting experience worldwide, and reliability is at the heart of everything we do. Our Site Reliability Engineering (SRE) team, based across the UK and India, works tirelessly to ensure that our services are stable, scalable, and always available. As a Senior SRE, you will play a key role in improving the...
-
Senior Site Reliability Engineer
3 days ago
Hyderabad, Telangana, India beBee Careers Full time
About the Role
As a Senior Site Reliability Engineer, you will lead initiatives to automate infrastructure, optimize CI/CD pipelines, and enhance observability across globally scaled environments. Your work will empower development velocity while ensuring rock-solid reliability.
Key Responsibilities:
Design and manage scalable infrastructure on Kubernetes,...