Principal Site Reliability Engineer
3 weeks ago
We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Splunk Inc. As a key member of our SRE team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-native microservices platform.
Key Responsibilities
- Set technical direction and lead large-scale technical initiatives across multiple teams.
- Develop and implement new processes to improve team efficiency and effectiveness.
- Collaborate with other team leaders to orchestrate large system changes.
- Design and implement new services, tools, and monitoring to improve system reliability and performance.
- Analyze tradeoffs and make recommendations based on design proposals.
- Mentor new engineers to achieve more than they thought possible.
- High Availability, Business Continuity Planning, disaster recovery, backup/restore, RTO, RPO.
- Chaos engineering.
- Application uptime and performance.
- Capacity management & planning.
- SLIs, SLOs, error budgets, and monitoring dashboards.
- Responsible for deployment and operations of large-scale distributed data stores and streaming services.
- Establishing design patterns for monitoring and benchmarking.
- Establishing and documenting production run books and guidelines for developers.
- Tooling, toil reduction, runbooks & automation to handle production environments.
- Incident management and improving MTTD/MTTR for services.
- Cloud cost optimization.
Qualifications
- 10+ years of SRE experience in handling large-scale cloud-native microservices platforms.
- 4+ years of strong hands-on experience deploying, handling, and monitoring large-scale Kubernetes clusters in the public cloud, specifically AWS or GCP.
- Experience with infrastructure automation and scripting using Python and/or bash scripting.
- Strong hands-on experience with monitoring tools such as Splunk, Prometheus, Grafana, and the ELK stack to build observability for large-scale microservices deployments.
- Experience with deployment, operations, and performance management of one or more large-scale clusters such as Cassandra, Kafka, Elasticsearch, MongoDB, ZooKeeper, Redis, etc.
- Experience leading large-scale technical initiatives across multiple teams.
- Excellent problem-solving, triaging, and debugging skills in large-scale distributed systems.
- AWS Solutions Architect certification preferred.
- Confluent Certified Administrator for Apache Kafka and/or Apache Cassandra Administrator Associate certifications are preferred.
- Experience with Infrastructure-as-Code using Terraform, CloudFormation, Google Deployment Manager, Pulumi, Packer, ARM, etc.
- Experience with CI/CD frameworks and Pipeline-as-Code such as Jenkins, Spinnaker, GitLab, Argo, Artifactory, etc.
- Experience with one or more security/compliance frameworks such as SOC 2, PCI, and/or FedRAMP.
- Proven skills to effectively work across teams and functions to influence the design, operations, and deployment of highly available software.
- Bachelor's/Master's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India SID Global Solutions Full time
Site Reliability Engineer. At SID Global Solutions, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and performance of our cloud-based systems. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India Virtusa Full time
Job Title: SRE DevOps AWS. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Virtusa. As a Site Reliability Engineer, you will be responsible for designing, building, and maintaining reliable and scalable infrastructure solutions to support our applications and services. Key Responsibilities: Design and implement robust...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India SINGLE POINT TECHNOLOGIES PRIVATE LIMITED Full time
Job Title: Site Reliability Engineer. About the Role: We are seeking a skilled Site Reliability Engineer to join our team at Single Point Technologies Private Limited. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, performance, and security of our cloud-based product suite. Key Responsibilities: Design and implement...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India Crox Consulting Inc Full time
Site Reliability Engineer. Job Summary: Crox Consulting Inc is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based SaaS environment. Key Responsibilities: Design and implement automation and software solutions...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India Tata Consultancy Services Full time
Job Title: Site Reliability Engineer. Tata Consultancy Services is a global leader in the technology arena, and we're looking for a skilled Site Reliability Engineer to join our team. Key Responsibilities: Design, develop, and test Java applications using standard frameworks and tools. Analyze and resolve application issues in collaboration with team...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India SID Global Solutions Full time
Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at SID Global Solutions. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure using GCP, AWS/Azure, and Kubernetes. Develop and maintain CI/CD pipelines using Jenkins, GitLab CI, and Docker. Collaborate with...
-
Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India RealPage, Inc. Full time
Job Summary: RealPage, Inc. is seeking a highly skilled Site Reliability Engineer to join our SRE & Systems team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our multiple open-source application environments. Key Responsibilities: Provision, de-provision, and support multiple open-source application...
-
Site Reliability Engineering Manager
3 weeks ago
Hyderabad, Telangana, India Quest Diagnostics Full time
Job Title: Site Reliability Engineering Manager. We are seeking a highly skilled Site Reliability Engineering Manager to join our team at Quest Diagnostics. As a Site Reliability Engineering Manager, you will be responsible for leading a team of Site Reliability Engineers in designing, implementing, and maintaining scalable and reliable systems. Key...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India Experian Full time
Job Title: Site Reliability Engineer. Job Summary: Experian is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, performance, and scalability of our AWS platform. Key Responsibilities: Optimize microservice and serverless processes on robust distributed...
-
Site Reliability Engineering Manager
3 weeks ago
Hyderabad, Telangana, India Quest Diagnostics Full time
Job Summary: We are seeking a highly skilled Site Reliability Engineering Manager to join our team at Quest Diagnostics. As a Site Reliability Engineering Manager, you will be responsible for leading a team of Site Reliability Engineers in designing, implementing, and maintaining reliable and scalable systems. Key Responsibilities: Lead and manage a team of Site...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India Zelis Full time
Job Title: Site Reliability Engineer. Zelis is seeking a highly skilled Site Reliability Engineer to join our Engineering team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Gather and analyze metrics from operating systems and...
-
Manager - Site Reliability Engineering
3 weeks ago
Hyderabad, Telangana, India Live Connections Full time
We are looking for a Manager, Site Reliability Engineering, in Hyderabad. Roles and Responsibilities: The position will manage 5 to 10 engineers, both directly and indirectly. The engineers will include Site Reliability Engineers, Observability Engineers, Performance Engineers, DevSecOps Engineers, and others. These individuals will vary from entry level to senior...
-
Site Reliability Engineering Manager
3 weeks ago
Hyderabad, Telangana, India Quest Diagnostics Full time
Job Title: Site Reliability Engineering Manager. Quest Diagnostics is seeking a highly skilled Site Reliability Engineering Manager to lead our team of engineers in delivering high-quality, reliable, and scalable systems. Key Responsibilities: Lead and manage a team of Site Reliability Engineers, providing mentorship, guidance, and support to ensure the team's...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India FactSet Full time
Job Title: Lead Site Reliability Engineer. At FactSet, we're seeking a highly skilled Lead Site Reliability Engineer to join our team. As a key member of our infrastructure team, you will be responsible for designing, implementing, and maintaining highly available and scalable architectures for our applications and infrastructure. Key...
-
Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India FactSet Full time
Job Summary: We are seeking a skilled Site Reliability Engineer to join our team at FactSet. The ideal candidate will have a strong background in designing, implementing, and maintaining highly available and scalable architectures for our applications and infrastructure. Key Responsibilities: Collaborate with cross-functional teams to define, review, and...
-
Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India Virtusa Full time
Job Summary: Virtusa is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, building, and maintaining reliable and scalable infrastructure solutions to support our applications and services. Key Responsibilities: Design and implement robust monitoring and alerting systems to...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India RiskInsight Consulting Pvt Ltd Full time
Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at RiskInsight Consulting Pvt Ltd. As a Site Reliability Engineer, you will be responsible for ensuring the smooth operation of our banking applications and infrastructure. Key Responsibilities: Manage a 24/7 production support team in the Banking...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India Tata Consultancy Services Full time
About the Role: Tata Consultancy Services is a global leader in the technology arena, and we're looking for talented individuals to join our team. As a Site Reliability Engineer, you'll play a crucial role in ensuring the stability and performance of our applications. Key Responsibilities: Design, develop, and test Java applications using standard frameworks and...