Principal Site Reliability Engineer

3 weeks ago


Hyderabad, Telangana, India Splunk Inc Full time
About the Role

We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Splunk Inc. As a key member of our SRE team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-native microservices platform.

Key Responsibilities
  • Set technical direction and lead large-scale technical initiatives across multiple teams.
  • Develop and implement new processes to improve team efficiency and effectiveness.
  • Collaborate with other team leaders to orchestrate large system changes.
  • Design and implement new services, tools, and monitoring to improve system reliability and performance.
  • Analyze tradeoffs and make recommendations based on design proposals.
  • Mentor new engineers to achieve more than they thought possible.
Work on Reliability Projects
  • High Availability, Business Continuity Planning, disaster recovery, backup/restore, RTO, RPO.
  • Chaos engineering.
  • Application uptime and performance.
  • Capacity management & planning.
  • SLIs, SLOs, error budgets, and monitoring dashboards.
  • Deployment and operations of large-scale distributed data stores and streaming services.
  • Establishing design patterns for monitoring and benchmarking.
  • Establishing and documenting production runbooks and guidelines for developers.
  • Tooling, toil reduction, runbooks & automation to handle production environments.
  • Incident management and improving MTTD/MTTR for services.
  • Cloud cost optimization.
Qualifications
  • 10+ years of SRE experience in handling large-scale cloud-native microservices platforms.
  • 4+ years of strong hands-on experience deploying, managing, and monitoring large-scale Kubernetes clusters in the public cloud, specifically AWS or GCP.
  • Experience with infrastructure automation and scripting using Python and/or Bash.
  • Strong hands-on experience with monitoring tools such as Splunk, Prometheus, Grafana, and the ELK stack to build observability for large-scale microservices deployments.
  • Experience with deployment, operations, and performance management of one or more large-scale clusters such as Cassandra, Kafka, Elasticsearch, MongoDB, ZooKeeper, or Redis.
  • Experience leading large-scale technical initiatives across multiple teams.
  • Excellent problem-solving, triaging, and debugging skills in large-scale distributed systems.
Preferred Qualifications
  • AWS Solutions Architect certification.
  • Confluent Certified Administrator for Apache Kafka and/or Apache Cassandra Administrator Associate certification.
  • Experience with Infrastructure-as-Code using Terraform, CloudFormation, Google Deployment Manager, Pulumi, Packer, ARM, etc.
  • Experience with CI/CD frameworks and Pipeline-as-Code tools such as Jenkins, Spinnaker, GitLab, Argo, Artifactory, etc.
  • Experience with one or more security/compliance frameworks such as SOC 2, PCI, and/or FedRAMP.
  • Proven ability to work effectively across teams and functions to influence the design, operations, and deployment of highly available software.
Education

Bachelor's/Master's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.


