Architect - Site Reliability Engineering [T500-13216]

1 month ago


Hyderabad, India Inspire Full time

The Architect – Site Reliability Engineering provides technical leadership in support of Inspire’s cloud computing initiatives, with a focus on improving efficiency, reducing toil, and increasing the uptime and availability of Inspire’s cloud platforms. This individual will collaborate with peers to shape cloud application and infrastructure design, mature production readiness reviews, enhance build/test/release automation, mature observability practices, and enhance platform resiliency, scalability, and recovery capabilities. The successful candidate is comfortable engaging a wide variety of technical partners and stakeholders, takes a data-driven and analytical approach to resolving problems and identifying areas of opportunity, is self-driven, and has a passion for continuous improvement.


Primary Responsibilities and Essential Functions:

  • Engage in and strengthen the application and cloud services development lifecycle, from inception and design through deployment, operation, and refinement. Work closely with application and platform teams to ensure software releases are well designed, planned, implemented, released, and monitored.
  • Design, motivate, guide, and support the creation of software, systems, and processes to increase product reliability and organizational efficiency while optimizing resource use and cloud spend.
  • Champion and support reliability practices across the software development lifecycle through activities like architecture reviews, code reviews, creating platforms and frameworks, and capacity planning.
  • Work with senior engineering and testing team members to build tools and recommend testing strategies for problem prevention, detection, and chaos testing.
  • Mature SRE practices through activities such as establishing error budgets, providing guidance and refinement to SRE dashboards, and enhancing capabilities to proactively detect anomalies.
  • Provide design guidance and recommendations for platform improvements based on production incident analysis and root cause investigation outputs.
  • Improve service reliability through blameless post-incident reviews and the use of code, automation, or AI to respond to problems or prevent their recurrence.
  • Recognize automation opportunities, provide designs, and support the implementation and development of tools to automate routine, time-consuming, or manual jobs and processes.
  • Periodically assess current SRE practices and tools and recommend enhancements and improvements.
  • Train, guide, and mentor teammates on SRE practices and principles.
  • Design and execute strategies that ensure the scalability and elasticity of the infrastructure.
  • Perform code-level debugging of issues escalated to the team.


Minimum Experience:

  • Minimum 8 years of experience as a platform architect with advanced knowledge in the following key areas: containers, deployment architecture, benchmarking, design, and network engineering.
  • Minimum 4 years of combined experience in a DevOps, SRE, systems, and/or software development role.
  • Hands-on experience in establishing and maturing SRE practices, programs, and roadmaps.
  • Extensive experience with public cloud technologies and cloud-native architectures and solutions (Azure highly preferred).
  • Experience with Infrastructure-as-Code (IaC), DevOps, and CI/CD practices and tool chains (Terraform, GitLab, Argo CD, Jenkins).
  • Experience with configuration management tools (Ansible, Chef, and Packer).
  • Experience with container technology and orchestration (Kubernetes, Docker).
  • Experience with observability and monitoring practices and tools (OpenTelemetry, New Relic, OpsRamp, Prometheus, Grafana, Elastic Stack, Splunk, Dynatrace).
  • Deep understanding of microservice architectures, application servers, networks, and databases.
  • Excellent understanding of scalability processes and techniques.
  • Hands-on experience designing and administering high availability and high-performance environments, as well as managing large-scale deployments of traffic-heavy applications.
  • Ability to understand and support multiple complex systems, and a readiness to take on the challenge of improving them.
  • Comfortable with technical refactoring and creating technical designs to accommodate architectural evolution over time.
  • Willingness to try new technologies and integrate them with existing systems to achieve better operations overall.
  • Excellent communication and collaboration skills.

