Principal Site Reliability Engineer

Chennai, India Encora Inc. Full time

Important Information

Experience: 6 to 8 years

Job Location: Chennai

Position Type: Full time.

Work Mode: Hybrid (3 days in office)

About the Opportunity:  

The Principal Site Reliability Engineer plays a vital role in our Site Reliability Engineering team. As the technical leader at the Center for Operational Excellence, you will guide our technology strategies and set standards for engineering practices. With your extensive software development, platform, and systems engineering expertise, you'll lead complex, mission-critical projects and drive innovation across our tech stack. Your role is pivotal in strategic decision-making, influencing both business and technology outcomes, and mentoring future leaders within our teams.

In this position, you are expected to demonstrate exceptional mastery and advanced skills in software, systems, and platform engineering. You'll ensure that our digital platforms are not only resilient but also incorporate cutting-edge technologies and practices. Your influence will be crucial in driving the adoption of modern and emerging practices, shaping the technological future of our organization, and maintaining our leadership in innovative technology solutions.

Roles & Responsibilities:

Provide technical leadership to the Insights and Incident Response teams, promoting an AI-forward approach in operations and fostering innovation, collaboration, and continuous improvement.
Oversee the development and enhancement of the Insights team’s self-serve observability platform and the Incident Response team, guiding them towards proactive and predictive analysis to maintain high SLAs and preempt system issues while effectively leveraging AIOps for enhanced operational efficiency.
Cultivate an in-depth understanding of the enterprise, cloud, and production infrastructure, associated products, applications, and services, and analyze and troubleshoot these complex distributed systems, advocating best practices based on observed design and incident patterns and providing domain expertise for early design guidance and decision-making.
Collaboratively lead the development of secure, robust, and high-performing infrastructure architectures alongside the Principal DevOps Engineer and work closely with InfoSec and Enterprise Architecture teams to set and enforce standards across crucial infrastructure components like Kubernetes cluster designs, serverless architectures, data planes, and local VPC networks.
Establish and enforce reliability standards and gated thresholds, including Non-Functional Requirements (NFRs), in partnership with Enterprise Architecture to ensure all products, services, and applications, whether vendor-sourced, open-source, or internally developed, adhere to these standards to guarantee observability, resilience, and reliability.
Lead the post-mortem process, ensuring thorough analysis and actionable outcomes from each incident are addressed by the identified owners and act as the primary contact during incidents, guiding the response team to practical solutions that drive improvements to foster a culture of proactive learning and system resilience.
Design and implement internal tools and software solutions to address gaps in observability and reliability, bridging existing capabilities with emerging needs and ensuring these solutions enhance system monitoring, incident response, and overall infrastructure resilience.

We'd love to hear from you if you have:

Software Engineering:
Demonstrated exceptional expertise in programming and scripting with a mastery of languages like Java, C#, Python, Go, JavaScript, TypeScript, Bash, and PowerShell and leveraging these skills for advanced automation, process optimization, and innovative solution development.
Expert use of monitoring and incident response tools (Dynatrace, Datadog, Grafana, New Relic, PagerDuty, OpsGenie, Splunk OnCall), applying strategic approaches to incident response, system troubleshooting, and performance optimization.
Advanced skills in analyzing, evaluating, and integrating vendor solutions and open-source projects.
Proficiency in creating custom, high-performance tools, and solutions to bridge gaps in technology, striking a balance between innovative in-house development and external technological advances.
Deep expertise in Agile methodologies, DevOps practices, and mastery in Continuous Integration/Continuous Deployment (CI/CD) pipelines, coupled with sophisticated release management practices to ensure efficient, reliable software deployment.

Systems Engineering:
Mastery in Infrastructure as Code, with extensive experience using tools like Terraform, CloudFormation, Azure Resource Manager, Google Deployment Manager, Ansible, and Pulumi for effective, scalable, and secure infrastructure management.
Comprehensive experience in managing systems across multiple and hybrid cloud environments, showcasing proficiency in optimizing for operational efficiency, latency, security, and compliance.
In-depth expertise in various data storage solutions (SQL, NoSQL), advanced skills in queueing systems (Kafka, RabbitMQ), and transient data solutions (Redis, Memcache), ensuring high data integrity and optimal performance.
Extensive knowledge and expertise in network architecture, including mastery of VPCs, DNS, CDN, load balancing, and network security practices, essential for designing robust, scalable systems.

Platform Engineering:
Demonstrated mastery in planning, executing, and optimizing complex system architectures, with deep expertise in microservices and serverless frameworks.
Proven ability in handling scalability and efficiency challenges.
Profound expertise across major cloud platforms (AWS, Azure, Google Cloud Platform), designing, deploying, and optimizing sophisticated cloud-native solutions.
Expertise in high availability, disaster recovery, and scalability in cloud environments.
Extensive hands-on experience with container technologies, particularly Kubernetes, demonstrating advanced deployment capabilities, managing, and scaling applications.
Proven ability to construct and maintain hyper-scalable, fault-tolerant infrastructures.
Leadership in technological innovation and cross-platform integrations, with the ability to foresee emerging technology trends and apply them in creating forward-thinking solutions driving the adoption of cutting-edge technologies and methodologies to maintain and enhance the company's competitive edge in platform engineering.

Preferred Qualifications

Experience in the lending technology sector or related financial services industries, demonstrating an understanding of industry-specific challenges and the ability to tailor technological solutions to meet these unique requirements.
Hold advanced certifications in critical areas like cloud technologies, AI, data management, networking, and cybersecurity, reflecting deep and broad technical expertise that enhances our technology strategies.
Proven experience in leading technical projects with cross-functional and multinational teams, showcasing strong leadership skills and the ability to drive successful outcomes in complex, collaborative environments.
A track record of innovative problem-solving, with examples of implementing cutting-edge solutions or pioneering new technological approaches.
Strong capabilities in communicating complex technical concepts to non-technical stakeholders, fostering effective collaboration across different departments.
A demonstrated commitment to continual professional development, staying abreast of the latest industry trends and technologies, and adaptability in applying this knowledge to evolving challenges.
Active involvement in open-source projects, technical forums, or professional communities showcasing dedication to the broader tech community and a commitment to collaborative growth and learning.

 We'd love to hear from you if you have:  

Multi-cloud (Azure & GCP)
Setting up security and code quality scans
Worked closely with DevOps, SRE and Analytics teams

About Encora

Encora is a leading provider of software and digital engineering solutions, boasting 48 global offices and a team of over 9,000 Encorians. We're present in tech-rich regions, equipped to deliver exceptional services in product engineering, the cloud, data modernization, digital experience, and more.

Our hiring philosophy is rooted in skill and talent, fostering a workplace where diversity and inclusivity translate into innovative solutions for every client.




