Senior Site Reliability Engineer

4 days ago


Thiruvananthapuram Trivandrum India Equifax Full time

Job Description

Site Reliability Engineering (SRE) at Equifax is a discipline that combines software and systems engineering for building and running large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations while adhering to Equifax engineering principles. SRE is also an engineering approach to building and running production systems: we engineer solutions to operational problems. Our SREs are responsible for overall system operation, and we use a breadth of tools and approaches to solve a broad set of problems. Our practices include limiting time spent on operational work, blameless postmortems, and proactive identification and prevention of potential outages.

Our SRE culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Equifax brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to build an environment that provides the support and mentorship needed to learn, grow and take pride in our work.

What You'll Do

- Work in a DevSecOps environment responsible for the building and running of large-scale, massively distributed, fault-tolerant systems.
- Work closely with development and operations teams to build highly available, cost-effective systems with extremely high uptime metrics.
- Work with the cloud operations team to resolve trouble tickets, develop and run scripts, and troubleshoot.
- Create new tools and scripts designed for auto-remediation of incidents and for establishing end-to-end monitoring and alerting on all critical aspects.
- Build infrastructure as code (IaC) patterns that meet security and engineering standards using one or more technologies (Terraform, scripting with cloud CLI, and programming with cloud SDK).
- Participate in a team of first responders in a 24/7, follow-the-sun operating model for incident and problem management.

What Experience You Need

- BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent job experience required.
- 2-5 years of experience in software engineering, systems administration, database administration, and networking.
- 1+ years of experience developing and/or administering software in public cloud.
- Experience in monitoring infrastructure and application uptime and availability to ensure functional and performance objectives.
- Experience in languages such as Python, Bash, Java, Go, JavaScript and/or Node.js.
- Demonstrable cross-functional knowledge of systems, storage, networking, security and databases.
- System administration skills, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible and/or containers (Docker, Kubernetes, etc.).
- Proficiency with continuous integration and continuous delivery tooling and practices.
- Cloud certification strongly preferred.

What Could Set You Apart

An ability to demonstrate successful performance of our Success Profile skills, including:

- Experience with GCP/GKE, Composer.
- Certifications in Kubernetes (CKA, CKAD) or cloud certification.
- You have expertise designing, analyzing and troubleshooting large-scale distributed systems.
- You take a system problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- You have experience managing infrastructure as code via tools such as Terraform or CloudFormation.
- You are passionate about automation, with a desire to eliminate toil whenever possible.
- You've built software or maintained systems in a highly secure, regulated or compliant industry.
- You thrive in, and have experience with and passion for, working within a DevOps culture and as part of a team.



  • Thiruvananthapuram / Trivandrum, India Reflections Info Systems Full time

    Job Description As a Site Reliability Engineer (SRE) you will be responsible for improving the overall reliability of applications by ensuring their availability, performance, and scalability. You should be able to gather the technical requirements from the DevOps team and the operational requirements from the Application Support team. With the Site Reliability...


  • Trivandrum, India Zafin Full time

    Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Error budgeting (policy & tooling): ~ Run the error-budget policy with multi-window, multi-burn-rate alerts;...


  • Trivandrum, India Zafin Full time

    Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Error budgeting (policy & tooling): ~ Run the error-budget policy with multi-window, multi-burn-rate alerts;...


  • Trivandrum, India Zafin Full time

    Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE. What you’ll do - SLIs/SLOs & contracts: Define customer-centric...


  • Trivandrum, India Zafin Full time

    Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE. What you’ll do - SLIs/SLOs & contracts: Define customer-centric SLIs/SLOs for...


  • Trivandrum, India Zafin Full time

    Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE. What you’ll do SLIs/SLOs & contracts: Define customer-centric SLIs/SLOs for...


  • Trivandrum, India Zafin Full time

    Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE. What you’ll do SLIs/SLOs & contracts: Define customer-centric...

  • Senior/expert site

    2 days ago


    India IVedha Inc. Full time

    Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice Location: India (Remote) - Must be available to work in the EST (US/Canada) Time Zone. Role Summary: Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure? We're looking for an SRE with 7+...