Senior Site Reliability Engineer

4 weeks ago

Thiruvananthapuram Trivandrum India Equifax Full time

Job Description

Site Reliability Engineering (SRE) at Equifax is a discipline that combines software and systems engineering for building and running large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations while adhering to Equifax engineering principles. SRE is also an engineering approach to building and running production systems: we engineer solutions to operational problems. Our SREs are responsible for overall system operation, and we use a breadth of tools and approaches to solve a broad set of problems. Our practices include limiting time spent on operational work, blameless postmortems, and proactive identification and prevention of potential outages.

Our SRE culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Equifax brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to build an environment that provides the support and mentorship needed to learn, grow and take pride in our work.

What You'll Do
- Work in a DevSecOps environment responsible for the building and running of large-scale, massively distributed, fault-tolerant systems.
- Work closely with development and operations teams to build highly available, cost-effective systems with extremely high uptime metrics.
- Work with the cloud operations team to resolve trouble tickets, develop and run scripts, and troubleshoot.
- Create new tools and scripts for auto-remediation of incidents and for establishing end-to-end monitoring and alerting on all critical aspects.
- Build infrastructure as code (IaC) patterns that meet security and engineering standards using one or more technologies (Terraform, scripting with cloud CLI, and programming with cloud SDK).
- Participate in a team of first responders in a 24/7, follow-the-sun operating model for incident and problem management.

What Experience You Need
- BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent job experience required.
- 2-5 years of experience in software engineering, systems administration, database administration, and networking.
- 1+ years of experience developing and/or administering software in public cloud.
- Experience in monitoring infrastructure and application uptime and availability to ensure functional and performance objectives.
- Experience in languages such as Python, Bash, Java, Go, JavaScript and/or Node.js.
- Demonstrable cross-functional knowledge with systems, storage, networking, security and databases.
- System administration skills, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible and/or containers (Docker, Kubernetes, etc.).
- Proficiency with continuous integration and continuous delivery tooling and practices.
- Cloud certification strongly preferred.

What Could Set You Apart
An ability to demonstrate successful performance of our Success Profile skills, including:
- Experience with GCP/GKE, Composer.
- Certifications in Kubernetes (CKA, CKAD) or cloud certification.
- You have expertise designing, analyzing and troubleshooting large-scale distributed systems.
- You take a system problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- You have experience managing infrastructure as code via tools such as Terraform or CloudFormation.
- You are passionate about automation, with a desire to eliminate toil whenever possible.
- You've built software or maintained systems in a highly secure, regulated or compliant industry.
- You thrive in and have experience and passion for working within a DevOps culture and as part of a team.

Remote site reliability engineer(devops)

4 weeks ago

Trivandrum, India Zafin Full time

Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Error budgeting (policy & tooling): Run the error-budget policy with multi-window, multi-burn-rate alerts;...
Site Reliability Engineer

4 weeks ago

India Sonata Software Full time

We're Hiring: Senior Site Reliability Engineer Location: Onsite (Office: Hyderabad – Mandatory from Day 1) Employment Type: Full-time Notice Period: Immediate to 15 Days Only Experience: 8+ Years About the Role We’re looking for a Senior Site Reliability Engineer (SRE) to lead reliability initiatives across our production systems. This is a high-impact...
Senior Site Reliability Engineer

6 hours ago

India Akamai Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Description Senior Site Reliability Engineer - Remote Do you have a passion for cutting edge technologies and tackling system problems? Are you a self-starting professional who thrives in a fast-paced environment? Join our critical CPS SRE team! We ensure that infrastructure services have world-class reliability and uptime. Site Reliability Engineers (SREs) are the...
Senior/expert site

4 weeks ago

India iVedha Inc. Full time

Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice Location: India (Remote) - Must be available to work in the EST (US/Canada) Time Zone. Role Summary: Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure? We're looking for an SRE with 7+...
Site Reliability Engineer

2 days ago

India Akamai Full time ₹ 8,00,000 - ₹ 24,00,000 per year

Do you like collaborating across teams to solve complex problems? Do you enjoy solving large scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We specialize in creating solutions that...
Site Reliability Engineer

3 weeks ago

Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time

We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...
Site Reliability Engineer

5 days ago

India Akamai Technologies Full time

Job Description Do you like collaborating across teams to solve complex problems? Do you enjoy solving large scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We...
Senior Site Reliability Engineer

1 week ago

India Akamai Technologies Full time

Job Description Senior Site Reliability Engineer - Remote Do you have a passion for cutting edge technologies and tackling system problems? Are you a self-starting professional who thrives in a fast-paced environment? Join our critical CPS SRE team! We ensure that infrastructure services have world-class reliability and uptime. Site Reliability...
Senior Site Reliability Engineer- ELK Expert

2 weeks ago

India iVedha Inc. Full time

Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice Location: India (Remote) - Must be available to work in the EST (US/Canada) Time Zone. Role Summary: Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure? We're looking for an SRE with 7+...
Site Reliability Engineer

1 week ago

Thiruvananthapuram, Kerala, India Equifax Full time ₹ 1,04,000 - ₹ 1,30,878 per year

Site Reliability Engineering (SRE) at Equifax is a discipline that combines software and systems engineering for building and running large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations while adhering to Equifax engineering principles. SRE is also an...
