Site Reliability Engineer/Architect
2 weeks ago
Job Summary
We are seeking an experienced Site Reliability Engineer (SRE) Architect with over 10 years of IT experience, specializing in designing and implementing highly scalable, reliable, and automated systems.
The ideal candidate will have strong expertise in cloud-native architectures, automation, monitoring, and SRE practices.
This role requires excellent leadership, technical depth, and the ability to guide large-scale enterprise reliability initiatives.
Key Responsibilities
- Design and implement scalable, reliable, and automated infrastructure solutions.
- Lead SRE initiatives across multiple teams, ensuring adherence to SRE principles (SLIs, SLOs, SLAs).
- Drive incident management, root cause analysis, and postmortem processes.
- Define and implement observability standards (monitoring, logging, alerting).
- Collaborate with development and operations teams to improve system reliability and performance.
- Automate infrastructure provisioning and deployments using IaC (Terraform, Ansible, etc.).
- Build and optimize CI/CD pipelines for zero-downtime deployments.
- Define and maintain high availability, fault tolerance, and disaster recovery strategies.
- Establish performance benchmarks, load testing practices, and capacity planning.
- Provide leadership and mentorship to SRE and DevOps teams.
Required Skills & Qualifications
- 10+ years of IT experience with at least 5 years in SRE/DevOps roles.
- Expertise in cloud platforms: AWS, Azure, or GCP.
- Strong knowledge of Kubernetes, Docker, and microservices architecture.
- Hands-on experience with Infrastructure as Code (Terraform, Ansible, CloudFormation).
- Proficiency in programming/scripting languages such as Python, Go, or Bash.
- Experience with monitoring tools (Prometheus, Grafana, ELK, Datadog, Dynatrace).
- Strong background in CI/CD pipeline design and automation (Jenkins, GitHub Actions, GitLab CI).
- In-depth knowledge of networking, load balancers, DNS, and security best practices.
- Excellent problem-solving and incident management skills.
- Strong leadership and stakeholder management abilities.
Preferred Qualifications
- Certified Kubernetes Administrator (CKA) or AWS/Azure/GCP Cloud Architect certification.
- Experience in large-scale distributed systems design.
- Background in performance engineering and chaos engineering.
- Knowledge of ITIL practices for incident, problem, and change management.
-
Principal Site Reliability Engineer
3 days ago
Greater Kolkata Area, India Atlassian Full time ₹ 1,20,000 - ₹ 2,60,000 per year
Overview
We are looking for a reliability expert who is passionate about scaling Cloud services to join our growing Site Reliability Engineering (SRE) teams. You are someone who is aware of current industry trends (particularly those related to reliability) and who values working with a diverse set of partners, who can articulate the business impact of a...
-
Senior Site Reliability Engineer
2 weeks ago
Greater Bengaluru Area, India Ivanti Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Senior Site Reliability Engineer - 6 positions
Why We Need You
Senior Site Reliability Engineering (SRE) is a growing team that partners closely with Product Engineering, Security, and Support. We are responsible for the reliability, deployment, and continuous operation of the Ivanti Cloud services. We need your help to take our existing platform to the next...
-
Site Reliability Engineer
1 week ago
Greater Noida, Uttar Pradesh, India TRH Consultancy Services Full time ₹ 4,00,000 - ₹ 12,00,000 per year
Description: We are seeking a Site Reliability Engineer with expertise in OpenTelemetry to join our team in India. The ideal candidate will be responsible for ensuring the reliability, availability, and performance of our systems while implementing best practices for observability and monitoring.
Responsibilities:
- Design, implement, and maintain...
-
Site Reliability Engineer III
2 weeks ago
Greater Delhi Area, India RELX Full time ₹ 1,04,000 - ₹ 1,30,878 per year
Join our team in delivering high-quality software to customers worldwide
Are you motivated to collaborate, solve problems, and inspire others with your enthusiasm?
About The Business
LexisNexis Risk Solutions is an essential partner in risk assessment. In our Business Services vertical, we offer solutions to help organizations of all sizes drive growth, improve...
-
Specialist - Site Reliability Engineer
2 weeks ago
Pune/Pimpri-Chinchwad Area, India Accelya Full time ₹ 15,00,000 - ₹ 25,00,000 per year
For more than 40 years, Accelya has been the industry's partner for change, simplifying airline financial and commercial processes and empowering the air transport community to take better control of the future. Whether partnering with IATA on industry-wide initiatives or enabling digital transformation to simplify airline processes, Accelya drives the...
-
Site Reliability Engineer
2 weeks ago
Chennai, Kolkata, Mumbai, India Learningmate Solutions Full time ₹ 15,00,000 - ₹ 25,00,000 per year
Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability.
As a Site Reliability Engineer, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business...
-
Site Reliability Engineer
2 weeks ago
Greater Noida, India TRH Consultancy Services Full time
Description: We are seeking a Site Reliability Engineer with expertise in OpenTelemetry to join our team in India. The ideal candidate will be responsible for ensuring the reliability, availability, and performance of our systems while implementing best practices for observability and monitoring.
Responsibilities:
- Design, implement, and maintain reliable...
-
Site Reliability Engineer Level 2
2 weeks ago
Greater Noida, Uttar Pradesh, India CorroHealth Full time ₹ 9,00,000 - ₹ 12,00,000 per year
We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a deep understanding of both software engineering and systems administration, with a focus on creating scalable and reliable systems. You will work closely with development and operations teams to ensure the reliability, availability, and...
-
Site Reliability Engineering Manager
6 days ago
Kolkata, India CloudHire Full time
Job Description
Job Summary
The Technical Manager for Site Reliability Engineering (SRE) will lead a remote team of Site Reliability Engineers, ensuring operational excellence and fostering a high-performing team culture. Reporting to the US-based Director of Systems and Security, this role is responsible for overseeing day-to-day operations, technical...
-
Site Reliability Engineering Manager
4 days ago
Kolkata, India CloudHire Full time
Description:
- Provide leadership and management to a remote team of Site Reliability Engineers, ensuring alignment with organizational priorities and goals.
- Oversee team operations, including incident management, technical support, and infrastructure maintenance.
- Act as the primary point of escalation for complex technical issues, collaborating with the...