Senior Site Reliability Engineer
1 week ago
Role: Senior Site Reliability Engineer
Team: OCI Reliability
Shift: 6 AM - 2 PM IST
Skills required: Production Incidents, Automation, Python
Location: Remote
Job description
As a Senior Site Reliability Engineer, you will focus on detecting, triaging, and mitigating OCI service-impacting events quickly and efficiently. You will be responsible for minimising downtime by delivering exceptional major incident management and ensuring the reliability, scalability, performance, and security of the systems that prevent incidents from occurring. Your work will directly contribute to reducing event duration by leveraging your operational expertise, best practices, and the ability to develop tools that automate and improve incident management processes.
Oracle Cloud is cutting-edge and continuously evolving. When issues arise, your team will respond within minutes to mitigate customer impact and ensure service continuity. This role will give you deep insight into the inner workings of OCI’s systems and operations. You’ll collaborate with and influence leaders across Oracle, driving organisational initiatives aimed at continually improving OCI-wide service availability. As part of an agile, high-impact team, you will play a crucial role in shaping the future of Oracle Cloud. If you're excited to be part of a fast-moving team that’s pushing the boundaries of innovation, we’d love to connect with you.
We are looking for candidates who can work APAC shift hours (6 AM to 2 PM IST).
Career Level - IC3
Responsibilities:
- Lead major incident recovery by orchestrating cross-functional collaboration, driving rapid escalation, clear communication, and seamless stakeholder alignment to ensure swift and effective resolution.
- Identify opportunities to automate and streamline critical incident workflows, taking full ownership of developing and implementing innovative solutions to enhance efficiency and drive faster resolutions.
- Apply deep expertise in cloud computing design patterns and dependencies to proactively mitigate complex major incidents and optimise cloud-based solutions, quickly diagnosing root causes, limiting impact, and implementing long-term fixes.
- Troubleshoot cloud infrastructure issues using observability platforms to monitor, analyse, and resolve performance and reliability challenges.
- Continuously improve operational processes, tools, and workflows to enhance the reliability and efficiency of the cloud infrastructure.
Minimum Qualifications
- Bachelor's degree or higher in Computer Science or a related field, or equivalent work experience.
- 4+ years of experience in Site Reliability Engineering (SRE), DevOps, or Systems Engineering.
- Extensive hands-on experience with public cloud operations (e.g., AWS, Azure, GCP, OCI).
- Proven track record in Major Incident Management within cloud-based environments, with the ability to drive effective incident resolution.
- Strong understanding of automation and orchestration principles, with a focus on improving system reliability and efficiency.
- Proficiency in at least one modern object-oriented programming language (e.g., Python, Java, or Go).
- Solid experience in software engineering best practices, including Agile methodologies, coding standards, code reviews, version control, build processes, testing, and operations.
- Familiarity with infrastructure automation tools such as Chef, Ansible, Jenkins, and Terraform.
- Expertise in several key technologies, including Infrastructure-as-a-Service (IaaS), CI/CD systems, Docker, RESTful APIs, log analysis, and debugging tools.
- Experience with observability platforms such as Grafana, Prometheus, and other monitoring, logging, and tracing tools to optimize system visibility, performance, and issue resolution.
-
Senior Site Reliability Engineer
4 weeks ago
India HCLTech Full time
Urgent opening for Cloud Senior Site Reliability Engineer role for Pan India location with HCL Tech. Interested candidates kindly share your updated resume to sagardo@hcltech.com with the subject line "Cloud Senior Site Reliability Engineer Role_ your name & preferred location". Job Description: Ability to learn SRE practices across Red Hat OpenShift, Google...
-
Senior Site Reliability Engineer
4 weeks ago
India Vertex Agility Full time
Senior Site Reliability Engineer - Remote. Vertex Agility is a dynamic, cross-geographic remote consultancy specialising in software engineering, DevOps, and cloud, partnered with some of the most well-known brands globally. We specialise in transforming businesses by providing cloud solutions tailored to our clients' needs. Operating from 15+ countries...
-
Site Reliability Engineer
4 weeks ago
India Tata Consultancy Services Full time
Dear Candidate, Greetings from TCS! TCS is hiring for SRE; please find the JD below. Experience range: 5+ years. Location: Bangalore, Pune, Hyderabad, Chennai. Skills required: Site Reliability Engineer. Role & Responsibilities: Collaborates with cloud platform engineers and teams to design, develop, test, and implement...
-
Senior Site Reliability Engineer
3 weeks ago
India Vertex Agility Full time
Senior Site Reliability Engineer - Remote. Vertex Agility is a dynamic, cross-geographic remote consultancy specialising in software engineering, DevOps, and cloud, partnered with some of the most well-known brands globally. We specialise in transforming businesses by providing cloud solutions tailored to our clients' needs. Operating from 15+ countries...
-
Senior Site Reliability Engineer
1 month ago
India Vertex Agility Full time
Senior Site Reliability Engineer - Remote. Vertex Agility is a dynamic, cross-geographic remote consultancy specialising in software engineering, DevOps, and cloud, partnered with some of the most well-known brands globally. We specialise in transforming businesses by providing cloud solutions tailored to our clients' needs. Operating from 15+ countries...
-
Site Reliability Engineer
6 days ago
India IDEMIA Full time
We are hiring for a Site Reliability Engineer role at the Noida location. Responsibilities: Deploy, manage, and operate medium- to large-scale production systems. Understanding of Linux as a runtime environment. Familiarity with cloud-native concepts and virtualisation. Familiarity with CI/CD concepts and tools such as Jenkins, GitLab, etc. Previous...
-
Site Reliability Engineer
7 days ago
India IDEMIA Full time
We are hiring for a Site Reliability Engineer role at the Noida location. Responsibilities: Deploy, manage, and operate medium- to large-scale production systems. Understanding of Linux as a runtime environment. Familiarity with cloud-native concepts and virtualisation. Familiarity with CI/CD concepts and tools such as Jenkins, GitLab, etc. Previous...
-
Site Reliability Engineer
4 weeks ago
India PeopleLogic Full time
Job Responsibilities: Ensure the 24/7 operation and reliability of data services in production. Collaborate with the data engineering development team to design, build, and maintain scalable, reliable, and secure data pipelines and systems. Develop and implement monitoring, alerting, and incident response strategies to proactively identify and...
-
Site Reliability Engineer
3 weeks ago
India InstaService Inc Full time
About Us: At InstaService, we are committed to delivering reliable, high-performance home services to our customers. As a fast-growing on-demand services platform, we are looking for a talented DevOps / Site Reliability Engineer (SRE) to join our dynamic team. This role is crucial to scaling and maintaining our infrastructure, ensuring our platform remains...
-
Site Reliability Engineer
2 months ago
India BCE Global Tech Full time
About the role: We are seeking a talented Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in software engineering and systems administration, with a passion for building scalable and reliable systems. As an SRE, you will collaborate with development and operations teams to ensure our services are reliable,...
-
Site Reliability Engineer
4 weeks ago
India Apex Systems Full time
DevOps Engineer, Bengaluru & Chennai, Remote. Looking for an immediate joiner.
• Overall 5+ years of experience as a Site Reliability Engineer / DevOps Engineer
• Bachelor’s or master’s degree in software engineering, computer science, or a related technical field
• Familiarity with Infrastructure as Code (e.g. Terraform & CloudFormation)
• Has a focus in...
-
Site Reliability Engineer
1 month ago
Anywhere in India/Multiple Locations Stealth Startup Full time
Key Responsibilities: At Stealth Startup, we're looking for a skilled Site Reliability Engineer to maintain and enhance the reliability, availability, and performance of our large-scale distributed systems. Your key responsibilities will include automating deployment, monitoring, and management of production systems, as well as implementing and managing CI/CD...
-
Site Reliability Engineer
4 weeks ago
India Tanla Platforms Limited Full time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Tanla Platforms Limited. As a Site Reliability Engineer, you will be responsible for ensuring the high availability, scalability, and reliability of our platforms and applications. Key Responsibilities: Design, implement, and maintain scalable and highly available...