Site Reliability Developer 3
7 days ago
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.
Within the Oracle Health (OHAI) organization, the new EHR and Clinical AI Agent cloud services are at the forefront of generative AI services for healthcare organizations. Building on the success of the established Digital Assistant (ODA) product, EHR and AI Agent let healthcare providers use advanced AI technologies, together with voice commands, to reduce manual work and focus on patient care.
Oracle Health EHR is expanding its OCI Operations team and is looking to bring in new Site Reliability Engineers. As an SRE, you will solve technical challenges on an advanced OCI cloud service platform, focusing on areas such as reliability, scalability, resilience, security, and performance.
You will define how to use the latest technologies to optimize the operational efficiency of the service. You will gain a deep understanding of chatbots, cognitive services, machine learning, and analytics. You will work with a team pushing the boundaries of a scalable, self-healing, autonomous platform built on Kubernetes, Docker, Prometheus, and Grafana. You will be exposed to a wide range of OCI cloud services and learn how we interact with many dependent services across the organization.
Areas of responsibility
- Service Ownership
As part of the EHR/Clinical Agent team, you will be responsible for all operational aspects of the OCI services included in our portfolio.
Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of the Digital Assistant suite of products.
Own the end-to-end availability, reliability, and performance of a cloud service.
Participate in LiveSite operations, working rapidly to mitigate issues that may arise.
- Service Design
Design and implement solutions for rolling out software and security updates with zero downtime.
Partner with development and product management to build and maintain platform and automation frameworks that ensure maximum uptime and predictability, preventing outages and service interruptions or degradation.
Analyze system failures and develop rapid response processes.
- Operations engineering
Evaluate the operation of cloud service deployments across commercial and government datacenters
Monitor the degradation of the service and dependencies under load, and implement solutions to ensure high availability to our customers
Analyze resource utilization and scaling requirements in a high-end production system.
Resolve security vulnerabilities to conform to corporate and government security standards.
- Automation
Building on your understanding of automation and orchestration principles, you will identify opportunities to automate SRE procedures in production environments.
The solutions you implement will be designed to minimize the possibility of introducing errors into the system.
- Technical expertise
Handle complex, critical issues encountered in production environments, drawing on your accumulated technical knowledge to rapidly identify the issues and apply steps to mitigate them.
Develop an understanding of the underlying AI technologies used to implement the Clinical Digital Assistant service
As an SME, you will be called in to handle major incidents. Your understanding of the architecture and dependent services will position you to apply mitigations that resolve the issue quickly, and then to work with development on implementing preventative actions.
Career Level - IC3
Requirements
5+ years of professional experience as a Site Reliability Engineer or equivalent experience.
BS or MS in Information Technology/Computer System Engineering, or equivalent
Excellent team skills, can-do attitude, focus on quality.
Strong troubleshooting capabilities, targeting complicated problems in remote systems.
Experience with production operations and best practices for deploying quality code in production.
Experience with public cloud (OCI, AWS, GCP, Azure).
Experience with and working knowledge of Python, Perl, and/or shell scripting.
Knowledge of Infrastructure as Code (IaC) tools such as Shepherd and Terraform.
Experience with public cloud managed Kubernetes.
Experience with cloud-native administration and monitoring/alerting technologies such as Docker, Helm, Prometheus, Grafana, EFK/ELK, Jaeger, or similar technologies.
Knowledge of version control using Git.
Experience in Linux/Unix environments.
Responsibilities
Work with the Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission critical stack, with a focus on security, resiliency, scale, and performance. Authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Use a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Professional curiosity and a desire to develop a deep understanding of services and technologies.
Qualifications
Career Level - IC3
-
Site Reliability Developer 3
1 week ago
India Oracle Full time
The NRE (Network Reliability Engineering) team is accountable for ensuring the robustness of the Oracle Cloud Network Infrastructure. A Network Reliability Engineer (NRE) role is primarily focused on applying an engineering approach to measure and automate a network's reliability to align with the organization's service-level objectives, agreements, and goals....
-
Site Reliability Engineer 3
1 week ago
India Jobgether Full time ₹ 12,00,000 - ₹ 24,00,000 per year
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer 3 in India. As a Site Reliability Engineer 3, you will play a critical role in maintaining the reliability, scalability, and performance of cloud-based systems. You will lead initiatives to automate processes, monitor infrastructure,...
-
Site Reliability Developer 3
2 weeks ago
India Oracle Full time ₹ 12,00,000 - ₹ 24,00,000 per year
Description: Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems....
-
Site Reliability Developer 3
3 days ago
India Oracle Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Description: Our team is focused on modernizing the Electronic Health Record (EHR) to empower the front line of health care to work at the top of their license, focus more on patients and less on the computer, and achieve peak efficiency, supported by the power of generative AI and modernized applications. Our approach to modernizing is to invest in new...
-
Site Reliability Developer 4
3 days ago
India Oracle Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Description: You will work with the Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission...
-
Site Reliability Engineer
4 weeks ago
Bengaluru, India Relanto Full time
Job Description
Job Title: Site Reliability Engineer
Summary: We are looking for a Site Reliability Engineer to join our Digital & Transformation department. The ideal candidate will have 2-3 years of experience in this field and will be responsible for ensuring the reliability, availability, and performance of our systems and applications. Roles And...
-
Site Reliability Engineer
2 weeks ago
India InOrg Full time ₹ 12,00,000 - ₹ 36,00,000 per year
About VivaOps: VivaOps is a leading DevSecOps platform company specializing in GitLab, the comprehensive DevOps platform, to transform and secure software development processes. We help organizations streamline their DevSecOps journey by offering a complete range of GitLab services, from advisory to implementation and managed services, to accelerate...
-
Site Reliability Engineer
3 days ago
India Tata Consultancy Services Full time
Role: Site Reliability Engineer
Location: Chennai/Bangalore/Hyderabad
Exp: 5-11 years
1. Exposure to any APM tool like Dynatrace, AppDynamics, Splunk, etc.
2. DBA or Infra admin
3. Gremlin or Chaos Monkey or Simian Army or Litmus expertise
4. Exposure to ITSM tools like ServiceNow, etc.
5. Understanding of Automation and Chaos Engineering
6. Exposure to DevOps tools and...
-
Site Reliability Engineer
7 days ago
India HRhelpdesk Full time
About the company: Company is a rapidly growing, private equity backed SaaS product company and provides cloud-based solutions.
Job Summary: As a Site Reliability Engineer (SRE), you will be responsible for building and maintaining the infrastructure, tools, and pipelines that keep our systems running smoothly. You will collaborate closely with DevOps,...