Site Reliability Developer 3

4 weeks ago


Noida, India Oracle Full time

Job Description

Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.

Within the Oracle Health (OHAI) organization, the new EHR and Clinical AI Agent cloud services are at the forefront of new generative AI services for healthcare organizations. Building on the success of the established Digital Assistant (ODA) product, EHR and AI Agent let healthcare providers use advanced AI technologies, together with voice commands, to reduce manual work and focus on patient care.

Oracle Health EHR is expanding its OCI Operations team and is looking to bring in new Site Reliability Engineers. As an SRE, you will solve technical challenges on an advanced OCI cloud service platform, focusing on reliability, scalability, resilience, security, and performance. You will define how to use the latest technologies to optimize the operational efficiency of the service. You will gain a deep understanding of chatbots, cognitive services, machine learning, and analytics. You will work with a team pushing the boundaries of a scalable, self-healing, autonomous platform built on Kubernetes, Docker, Prometheus, and Grafana. You will be exposed to a wide range of OCI cloud services and understand how we interact with many dependent services across the organization.

Areas of responsibility

- Service Ownership: As part of the EHR/Clinical Agent team, you will be responsible for all operational aspects of the OCI services included in our portfolio. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of the Digital Assistant suite of products. Own end-to-end availability, reliability, and performance of a cloud service. Participate in LiveSite operations, working rapidly to mitigate issues that may arise.
- Service Design: Design and implement solutions for rolling out software and security updates with zero downtime. Partner with development and product management to build and maintain platform and automation frameworks to ensure maximum uptime and predictability, preventing outages and service interruptions or degradation. Analyze system failures and develop rapid response processes.
- Operations engineering: Evaluate the operation of cloud service deployments across commercial and government datacenters. Monitor the degradation of the service and dependencies under load, and implement solutions to ensure high availability for our customers. Analyze resource utilization and scaling requirements in a high-end production system. Resolve security vulnerabilities to conform to corporate and government security standards.
- Automation: Building on your understanding of automation and orchestration principles, you will identify opportunities to automate SRE procedures in production environments. The solutions implemented will be designed to minimize the possibility of errors being introduced into the system.
- Technical expertise: Handle complex, critical issues encountered in production environments, drawing on your accumulated technical knowledge to rapidly identify the issues and apply steps to mitigate them. Develop an understanding of the underlying AI technologies used to implement the Clinical Digital Assistant service. As an SME, you will be called in to handle major incidents, and your understanding of the architecture and dependent services will position you to apply mitigations that resolve the issue quickly, then work with development to help implement preventative actions.

Career Level - IC3

Requirements

- 5+ years of professional experience as a Site Reliability Engineer or equivalent experience.
- BS or MS in Information Technology/Computer System Engineering, or equivalent.
- Excellent team skills, can-do attitude, focus on quality.
- Strong troubleshooting capabilities targeting complicated problems in remote systems.
- Experience with production operations and best practices for deploying quality code in production.
- Experience with public cloud (OCI, AWS, GCP, Azure).
- Experience and working knowledge in Python, Perl and/or Shell Scripting.
- Knowledge of Infrastructure as Code (IaC) tools such as Shepherd and Terraform.
- Experience with public cloud managed Kubernetes.
- Experience with cloud-native administration and monitoring/alerting technologies such as Docker, Helm, Prometheus, Grafana, EFK/ELK, Jaeger, or similar technologies.
- Knowledge of version control using Git.
- Experience in a Linux/Unix environment.

Responsibilities

- Work with the Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas.
- Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.
- Responsible for the design and delivery of the mission critical stack, with focus on security, resiliency, scale, and performance.
- Authority for end-to-end performance and operability.
- Partner with development teams in defining and implementing improvements in service architecture.
- Articulate technical characteristics of services and technology areas and guide Development Teams to engineer and add premier capabilities to the Oracle Cloud service portfolio.
- Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack.
- Demonstrate clear understanding of automation and orchestration principles.
- Act as ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs).
- Utilize a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations.
- Understand and explain the effect of product architecture decisions on distributed systems.
- Professional curiosity and a desire to develop a deep understanding of services and technologies.

Qualifications

Career Level - IC3

About Us

As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry leaders in almost every sector and continue to thrive after 40+ years of change by operating with integrity.

We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all.

Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing [Confidential Information] or by calling +1 888 404 2494 in the United States.

Oracle is an Equal Employment Opportunity Employer.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veteran status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.



  • India Oracle Full time

    Job Description Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems....


  • Bengaluru, India Relanto Full time

    Job Description Job Title: Site Reliability Engineer Summary We are looking for a Site Reliability Engineer to join our Digital & Transformation department. The ideal candidate will have 2-3 years of experience in this field and will be responsible for ensuring the reliability, availability, and performance of our systems and applications. Roles And...


  • Noida, India BOLD Full time

    Job Description BOLD is seeking professionals who will be responsible for performing the build and release activities with Microsoft Technology stack. This person will also manage CI/CD pipelines and automate the build and deployment process. He/she will also work collaboratively with different teams including Dev, QA, and infrastructure. Job Description...


  • Noida, India S&P Global Full time

    Job Description About The Role Grade Level (for internal use): 10 Department overview S&P Global provides innovative products and services that enhance transparency, reduce risk, and improve operational efficiency. Our customers include banks, hedge funds, asset managers, central banks, regulators, auditors, fund administrators and insurance companies. We...


  • Noida, Uttar Pradesh, India Biz2X Full time ₹ 15,00,000 - ₹ 25,00,000 per year

    About the Role: We are seeking an experienced and passionate Senior Site Reliability Engineer (SRE) to join our team. In this role, you will work on improving the availability, reliability, and scalability of our services, systems, and infrastructure. You will collaborate closely with development, operations, and security teams to ensure that our systems are...


  • Noida, Uttar Pradesh, India CorroHealth Full time ₹ 15,00,000 - ₹ 25,00,000 per year

    We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a deep understanding of both software engineering and systems administration, with a focus on creating scalable and reliable systems. You will work closely with development and operations teams to ensure the reliability, availability, and...


  • India Akamai Technologies Full time

    Job Description Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We...


  • Greater Noida, Uttar Pradesh, India TRH Consultancy Services Full time ₹ 4,00,000 - ₹ 12,00,000 per year

    Description: We are seeking a Site Reliability Engineer with expertise in OpenTelemetry to join our team in India. The ideal candidate will be responsible for ensuring the reliability, availability, and performance of our systems while implementing best practices for observability and monitoring. Responsibilities: - Design, implement, and maintain...


  • Noida, Uttar Pradesh, India Cloud Angles Digital Transformation Full time ₹ 15,00,000 - ₹ 25,00,000 per year

    About the Role: We are seeking a skilled and proactive Site Reliability Engineer I & II (SRE II) to join our growing infrastructure team. As an SRE II, you will play a critical role in ensuring the reliability, scalability, and performance of our systems. You'll work independently and collaboratively to design, implement, and maintain robust infrastructure...


  • Noida, Uttar Pradesh, India Cloud Angles Digital Transformation Full time ₹ 15,00,000 - ₹ 25,00,000 per year

    Job Summary Site Reliability Engineers (SREs) cover the intersection of Software Engineer and Systems Administrator. In other words, they can both create code and manage the infrastructure on which the code runs. This is a very wide skillset, but the end goal of an SRE is always the same: to ensure that all SLAs are met, but not exceeded, so as to balance...