Site Reliability Developer 3

1 week ago


India Oracle Full time

The Network Reliability Engineering (NRE) team is accountable for the robustness of the Oracle Cloud Network Infrastructure. A Network Reliability Engineer (NRE) applies an engineering approach to measuring and automating a network's reliability so that it aligns with the organization's service-level objectives, agreements, and goals. The team's duties include promptly responding to network disruptions, pinpointing the underlying cause, and collaborating with internal and external stakeholders to fully restore functionality. NRE team members also play a critical role in automating recurring daily operational tasks to streamline processes, improve workflow efficiency, and increase overall productivity.

Because Oracle Cloud Infrastructure (OCI) is a cloud-based network with a global footprint, this support covers hundreds of thousands of network devices supporting millions of servers, connected over a mix of dedicated backbone infrastructure, Clos networks, and the Internet. Responsibilities include designing, writing, and deploying network monitoring and automation software to improve the availability, scalability, and efficiency of Oracle products and services.

Requirements:

  • Bachelor's degree in CS or a related engineering field with 5+ years of Network Engineering experience, or Master's with 5+ years of Network Engineering experience.
  • Bachelor's or master's degree in Computer Science, Electrical/Hardware Engineering, or a related field.
  • Experience working in a large ISP or cloud provider environment.
  • Experience working in a network operations role.
  • Strong knowledge of protocols such as MPLS, BGP, IPv6, DNS, DHCP, and SSL. Knowledge of VXLAN and EVPN is an added advantage.
  • Deep understanding of data center build and design, including Clos architecture.
  • Extensive experience with scripting or automation and data center design. Python is preferred, but candidates must demonstrate expertise in a scripting or compiled language.
  • Experience with network monitoring and telemetry solutions.
  • Hands-on experience with Prometheus or another network monitoring software stack.
  • Experience with network modeling and programming: YANG, OpenConfig, NETCONF.
  • Ability to use professional concepts and company objectives to resolve complex issues in creative and effective ways.
  • Capable of working under limited supervision.
  • Excellent organizational, verbal, and written communication skills.
  • Excellent judgment in influencing product roadmap direction, features, and priorities.

Responsibilities:

  • Participate in an on-call rotation.
  • Work with the Site Reliability Engineering (SRE) team on the shared full-stack ownership of a collection of services and/or technology areas.
  • Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.
  • Support the design, deployment, and operations of the large-scale, global Oracle Cloud Infrastructure (OCI).
  • Focus primarily on the development and support of network fabric and systems, combining a deep protocol-level understanding of networking with programming skills.
  • Collaborate with program/project managers to develop milestones and deliverables.
  • Primarily use existing procedures and tools to develop and safely execute network changes, and contribute to developing new procedures from time to time.
  • Develop solutions that enable front-line support teams to act on network failure conditions.
  • Mentor junior engineers.
  • Participate in the network solution and architecture design process.
  • Participate in operational rotations as either primary or secondary on-call.
  • Provide break-fix support for events.
  • Serve as the escalation point for event remediation.
  • Lead post-event root cause analysis.
  • Frequently develop scripts to automate routine tasks for the team and business units.
  • Coordinate with networking automation services on the development and integration of support tooling.
  • Coordinate with network monitoring to gather telemetry and create alert rules from it.
  • Build dashboards that represent data at various network layers and device roles to help identify network issues and anomalies.
  • Serve as SME on software development projects for network automation and network monitoring.
  • Collaborate with network vendor technical account teams and the internal Quality Assurance team to drive bug resolution and assist in the qualification of new firmware and/or operating systems.

Career Level - IC3



  • India Oracle Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Description: Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop designs, architectures, standards, and methods for large-scale distributed systems....


  • India Jobgether Full time ₹ 12,00,000 - ₹ 24,00,000 per year

    This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer 3 in India. As a Site Reliability Engineer 3, you will play a critical role in maintaining the reliability, scalability, and performance of cloud-based systems. You will lead initiatives to automate processes, monitor infrastructure,...


  • India Oracle Full time ₹ 12,00,000 - ₹ 24,00,000 per year

    Description: Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop designs, architectures, standards, and methods for large-scale distributed systems....


  • India Oracle Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Description: You will work with the Site Reliability Engineering (SRE) team on the shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission...


  • Bengaluru, India Relanto Full time

    Job Description. Job Title: Site Reliability Engineer. Summary: We are looking for a Site Reliability Engineer to join our Digital & Transformation department. The ideal candidate will have 2-3 years of experience in this field and will be responsible for ensuring the reliability, availability, and performance of our systems and applications. Roles And...


  • India Oracle Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Description: Our team is focused on modernizing the Electronic Health Record (EHR) to empower the front line of health care to work at the top of their license, focus more on patients and less on the computer, and achieve peak efficiency, supported by the power of generative AI and modernized applications. Our approach to modernizing is to invest in new...


  • India InOrg Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    About VivaOps: VivaOps is a leading DevSecOps platform company specializing in GitLab, the comprehensive DevOps platform, to transform and secure software development processes. We help organizations streamline their DevSecOps journey by offering a complete range of GitLab services, from advisory to implementation and managed services, to accelerate...


  • India Tata Consultancy Services Full time

    Role: Site Reliability Engineer. Location: Chennai/Bangalore/Hyderabad. Exp: 5-11 years. 1. Exposure to any APM tool like Dynatrace, AppDynamics, Splunk, etc. 2. DBA or Infra admin. 3. Gremlin or Chaos Monkey or Simian Army or Litmus expertise. 4. Exposure to ITSM tools like ServiceNow, etc. 5. Understanding of Automation and Chaos Engineering. 6. Exposure to DevOps tools and...


  • India HRhelpdesk Full time

    About the company: Company is a rapidly growing, private equity backed SaaS product company and provides cloud-based solutions. Job Summary: As a Site Reliability Engineer (SRE), you will be responsible for building and maintaining the infrastructure, tools, and pipelines that keep our systems running smoothly. You will collaborate closely with DevOps,...