Principal Site Reliability Engineer

1 day ago


Hyderabad, Telangana, India Oracle Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Description

Oracle is seeking a motivated Principal Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This position requires broad knowledge of Mainframe zLinux, DB2, z/VM, and AIX. The Site Reliability Engineer is expected to work with multiple service and product development teams, identifying cross-team issues that create risk for operations across the organization and resolving those issues with a mixture of engineering, development, troubleshooting expertise, and general operational guidance. This role also requires excellent communication and organizational skills. The candidate is expected to collaborate with service owners, other engineers, and developers to deliver a superior support experience to the development community.

The Principal Site Reliability Engineer provides support for development and testing across multiple IBM Z servers running multiple operating systems with associated software. This position is responsible for creating and implementing system enhancements to the IBM Z (mainframe) environments that improve their performance and reliability. The role requires managing the disparate workloads of multiple IBM Z servers in complex configurations. It also involves extensive problem solving across all aspects of the IBM Z configurations, including analysis of issues and problems generated by development.

Responsibilities

IBM Z Systems

  • Administer and operate IBM Z Hardware Management Console (HMC).
  • Configure and maintain IBM Z I/O subsystem definitions (IOCDS).
  • Manage network interfaces connecting IBM Z systems to Oracle networks and operating environments.
  • Administer IBM HMC for DS8000 disk storage systems.

z/OS Platform

  • Install, configure, and maintain z/OS using SMP/E and z/OSMF tools.
  • Design and implement Sysplex environments in both LPAR and z/VM virtualized setups for DB2 workloads.
  • Manage enterprise storage using DFSMS.
  • Enforce and maintain system security through RACF administration.
  • Configure and maintain network communication using VTAM and Communications Server.
  • Administer and support Unix System Services (USS) within z/OS.

z/OS Software Stack

  • Install, upgrade, and manage key middleware components including DB2, CICS TS, and IMS using SMP/E.

z/VM for Linux Environments

  • Install, configure, and maintain z/VM systems.
  • Manage multiple Single System Image (SSI) clusters for high availability.
  • Administer and maintain DIRMAINT within each SSI cluster.
  • Configure and support z/VM networking infrastructure.
  • Monitor and optimize performance using Performance Toolkit and related monitoring tools.

z/VM for z/OS Guests

  • Provide operational support for multiple z/OS guest systems running under z/VM.
  • Maintain and manage base directories using a clustered approach.
  • Configure and manage shared disk environments across z/VM systems.
  • Utilize EREP for error reporting and system reliability tracking.
  • Develop and maintain automation scripts using REXX for directory and system management.
  • Leverage CMS PIPES for data processing and automation tasks.
  • Monitor performance and capacity with Performance Toolkit and related utilities.

Linux on IBM Z

  • Deploy and manage SLES and RHEL Linux hosts running under z/VM.
  • Configure z/VM interfaces for virtual disk and network integration.
  • Collaborate with development teams in defining and implementing improvements in service architecture.
  • Act as the escalation point for critical issues that may not have a documented procedure, and provide Root Cause Analysis.
  • Understand the end-to-end configuration, technical dependencies, and characteristics of the development infrastructure.
  • Design and deliver mission-critical automation, with a focus on security, resiliency, scale, and performance.
  • Author functional and technical documentation and standard operating procedures (SOPs).

Knowledge Skills

  • 6 - 10 years of experience with IBM Z Systems, the z/OS Platform, and z/VM
  • Experience in debugging operating system performance issues and performance tuning
  • Expertise in developing scripts, utilities, and tools to automate routine or labor-intensive manual tasks
  • Experience in operations and problem management
  • Experience working with fault-tolerant, highly available, high-throughput, distributed, and scalable systems
  • Experience working with global teams across different time zones
  • Aptitude to be a good team player and the desire to learn and implement new technologies as needed
  • Excellent organizational, verbal, and written communication skills

Qualifications

Career Level - IC4




