Principal Site Reliability Engineer

11 hours ago


Hyderabad, Telangana, India IntraEdge Full time

Job Title: Principal Site Reliability Engineer (Principal SRE)

Experience: 7+ Years

Location: Hyderabad

Employment Type: Full-Time

About the Role

We are seeking an experienced Principal Site Reliability Engineer (SRE) to provide technical leadership and strategic direction for reliability, scalability, and operational excellence across our technology platforms. This role combines deep technical expertise, people leadership, and operational strategy, and serves as a key bridge between SRE teams and broader engineering, product, and business units.

As a Principal SRE, you will champion reliability engineering best practices, lead high-impact initiatives, mentor senior engineers, and drive long-term improvements in system availability, performance, and resilience.

Key Responsibilities

Technical Leadership & Reliability Engineering

  • Provide hands-on technical leadership across reliability, availability, scalability, and performance engineering initiatives.
  • Define and evolve SRE best practices, standards, and operational playbooks.
  • Lead initiatives to improve system reliability, uptime, latency, and efficiency across platforms.
  • Guide architectural decisions to ensure systems are resilient, observable, and fault-tolerant.

Operational Excellence

  • Champion operational excellence by driving improvements in monitoring, alerting, incident response, and capacity planning.
  • Establish and track SLIs, SLOs, and error budgets to balance reliability with feature delivery.
  • Lead incident management, root cause analysis (RCA), and post-incident reviews to prevent recurrence.
  • Drive automation initiatives to reduce toil and improve operational efficiency.

Leadership & People Development

  • Provide mentorship, coaching, and career guidance to SRE Engineers and Senior SRE Engineers.
  • Foster a culture of accountability, learning, and engineering excellence.
  • Partner with engineering managers to support team growth, performance, and succession planning.

Cross-Functional Collaboration

  • Act as a diplomatic liaison between the SRE organization and application engineering, platform, security, and product teams.
  • Align reliability goals with broader organizational priorities and business outcomes.
  • Influence stakeholders through strong communication, data-driven insights, and technical credibility.

Risk Management & Crisis Response

  • Lead risk assessment and proactive identification of reliability and operational risks.
  • Own crisis management during high-severity incidents, ensuring calm, structured, and effective response.
  • Drive preventative strategies through chaos engineering, resilience testing, and failure simulations.

Strategy & Long-Term Planning

  • Apply strategic thinking to define long-term reliability roadmaps and operational improvements.
  • Partner with leadership to align SRE investments with long-term platform and business goals.
  • Continuously evaluate tools, technologies, and processes to support scalable growth.

Required Skills & Qualifications

Experience

  • 7+ years of professional experience in Site Reliability Engineering, DevOps, Platform Engineering, or related roles.
  • Proven experience leading the operation of large-scale distributed systems in production environments.

Technical Expertise

  • Exceptional technical proficiency within modern cloud-native and enterprise technology stacks.
  • Strong knowledge of system design, observability, incident management, and automation.
  • Experience with monitoring, logging, alerting, and reliability tooling.
  • Strong understanding of CI/CD pipelines, infrastructure automation, and operational workflows.

Leadership & Soft Skills

  • Strong leadership and people management skills.
  • Excellent communication, collaboration, and stakeholder management abilities.
  • Proven ability to influence without authority and drive cross-team alignment.
  • Adept at risk assessment, decision-making, and crisis management under pressure.

Project & Program Management

  • Advanced project and initiative management capabilities.
  • Ability to lead multiple high-impact initiatives in parallel while maintaining operational stability.

Preferred / Nice-to-Have

  • Experience implementing SRE practices at enterprise scale.
  • Familiarity with compliance, security, and governance requirements in large organizations.
  • Experience driving cultural transformation toward reliability-first engineering.

