GCP Site Reliability Engineer, Staff

5 days ago

Hyderabad, Telangana, India | Warner Bros. Discovery | Full time | ₹ 10,00,000 - ₹ 25,00,000 per year

Welcome to Warner Bros. Discovery… the stuff dreams are made of.

Who We Are…

When we say, "the stuff dreams are made of," we're not just referring to the world of wizards, dragons and superheroes, or even to the wonders of Planet Earth. Behind WBD's vast portfolio of iconic content and beloved brands, are the storytellers bringing our characters to life, the creators bringing them to your living rooms and the dreamers creating what's next…

From brilliant creatives, to technology trailblazers, across the globe, WBD offers career defining opportunities, thoughtfully curated benefits, and the tools to explore and grow into your best selves. Here you are supported, here you are celebrated, here you can thrive.

Your New Role:

We are seeking a highly skilled Lead Google Cloud Platform (GCP) Site Reliability Engineer (SRE) to join the Global Infrastructure Cloud Technologies (GICT) team to ensure the reliability, availability, scalability, and security of our cloud infrastructure and services.

The ideal candidate will bring expertise in GCP, automation, monitoring, and incident management to drive operational excellence.

The role serves as a technical leader across our Hyderabad Cloud Team, supporting hundreds of applications, websites and services in the fleet of Warner Bros Discovery (WBD) cloud accounts.

The selected individual will help craft management and governance strategies, and work to unify processes with other cloud providers. As a team player, the GCP Lead SRE will collaborate with other SRE Leads, the rest of the cloud engineering team, software developers and management to build and manage highly resilient and performant infrastructure.

This individual will have a strong background in Linux and Windows Systems Engineering. Proficiency in Terraform and related Infrastructure-as-Code (IaC) tooling is required, along with experience with the software development lifecycle, fluency in distributed computing techniques and technologies, and demonstrated experience managing enterprise-scale infrastructure and tooling. Direct, hands-on experience writing software is ideal.

This position reports to the Sr. Manager of Cloud Engineering.

Your Role Accountabilities:

Key Responsibilities

Primarily accountable for managing GCP environments
Identify, optimize, and eliminate performance bottlenecks, and proactively remediate security concerns through monitoring, profiling, and tuning.

Establish and improve SLOs, SLIs, and error budgets to drive system reliability.
Collaborate with stakeholders, including application developers, to improve application observability and optimize performance.

Lead and mentor a team of engineers working to reduce toil across the total team load, and to implement security features, roles, user access and privileges according to best practices.

Proactively identify, design, and implement process and architectural improvements.

Stay informed on the latest features and best practices across the GCP Public Cloud and the WBD GCP environment.

Work with a peer group of complementary public cloud leads (Azure, AWS) to facilitate consistency across WBD management of resources wherever possible.

Methodology

Automate deployment, monitoring, and self-healing capabilities to improve operational efficiency.
Develop and manage infrastructure using Terraform and other IaC tools.

Drive incident response efforts, conduct root cause analyses (RCA), and implement preventative measures to minimize downtime.
Build and enhance monitoring, alerting, and observability systems to proactively resolve incidents before they impact users. Evangelize telemetry and metrics-driven application development.

Improve on-call processes and reduce toil by automating repetitive tasks.
Contribute to the software development of cloud management tooling and support applications.
Develop detailed technical documentation, including runbooks, troubleshooting guides, and system diagrams.

Continuous Improvement

Work with stakeholders to ensure systems meet security baselines, best practices, compliance requirements and resiliency standards.
Implement effective backup strategies and conduct regular disaster recovery testing.
Implement robust access controls, secrets management, and security monitoring solutions.
Collaborate with security teams to manage vulnerabilities and respond to threats.

Engage with our FinOps/CostOps team to optimize cloud costs by implementing efficient resource utilization and right-sizing strategies.

Work closely with development, infrastructure, and security teams to drive best practices and improvements.
Mentor junior engineers and contribute to a culture of continuous learning and improvement.
Participate in architectural discussions and provide guidance on reliability and scalability considerations.

Qualifications & Experience

Years of prior experience in Site Reliability Engineering, DevOps, Cloud Infrastructure, or related fields.

Expert in Google Cloud Platform.
Strong experience in Linux/Unix administration, networking, and distributed systems.
Fluency in two or more programming languages (Python, Golang, JavaScript, PowerShell, etc.).

Extensive hands-on experience in container orchestration technologies, such as GKE, Kubernetes, Docker.

Deep knowledge of monitoring, logging and observability tools (Prometheus, Grafana, ELK, Splunk, etc.).

Hands-on experience with Infrastructure-as-Code (IaC) using Terraform and Google Cloud Deployment Manager (GDM) templates.

Strong background in CI/CD pipelines, GitOps, and infrastructure automation (Terraform, Helm, Ansible or Chef).

Soft Skills

Strong problem-solving, troubleshooting, and debugging skills.
Excellent written and verbal communication and collaboration abilities.
English language fluency required.

Ability to handle multiple assignments concurrently.
Passion for automation, reliability, and continuous improvement

Move quickly and intelligently - seeing technical debt as your nemesis

Ability to solve problems independently while knowing when to request assistance.

Not Required but preferred experience

Experience with other cloud providers such as AWS, Azure, Oracle, etc.

Knowledge of and passion for media, entertainment, and technology industries (including key players, growth trends and drivers, new media models, industry structure, etc.)
Familiarity with streaming and
