
SRE Lead
5 days ago
We are looking for an experienced SRE Lead to drive reliability, scalability, and performance across our systems and services. The ideal candidate will have strong expertise in cloud infrastructure, automation, monitoring, and incident management, along with proven leadership skills to guide the SRE team.
Lead the SRE team to ensure high availability, reliability, and performance of production systems.
Design, implement, and improve monitoring, alerting, and incident response processes.
Collaborate with engineering teams to optimize system design for reliability and scalability.
Implement automation to reduce manual tasks and improve operational efficiency.
Drive root cause analysis and post-incident reviews to prevent recurrence.
Manage on-call rotations and ensure timely resolution of incidents.
Define and monitor SLAs, SLOs, and SLIs for critical services.
Foster a culture of continuous improvement and operational excellence.
5+ years of experience