
Site Reliability Leadership
16 hours ago
Reliability Engineering Manager
- As a Reliability Engineering Manager, you will be responsible for ensuring the availability, latency, and performance of our SaaS platform on Azure. This involves defining and enforcing reliability standards, leading high-impact projects, mentoring engineers, and eliminating toil at scale.
Key Responsibilities:
- Define customer-centric SLIs/SLOs for Tier-0/Tier-1 services; publish them, review them quarterly, and align teams to them.
- Run the error-budget policy with multi-window, multi-burn-rate alerts, clear runbooks, and paging thresholds.
- Gate changes by budget status (freeze/relax rules) wired into CI/CD.
- Maintain SLO/EB dashboards (Azure Monitor, Grafana/Prometheus, App Insights). Run weekly SLO reviews with engineering/product.
- Drive roadmap tradeoffs when budgets are at risk; land reliability epics.
- Lead SEV1/SEV2 incidents without drama: own comms, run blameless postmortems, and make corrective actions stick.
- Engineer reliability in: Multi-AZ/region patterns (active-active/DR), PDBs/Pod Topology Spread, HPA/VPA/KEDA, resilient rollout/rollback.
- Harden clusters (network, identity, policy), optimize node/pod density, ingress (AGIC/Nginx); mesh optional.
- Metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry. Alert on symptoms, not noise.
- Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
- Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.
- Load testing, right-sizing, autoscaling; partner with FinOps to reduce spend without hurting SLOs.
- Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover.
- Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.
- Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams.
- Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.
- Keep architectures, runbooks, postmortems, and SLIs/SLOs current and discoverable.
Requirements:
- Bachelor's in CS/Engineering (or equivalent experience).
- 12+ years in production ops/platform/SRE, including 5+ years on Azure.
- Deep operational expertise in PostgreSQL, including HA/DR, logical/physical replication, performance tuning.
- AKS (must-have); Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
- Observability: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations.
- IaC/automation: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo). Pipelines in Azure DevOps or GitHub Actions.
- Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
- Mentorship and crisp written/verbal communication.
Nice to Have:
- Apache NiFi, Apache Flink, Apache Kafka or Redpanda (self-managed on AKS or managed equivalents); schema management, exactly-once semantics, backpressure, dead-letter/replay patterns.
- Azure Solutions Architect Expert, CKA/CKAD.
- ITSM (ServiceNow), on-call tooling (PagerDuty/Opsgenie).
- Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.
- OpenTelemetry, eBPF tooling, or service mesh.
- Multi-tenant SaaS and cost optimization at scale.