
Site Reliability Leader
3 days ago
Our ideal candidate owns the reliability of our cloud-based SaaS solution on Azure.
- Define customer-centric Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for Tier-0/Tier-1 services. Publish, review quarterly, and align teams to them.
- Implement error budget policies with multi-window, multi-burn-rate alerts, clear runbooks, and paging thresholds.
- Gate changes by budget status (freeze/relax rules) wired into Continuous Integration/Continuous Deployment (CI/CD).
- Maintain SLO/error budget (EB) dashboards (Azure Monitor, Grafana/Prometheus, App Insights). Run weekly SLO reviews with engineering/product.
- Drive roadmap tradeoffs when budgets are at risk; land reliability epics.
- Lead SEV1/SEV2 incidents without drama: Own comms, run blameless postmortems, and make corrective actions stick.
- Engineer reliability in: Multi-AZ/region patterns (active-active/DR), PDBs/Pod Topology Spread, Horizontal Pod Autoscaler/Virtual Machine Scale Sets/Cluster Autoscaler, resilient rollout/rollback.
- Harden Kubernetes clusters (network, identity, policy), optimize node/pod density, ingress (AGIC/Nginx); mesh optional.
- Ensure observability that works: Metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry. Alert on symptoms, not noise.
- Implement Infrastructure as Code (IaC) & automation: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/Open Policy Agent). No snowflakes.
- Guarantee CI/CD reliability: Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.
- Maximize capacity & performance: Load testing, right-sizing, autoscaling; partner with FinOps to reduce spend without hurting SLOs.
- Build disaster recovery you can trust: Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover.
- Ensure security by default: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.
- Reduce toil: Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams.
- Handle customer escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.
- Document architectures, runbooks, postmortems, and SLIs/SLOs, and keep them current and discoverable.
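
For context on the error budget items above (multi-window, multi-burn-rate alerts and gating changes by budget status), here is a minimal Python sketch of the arithmetic involved. It is an illustration only, not a description of our tooling: the window names, the 14.4 and 6.0 burn-rate thresholds (commonly quoted for a 30-day SLO window), the 25% "restricted" cutoff, and every function name are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    """One multi-window rule: fire only if BOTH windows exceed `threshold`."""
    long_window: str
    short_window: str
    threshold: float


# Illustrative thresholds for a 30-day SLO window (assumed, not from the posting).
RULES = (
    BurnRateRule("1h", "5m", 14.4),
    BurnRateRule("6h", "30m", 6.0),
)


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are being spent."""
    budget = 1.0 - slo_target
    return error_ratio / budget


def should_page(error_ratios: dict[str, float], slo_target: float) -> bool:
    """True if any rule has both its long and short window over the threshold."""
    for rule in RULES:
        long_burn = burn_rate(error_ratios[rule.long_window], slo_target)
        short_burn = burn_rate(error_ratios[rule.short_window], slo_target)
        if long_burn >= rule.threshold and short_burn >= rule.threshold:
            return True
    return False


def budget_remaining(good: int, total: int, slo_target: float) -> float:
    """Fraction of the error budget left for the period (negative if overspent)."""
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1.0 - slo_target) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


def deploy_gate(remaining: float) -> str:
    """Hypothetical freeze/relax policy that a CI/CD step could query."""
    if remaining <= 0.0:
        return "freeze"
    if remaining < 0.25:
        return "restricted"
    return "open"


if __name__ == "__main__":
    ratios = {"1h": 0.02, "5m": 0.02, "6h": 0.001, "30m": 0.001}
    print("page on-call:", should_page(ratios, slo_target=0.999))
    left = budget_remaining(good=999_200, total=1_000_000, slo_target=0.999)
    print("deploy gate:", deploy_gate(left))
```

Requiring the short window to agree with the long one is what keeps an alert from firing on an incident that has already recovered. The gate function shows one way a pipeline could turn budget status into a freeze or relax decision.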
-
Site Reliability Specialist
2 weeks ago
Vellore, Tamil Nadu, India beBeeReliability Full time ₹ 1,50,00,000 - ₹ 2,50,00,000
Job Summary: We are seeking a highly skilled Site Reliability Specialist to join our team in an exciting opportunity. The ideal candidate will have 7-12 years of experience in technical support or engineering, preferably in AI/ML/GenAI environments. Proven expertise in GenAI models (e.g., GPT, Claude, PaLM2, Llama2) and frameworks (e.g., RAG, Agents,...
-
High-Performing Site Reliability Engineer
1 week ago
Vellore, Tamil Nadu, India beBeeReliability Full time ₹ 2,00,00,000 - ₹ 2,50,00,000
Site Reliability Engineering Lead
Serving as a key technical authority, you will oversee the reliability, scalability, and performance of our critical systems. This role combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems.
About this Role
Reliability & Performance: Maintain high availability and...
-
Site Excellence Leader
2 weeks ago
Vellore, Tamil Nadu, India beBeeExcellence Full time ₹ 1,50,00,000 - ₹ 2,00,00,000
Job Opportunity: Site Excellence Leader
Elevate site performance and embed a culture of innovation as our ideal candidate leads transformation and drives improvement initiatives.
About the Role
This is a chance to shape how we work, think, and grow. As our Site Excellence Leader, you will be part of a team that values ingenuity, collaboration, and principled...
-
Vellore, Tamil Nadu, India beBeeReliability Full time ₹ 1,50,00,000 - ₹ 2,50,00,000
Site Reliability Engineering Role Overview
The primary objective of this role is to design and architect solutions that ensure the reliability, scalability, and stability of software platforms. To achieve this, you will collaborate with engineering teams throughout the development lifecycle, leveraging your expertise in site reliability engineering best...
-
Site Reliability Engineering Opportunity
3 days ago
Vellore, Tamil Nadu, India beBeeEngineering Full time ₹ 1,80,00,000 - ₹ 2,40,00,000
Job Description: As a site reliability engineer, you will play a crucial role in ensuring the digital backbone runs seamlessly for millions of customers.
Key Responsibilities:
Engineer Reliability: Identify potential system issues early and implement preventive measures to minimize downtime and maximize uptime.
Automate for Speed: Build tools, pipelines,...
-
Site Reliability Engineer
2 weeks ago
Vellore, Tamil Nadu, India Xebia Full time
We are looking for a highly skilled AWS Engineer with strong Python development and Chaos Engineering expertise to design, build, and validate resilient, scalable, and automated cloud-native environments. The ideal candidate will combine cloud engineering, DevOps, and chaos experimentation to improve reliability, fault tolerance, and operational efficiency...
-
Reliability Operations Specialist
3 days ago
Vellore, Tamil Nadu, India beBeeDevops Full time US$ 90,000 - US$ 125,000
Job Overview
We are seeking an experienced Reliability Operations Specialist to join our team. This role involves designing, implementing, and maintaining scalable monitoring, alerting, and logging solutions to ensure the availability and performance of backend services. In this position, you will work closely with development teams to design and support...
-
Reliable Systems Architect Position
2 days ago
Vellore, Tamil Nadu, India beBeeSRE Full time ₹ 1,50,00,000 - ₹ 2,50,00,000
Senior System Reliability Expert Opportunity
We are seeking an experienced Senior Site Reliability Engineer to ensure the reliability and performance of our systems.
-
Reliable Finance Systems Professional
2 days ago
Vellore, Tamil Nadu, India beBeeSite Full time ₹ 11,00,000 - ₹ 15,40,000
Job Title: Site Reliability Engineer
This role focuses on delivering highly reliable financial applications and data services that meet the demanding requirements of accuracy, compliance, and availability supporting business operations.
-
Highly Skilled System Reliability Specialist
2 days ago
Vellore, Tamil Nadu, India beBeeSiteReliabilityEngineer Full time ₹ 27,00,000 - ₹ 36,00,000
Job Summary
We are seeking a seasoned Site Reliability Engineer to join our team. The successful candidate will be responsible for designing, implementing, and maintaining the reliability and scalability of our systems.