Cloud Reliability Engineer

1 day ago


Lucknow, Uttar Pradesh, India beBeeReliability Full time ₹ 1,50,00,000 - ₹ 2,50,00,000
Job Overview

This role focuses on guaranteeing the reliability, performance, and efficiency of our cloud-based platform. The ideal candidate will be responsible for defining and enforcing reliability standards, leading high-impact projects, mentoring engineers, and eliminating operational toil at scale.

Key Responsibilities:
  • SRE Engineering Lead
  • Develop customer-centric Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical services.
  • Implement an error budget policy with multi-window alerts, clear runbooks, and paging thresholds.
  • Maintain SLO/EB dashboards using Azure Monitor, Grafana, and Prometheus.
  • Drive roadmap tradeoffs when budgets are at risk and land reliability epics.
  • Lead incidents without drama: own comms, run blameless postmortems, and make corrective actions stick.
  • Engineer reliability through multi-AZ/region patterns, PDBs/Pod Topology Spread, HPA/VPA/KEDA, and resilient rollout/rollback.
  • Optimize AKS clusters, network, identity, policy, node/pod density, ingress, and mesh configurations.
  • Ensure observability through metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, and OpenTelemetry.
  • IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper).
  • CI/CD reliability: Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, and Key Vault-backed secrets.
  • Capacity & performance: load testing, right-sizing, and autoscaling; partner with FinOps to reduce spend without hurting SLOs.
  • Define RTO/RPO, test backups/restore, run game days/chaos drills, and validate ASR and multi-region failover.
  • Secure by default: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, and shift-left checks in CI.
  • Reduce toil: Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams.
Requirements:
  • Candidate must possess deep expertise in PostgreSQL HA/DR, replication, performance tuning, autovacuum strategy, partitioning, backup/restore testing, and connection pooling.
  • Azure core skills include AKS, Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
  • Observability skills involve Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, and SLO design/error-budget operations.
  • Prior experience in incident leadership, blameless postmortems, and SLO/error-budget governance with change gating is required.
  • Candidate should have proven mentorship skills and crisp written/verbal communication abilities.
  • Bachelor's degree in Computer Science or related field.
  • Minimum 12 years of production ops/platform/SRE experience, including 5+ years on Azure.
  • Proven track record of delivering results under pressure and adapting to changing priorities.


  • Lucknow, Uttar Pradesh, India beBeeEngineer Full time ₹ 15,00,000 - ₹ 25,00,000

    We are seeking an exceptional Cloud Engineer to design, build, and validate scalable, resilient cloud-native environments. Key Responsibilities: Architect, implement, and manage secure AWS infrastructure (EC2, Lambda, EKS, S3, RDS, IAM, CloudFront). Automate infrastructure provisioning using Terraform / CloudFormation and AWS SDKs. Manage containerized workloads...


  • Lucknow, Uttar Pradesh, India beBeeAWS Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

    Cloud SRE Developer. We are seeking a skilled Cloud SRE Developer with expertise in AWS serverless to join our high-performing team. The ideal candidate will have strong knowledge of AWS architectures, automation, and reliability principles. The successful candidate will design and maintain resilient, fault-tolerant AWS architectures using automation and...


  • Lucknow, Uttar Pradesh, India beBeeReliability Full time ₹ 18,00,000 - ₹ 24,00,000

    Site Reliability Engineer. We're seeking a highly skilled Site Reliability Engineer to drive system reliability and performance. This role combines technical depth with a proactive mindset, focusing on automation, cloud infrastructure, and observability solutions. The ideal candidate will have experience in designing and implementing automated infrastructure...


  • Lucknow, Uttar Pradesh, India beBeeCloudReliability Full time ₹ 1,50,80,000 - ₹ 2,01,20,000

    Cloud Reliability Expert Wanted. We are looking for a highly skilled Cloud Reliability Engineer to design, build and validate scalable, automated cloud-native environments using AWS. The ideal candidate will combine cloud engineering, DevOps and chaos experimentation to improve reliability, fault tolerance and operational efficiency of critical...


  • Lucknow, Uttar Pradesh, India beBeeELK Full time ₹ 1,75,00,000 - ₹ 2,25,00,000

    Senior Site Reliability Engineer - ELK Expert. We are seeking a skilled Senior Site Reliability Engineer with expertise in the ELK stack to join our Platform Engineering Practice. This high-impact engineering opportunity focuses on enhancing performance, observability, and operational excellence at scale. You will design, manage, and scale large-scale...

  • Senior Cloud Engineer

    3 weeks ago


    Lucknow, Uttar Pradesh, India CareerXperts Consulting Full time

    We are looking for a Senior Cloud Engineer to architect, implement, and optimize cloud infrastructure that supports mission-critical applications at scale. You will work closely with engineering, DevOps, and security teams to ensure our systems are reliable, secure, and highly available. This role demands deep expertise in cloud platforms, automation, and...

  • Cloud Engineer

    3 days ago


    Lucknow, Uttar Pradesh, India beBeeDevops Full time ₹ 20,00,000 - ₹ 30,00,000

    Cloud Engineer Role. We are seeking a skilled Cloud Engineer to join our organization. As a key contributor, you will design and develop infrastructure interfaces for complex business applications. Your Key Responsibilities: Collaborate with clients to turn complex ideas into end-to-end solutions that transform their businesses. Lead the implementation of...


  • Lucknow, Uttar Pradesh, India beBeeCloud Full time ₹ 1,50,00,000 - ₹ 2,00,00,000

    Job Title: Scalable Cloud Solutions Engineer. This is a high-impact engineering opportunity focused on performance, observability, and operational excellence at scale. We're looking for an experienced Senior Site Reliability Engineer to join our Platform Engineering Practice. In this role, you'll design, manage, and scale large-scale cloud infrastructure...


  • Lucknow, Uttar Pradesh, India beBeeSpecialist Full time ₹ 20,00,000 - ₹ 30,00,000

    Cloud AI/ML Reliability Specialist. We are seeking a skilled Cloud AI/ML Reliability Specialist to ensure the reliability and scalability of our cloud-based services. Key Responsibilities: Design, develop, and maintain infrastructure using Terraform. Develop and implement monitoring and alerting systems using Azure Log Analytics. Collaborate with...


  • Lucknow, Uttar Pradesh, India beBeeReliability Full time ₹ 2,00,00,000 - ₹ 3,00,00,000

    Enhance Operational Excellence as a Site Reliability Engineer. We are seeking an exceptional professional to fill the role of Site Reliability Engineer and contribute to the success of our organization. The ideal candidate will be responsible for developing and implementing strategies that promote operational efficiency, service quality, and reliability. They...