Cloud Reliability Engineer

2 weeks ago


Rajkot, Gujarat, India beBeeSenior Full time ₹ 1,50,00,000 - ₹ 2,25,00,000

Job Summary

We are seeking a seasoned Senior Reliability Engineer to drive the availability, latency, and efficiency of our cloud-based infrastructure. As a key member of our team, you will be responsible for defining and enforcing reliability standards, leading high-impact projects, mentoring engineers, and eliminating repetitive tasks at scale.

Key Responsibilities

  • SLOs & SLIs: Develop customer-centric SLOs/SLIs for critical services and work with the owning teams to agree on them.
  • Error Budgeting: Implement an error-budget policy with multi-window, multi-burn-rate alerts; maintain clear runbooks and paging thresholds. Gate changes based on budget status (freeze/relax rules) integrated into CI/CD.
  • SLO/EB Dashboards: Maintain SLO and error-budget (EB) dashboards (Azure Monitor, Grafana/Prometheus, App Insights). Conduct weekly SLO reviews with engineering and product.
  • Incident Leadership: Lead critical incidents without drama. Own communications, conduct blameless postmortems, and ensure corrective actions are implemented.
  • Reliability Engineering: Build reliability into the design with multi-AZ/region patterns (active-active/DR), PDBs/Pod Topology Spread, HPA/VPA/KEDA, and resilient rollout/rollback.
  • AKS at Scale: Harden clusters (network, identity, policy), optimize node/pod density, and manage ingress (AGIC/NGINX); service mesh optional.
  • Observability: Metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry. Alert on symptoms, not noise.
  • IaC & Policy: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
  • CI/CD Reliability: Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.
  • Capacity & Performance: Load testing, right-sizing, autoscaling; partner with FinOps to reduce spend without compromising SLOs.
  • Disaster Recovery: Define RTO/RPO, test backups/restore, conduct game days/chaos drills, validate ASR and multi-region failover.
  • Security: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.
  • Toil Reduction: Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams.
  • Customer Escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.
  • Documentation: Architectures, runbooks, postmortems, and SLIs/SLOs, all kept current and discoverable.
  • Streaming/ETL Reliability: Apply SRE practices (SLOs, backpressure, idempotency, replay) to NiFi/Flink/Kafka/Redpanda data flows.

Requirements

  • Bachelor's in Computer Science or Engineering (or equivalent experience).
  • 12+ years in production ops/platform/SRE, including 5+ years on Azure.
  • PostgreSQL expertise: Deep understanding, including HA/DR, logical/physical replication, performance tuning (indexes, EXPLAIN/ANALYZE, pg_stat_statements), autovacuum strategy, partitioning, backup/restore testing, and connection pooling (PgBouncer). Experience with Azure Database for PostgreSQL – Flexible Server is preferred.
  • Azure core: AKS (must-have); Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
  • Observability: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations.
  • IaC/automation: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo). Pipelines in Azure DevOps or GitHub Actions.
  • Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
  • Mentorship and crisp written/verbal communication.

Preferred Qualifications

  • Apache NiFi, Apache Flink, Apache Kafka or Redpanda (self-managed on AKS or managed equivalents); schema management, exactly-once semantics, backpressure, dead-letter/replay patterns.
  • Azure Solutions Architect Expert, CKA/CKAD.
  • ITSM (ServiceNow), on-call tooling (PagerDuty/Opsgenie).
  • Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.
  • OpenTelemetry, eBPF tooling, or service mesh.
  • Multi-tenant SaaS and cost optimization at scale.

About This Role

This is an excellent opportunity to take your skills to the next level and join our dynamic team of experts. If you're passionate about driving reliability and efficiency in cloud-based systems, we encourage you to apply.


