Site Reliability Champion

2 weeks ago


Tirupati, Andhra Pradesh, India beBeeReliability Full time ₹ 1,40,80,000 - ₹ 2,21,20,000
Job Description

Senior Site Reliability Engineer II

Job Summary

We are seeking a highly skilled Senior Site Reliability Engineer II to join our team. As a key member of our SRE team, you will be responsible for ensuring the reliability and performance of our SaaS platform on Azure.

Key Responsibilities
  • Define customer-centric SLIs/SLOs for Tier-0/Tier-1 services. Publish them, review them quarterly, and align teams to them
  • Run the error-budget policy with multi-window, multi-burn-rate alerts, clear runbooks, and paging thresholds
  • Gate changes by budget status (freeze/relax rules) wired into CI/CD
  • Maintain SLO/EB dashboards (Azure Monitor, Grafana/Prometheus, App Insights). Run weekly SLO reviews with engineering/product.
  • Drive roadmap tradeoffs when budgets are at risk; land reliability epics
  • Incidents without drama: Lead SEV1/SEV2, own comms, run blameless postmortems, and make corrective actions stick
  • Engineer reliability in: Multi-AZ/region patterns (active-active/DR), PDBs/Pod Topology Spread, HPA/VPA/KEDA, resilient rollout/rollback
  • AKS at scale: Harden clusters (network, identity, policy), optimize node/pod density, manage ingress (AGIC/NGINX); mesh optional
  • Observability that works: Metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry. Alert on symptoms, not noise
  • IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes
  • CI/CD reliability: Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets
  • Capacity & performance: Load testing, right-sizing, autoscaling; partner with FinOps to reduce spend without hurting SLOs
  • DR you can trust: Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover
  • Secure by default: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI
  • Reduce toil: Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams
  • Customer escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority
  • Document to scale: Architectures, runbooks, postmortems, SLIs/SLOs—kept current and discoverable
  • (If applicable) Streaming/ETL reliability: Apply SRE practices (SLOs, backpressure, idempotency, replay) to NiFi/Flink/Kafka/Redpanda data flows
Required Skills and Qualifications
  • Bachelor's in CS/Engineering (or equivalent experience)
  • 12+ years in production ops/platform/SRE, including 5+ years on Azure
  • PostgreSQL (must-have): Deep operational expertise, including HA/DR, logical/physical replication, performance tuning (indexes, EXPLAIN/ANALYZE, pg_stat_statements), autovacuum strategy, partitioning, backup/restore testing, and connection pooling (PgBouncer)
  • Azure core: AKS (must-have); Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs
  • Observability: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations
  • IaC/automation: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo). Pipelines in Azure DevOps or GitHub Actions
  • Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating
  • Mentorship and crisp written/verbal communication
Nice to Haves
  • Apache NiFi, Apache Flink, Apache Kafka or Redpanda (self-managed on AKS or managed equivalents); schema management, exactly-once semantics, backpressure, dead-letter/replay patterns
  • Azure Solutions Architect Expert, CKA/CKAD
  • ITSM (ServiceNow), on-call tooling (PagerDuty/Opsgenie)
  • Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity
  • OpenTelemetry, eBPF tooling, or service mesh
  • Multi-tenant SaaS and cost optimization at scale

