Site Reliability Professional

1 week ago


Aurangabad, Maharashtra, India beBeeSiteReliability Full time ₹ 1,20,00,000 - ₹ 1,50,00,000

Job Description:

We are seeking a highly skilled Site Reliability Engineer II to join our team. The successful candidate will be responsible for ensuring the availability, latency, performance, and efficiency of our SaaS on Azure.

The ideal candidate will have a strong background in production ops, platform engineering, and SRE (12+ years overall), including at least 5 years on Azure. They will also have deep operational expertise in PostgreSQL, including HA/DR, logical/physical replication, performance tuning, autovacuum strategy, partitioning, backup/restore testing, and connection pooling.

The successful candidate will work closely with engineering and product teams to define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale.

Key Responsibilities:

  • Define customer-centric SLIs/SLOs for Tier-0/Tier-1 services.
  • Implement an error-budget policy with multi-window, multi-burn-rate alerts, clear runbooks, and paging thresholds.
  • Gate changes by error-budget status (freeze/relax rules) wired into CI/CD.
  • Maintain SLO and error-budget dashboards (Azure Monitor, Grafana/Prometheus, App Insights). Run weekly SLO reviews with engineering/product.
  • Drive roadmap tradeoffs when budgets are at risk; land reliability epics.
  • Lead SEV1/SEV2 incidents without drama: own comms, run blameless postmortems, and make corrective actions stick.
  • Engineer reliability in: Multi-AZ/region patterns (active-active/DR), PDBs/Pod Topology Spread, HPA/VPA/KEDA, resilient rollout/rollback.
  • AKS at scale: Harden clusters (network, identity, policy), optimize node/pod density, and tune ingress (AGIC/NGINX); service mesh optional.
  • Observability that works: Metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry. Alert on symptoms, not noise.
  • IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
  • CI/CD reliability: Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.
  • Capacity & performance: Load testing, right-sizing, autoscaling; partner with FinOps to reduce spend without hurting SLOs.
  • Disaster recovery you can trust: Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover.
  • Secure by default: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.
  • Reduce toil: Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams.
  • Customer escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.
  • Document to scale: Architectures, runbooks, postmortems, and SLIs/SLOs, all kept current and discoverable.
  • (If applicable) Streaming/ETL reliability: Apply SRE practices (SLOs, backpressure, idempotency, replay) to NiFi/Flink/Kafka/Redpanda data flows.

Requirements:

  • Bachelor's in CS/Engineering (or equivalent experience).
  • 12+ years in production ops/platform/SRE, including 5+ years on Azure.
  • Deep operational expertise in PostgreSQL.
  • Azure core knowledge: AKS, Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
  • Observability knowledge: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations.
  • IaC/automation knowledge: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo). Pipelines in Azure DevOps or GitHub Actions.
  • Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
  • Mentorship and crisp written/verbal communication.

Preferred Qualifications:

  • Apache NiFi, Apache Flink, Apache Kafka or Redpanda (self-managed on AKS or managed equivalents); schema management, exactly-once semantics, backpressure, dead-letter/replay patterns.
  • Azure Solutions Architect Expert, CKA/CKAD.
  • ITSM (ServiceNow), on-call tooling (PagerDuty/Opsgenie).
  • Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.
  • OpenTelemetry, eBPF tooling, or service mesh.
  • Multi-tenant SaaS and cost optimization at scale.

