Site Reliability Engineer II

4 days ago


Erode, Tamil Nadu, India Zafin Full time

Senior Site Reliability Engineer (SRE II)

Own availability, latency, performance, and efficiency for Zafin's SaaS on Azure. You'll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE.

What you'll do

  • SLIs/SLOs & contracts: Define customer-centric SLIs/SLOs for Tier-0/Tier-1 services. Publish, review quarterly, and align teams to them.
  • Error budgeting (policy & tooling):
      ◦ Run the error-budget policy with multi-window, multi-burn-rate alerts, backed by clear runbooks and paging thresholds.
      ◦ Gate changes on budget status (freeze/relax rules) wired into CI/CD.
      ◦ Maintain SLO/error-budget dashboards (Azure Monitor, Grafana/Prometheus, App Insights). Run weekly SLO reviews with engineering/product.
      ◦ Drive roadmap tradeoffs when budgets are at risk, and land reliability epics.
  • Incidents without drama: Lead SEV1/SEV2, own comms, run blameless postmortems, and make corrective actions stick.
  • Engineer reliability in: Multi-AZ/region patterns (active-active/DR), PDBs/Pod Topology Spread, HPA/VPA/KEDA, resilient rollout/rollback.
  • AKS at scale: Harden clusters (network, identity, policy), optimize node/pod density, and run ingress (AGIC/NGINX); a service mesh is optional.
  • Observability that works: Metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry. Alert on symptoms, not noise.
  • IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
  • CI/CD reliability: Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.
  • Capacity & performance: Load testing, right-sizing, autoscaling; partner with FinOps to reduce spend without hurting SLOs.
  • DR you can trust: Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover.
  • Secure by default: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.
  • Reduce toil: Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams.
  • Customer escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.
  • Document to scale: Architectures, runbooks, postmortems, SLIs/SLOs—kept current and discoverable.
  • (If applicable) Streaming/ETL reliability: Apply SRE practices (SLOs, backpressure, idempotency, replay) to NiFi/Flink/Kafka/Redpanda data flows.
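The multi-window, multi-burn-rate alerting and budget-based change gating named under "Error budgeting" can be illustrated with a short sketch. It assumes a 99.9% SLO measured over 30 days and uses the window pairs and burn-rate thresholds (14.4, 6, 1) that the Google SRE Workbook suggests as starting points. The names `BurnRule`, `evaluate`, and `deploys_allowed`, and the freeze rule itself, are hypothetical illustrations, not Zafin's actual policy or tooling. In practice the per-window error ratios would come from Prometheus or Azure Monitor queries rather than a dict.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRule:
    long_window: str   # window that proves the burn is significant
    short_window: str  # window that lets the alert clear quickly after recovery
    threshold: float   # burn-rate multiple of the "exactly on budget" rate
    action: str        # "page" or "ticket"


# Starting-point thresholds from the Google SRE Workbook for a 30-day SLO window.
RULES = [
    BurnRule("1h", "5m", 14.4, "page"),
    BurnRule("6h", "30m", 6.0, "page"),
    BurnRule("3d", "6h", 1.0, "ticket"),
]


def burn_rate(error_ratio: float, slo: float) -> float:
    """Budget consumption speed: 1.0 means the budget runs out exactly at the window's end."""
    budget = 1.0 - slo
    return error_ratio / budget


def evaluate(error_ratios: dict[str, float], slo: float) -> list[str]:
    """Return actions for rules whose long AND short windows both exceed the threshold."""
    actions = []
    for rule in RULES:
        long_br = burn_rate(error_ratios[rule.long_window], slo)
        short_br = burn_rate(error_ratios[rule.short_window], slo)
        if long_br > rule.threshold and short_br > rule.threshold:
            actions.append(rule.action)
    return actions


def deploys_allowed(budget_remaining: float, freeze_below: float = 0.0) -> bool:
    """Hypothetical CI/CD gate: freeze non-emergency changes once the
    remaining budget fraction drops to freeze_below or lower."""
    return budget_remaining > freeze_below


if __name__ == "__main__":
    # Hypothetical error ratios (failed / total requests) per window.
    ratios = {"5m": 0.02, "1h": 0.016, "30m": 0.01, "6h": 0.004, "3d": 0.0005}
    print(evaluate(ratios, slo=0.999))
    print(deploys_allowed(budget_remaining=0.25))
```

Requiring both windows to exceed the threshold is what keeps the page precise: the long window shows the burn is significant, while the short window lets the alert clear within minutes once the incident is mitigated.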

Minimum qualifications

  • Bachelor's in CS/Engineering (or equivalent experience).
  • 12+ years in production ops/platform/SRE, including 5+ years on Azure.
  • PostgreSQL (must-have): Deep operational expertise, including HA/DR, logical/physical replication, performance tuning (indexes, EXPLAIN/ANALYZE, pg_stat_statements), autovacuum strategy, partitioning, backup/restore testing, and connection pooling (PgBouncer). Experience with Azure Database for PostgreSQL – Flexible Server is preferred.
  • Azure core: AKS (must-have); Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
  • Observability: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations.
  • IaC/automation: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo). Pipelines in Azure DevOps or GitHub Actions.
  • Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
  • Mentorship and crisp written/verbal communication.

Preferred (nice to have)

  • Apache NiFi, Apache Flink, Apache Kafka or Redpanda (self-managed on AKS or managed equivalents); schema management, exactly-once semantics, backpressure, dead-letter/replay patterns.
  • Azure Solutions Architect Expert, CKA/CKAD.
  • ITSM (ServiceNow), on-call tooling (PagerDuty/Opsgenie).
  • Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.
  • OpenTelemetry, eBPF tooling, or service mesh.
  • Multi-tenant SaaS and cost optimization at scale.


  • Erode, Tamil Nadu, India beBeeLeadership Full time ₹ 10,00,000 - ₹ 12,00,000

    Job Opportunity: We are seeking a highly skilled Technical Manager to lead our Site Reliability Engineering (SRE) team, ensuring operational excellence and fostering a collaborative work environment. This leadership role will oversee the day-to-day operations of the SRE team, providing technical guidance, mentorship, and strategic alignment with business...


  • Erode, Tamil Nadu, India beBeeReliability Full time ₹ 32,00,000 - ₹ 40,00,000

    Reliability Engineer: We are seeking a seasoned Reliability Engineer to join our team and play a critical role in ensuring the stability, scalability, and operational excellence of Accounting and Finance platforms. This is a unique opportunity to work on highly reliable financial applications and data services that meet demanding requirements of accuracy,...


  • Erode, Tamil Nadu, India Cimpress Full time

    Senior Site Reliability Engineer. Who We Are: Cimpress Technology develops cutting-edge, best-in-world software that our mass customization businesses use to create personalized products for over 17 million global customers. Our Mass Customization Platform consists of modular, multi-tenant services. Our businesses can choose the solutions that work for...


  • Erode, Tamil Nadu, India Birlasoft Full time

    SRE Administrator. Experience: 7 to 10 years. Responsibilities: Be primarily responsible for providing production, operations support and application administration to business and web applications, 3rd party applications and related ecosystems. The application environment though mixed, is primarily based on Microsoft technologies. Among the environments which...


  • Erode, Tamil Nadu, India beBeeSRE Full time ₹ 19,80,000 - ₹ 26,64,000

    Job Summary: We are seeking a skilled Site Reliability Engineer to join our team. As an SRE, you will be responsible for monitoring and interpreting Grafana dashboards to identify potential failures and manage incident communication. This role involves proactively detecting and resolving issues before they impact our users. You will work closely with...


  • Erode, Tamil Nadu, India beBeeEngineering Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

    Job Title: System Reliability Engineering Lead. We are seeking an experienced and dynamic professional to oversee the reliability, scalability, and performance of our critical systems. This role combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems. Key Responsibilities: Reliability &...


  • Erode, Tamil Nadu, India Whitefield Careers Full time

    Experience: 2-4 years of relevant experience. Key Skills Required: ● Extensive knowledge of Linux and Windows systems. ● Proficient in setting up alerts, dashboards, and analyzing metrics/logs for system performance and reliability. ● Familiarity with system architecture and configuration management tools. ● Understanding of TCP/IP, HTTP, DNS, and Load...


  • Erode, Tamil Nadu, India beBeeSenior Full time ₹ 35,00,000 - ₹ 45,00,000

    Job Title: A Senior Site Reliability Engineer (SRE II) is sought to lead the availability, latency, performance, and efficiency of our SaaS on Azure. The ideal candidate will define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. They will report directly to the Director of SRE. Key...

  • Reliability Expert

    2 weeks ago


    Erode, Tamil Nadu, India beBeeTechnicalLeader Full time ₹ 9,75,000 - ₹ 10,25,000

    As a senior site reliability engineer, you will be responsible for ensuring the uptime and performance of our systems. We are looking for an individual with 10+ years of experience in SRE or DevOps roles, with deep expertise in Kubernetes, networking, and relational databases. Strong scripting skills, such as Python and Bash, are required for tooling and...

  • Site Engineer

    7 days ago


    Erode, Tamil Nadu, India ERO CONSTRUCTIONS Full time ₹ 4,20,000 per year

    Job Title: Site Engineer – Civil. Location: Erode & Ariyalur, Tamil Nadu. Experience Required: 4 – 6 Years. About Us: Erocrete, a division of ERO Construction, is committed to delivering innovative and high-quality construction solutions. We are seeking dedicated Site Engineers – Civil to oversee our projects in Erode and Ariyalur locations. Key...