Site Reliability Engineer II

2 days ago


Bengaluru, Karnataka, India | Zafin | Full time

Senior Site Reliability Engineer (SRE II)

Own availability, latency, performance, and efficiency for Zafin's SaaS on Azure. You'll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE.

What you'll do

  • SLIs/SLOs & contracts: Define customer-centric SLIs/SLOs for Tier-0/Tier-1 services. Publish, review quarterly, and align teams to them.
  • Error budgeting (policy & tooling):
      • Run the error-budget policy with multi-window, multi-burn-rate alerts; clear runbooks and paging thresholds.
      • Gate changes by budget status (freeze/relax rules) wired into CI/CD.
      • Maintain SLO/EB dashboards (Azure Monitor, Grafana/Prometheus, App Insights). Run weekly SLO reviews with engineering/product.
      • Drive roadmap tradeoffs when budgets are at risk; land reliability epics.
  • Incidents without drama: Lead SEV1/SEV2, own comms, run blameless postmortems, and make corrective actions stick.
  • Engineer reliability in: Multi-AZ/region patterns (active-active/DR), PDBs/Pod Topology Spread, HPA/VPA/KEDA, resilient rollout/rollback.
  • AKS at scale: Harden clusters (network, identity, policy), optimize node/pod density, and run ingress (AGIC/NGINX); service mesh optional.
  • Observability that works: Metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry. Alert on symptoms, not noise.
  • IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
  • CI/CD reliability: Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.
  • Capacity & performance: Load testing, right-sizing, autoscaling; partner with FinOps to reduce spend without hurting SLOs.
  • DR you can trust: Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover.
  • Secure by default: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.
  • Reduce toil: Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams.
  • Customer escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.
  • Document to scale: Architectures, runbooks, postmortems, SLIs/SLOs—kept current and discoverable.
  • (If applicable) Streaming/ETL reliability: Apply SRE practices (SLOs, backpressure, idempotency, replay) to NiFi/Flink/Kafka/Redpanda data flows.
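
For candidates less familiar with the error-budgeting items above, here is a minimal sketch of what multi-window, multi-burn-rate alerting and budget-based change gating can look like. It is illustrative only and not Zafin's actual tooling: the window pairs, the thresholds (14.4, 6.0, 1.0), and the names BurnRateRule, evaluate, and deploys_allowed are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    long_window: str    # window that shows the burn is sustained
    short_window: str   # window that shows the burn is still happening
    threshold: float    # burn-rate multiple that triggers the rule
    action: str         # "page" or "ticket"


# Hypothetical rule set; real thresholds depend on the SLO period and policy.
RULES = [
    BurnRateRule("1h", "5m", 14.4, "page"),
    BurnRateRule("6h", "30m", 6.0, "page"),
    BurnRateRule("3d", "6h", 1.0, "ticket"),
]


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    budget = 1.0 - slo_target  # allowed error ratio, e.g. 0.001 for 99.9%
    return error_ratio / budget


def evaluate(error_ratios: dict[str, float], slo_target: float) -> list[str]:
    """Return the action of every rule whose long AND short windows both exceed it."""
    actions = []
    for rule in RULES:
        long_burn = burn_rate(error_ratios[rule.long_window], slo_target)
        short_burn = burn_rate(error_ratios[rule.short_window], slo_target)
        if long_burn >= rule.threshold and short_burn >= rule.threshold:
            actions.append(rule.action)
    return actions


def deploys_allowed(budget_consumed: float, freeze_at: float = 1.0) -> bool:
    """Change gate: freeze once the consumed fraction of the budget reaches freeze_at."""
    return budget_consumed < freeze_at
```

Requiring both windows to exceed the threshold keeps alerts from firing on short spikes and lets them clear quickly once the burn stops. A CI/CD pipeline could call a check like deploys_allowed before promoting a release.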

Minimum qualifications

  • Bachelor's in CS/Engineering (or equivalent experience).
  • 12+ years in production ops/platform/SRE, including 5+ years on Azure.
  • PostgreSQL (must-have): Deep operational expertise, including HA/DR, logical/physical replication, performance tuning (indexes, EXPLAIN/ANALYZE, pg_stat_statements), autovacuum strategy, partitioning, backup/restore testing, and connection pooling (PgBouncer). Experience with Azure Database for PostgreSQL – Flexible Server is preferred.
  • Azure core: AKS (must-have); Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
  • Observability: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations.
  • IaC/automation: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo). Pipelines in Azure DevOps or GitHub Actions.
  • Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
  • Mentorship and crisp written/verbal communication.

Preferred (nice to have)

  • Apache NiFi, Apache Flink, Apache Kafka or Redpanda (self-managed on AKS or managed equivalents); schema management, exactly-once semantics, backpressure, dead-letter/replay patterns.
  • Azure Solutions Architect Expert, CKA/CKAD.
  • ITSM (ServiceNow), on-call tooling (PagerDuty/Opsgenie).
  • Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.
  • OpenTelemetry, eBPF tooling, or service mesh.
  • Multi-tenant SaaS and cost optimization at scale.

