Remote Site Reliability Engineer (DevOps)

4 days ago


Trivandrum, India Zafin Full time
Senior Site Reliability Engineer (SRE II)
Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale.
Error budgeting (policy & tooling):
~ Run the error-budget policy with multi-window, multi-burn-rate alerts.
~ Run weekly SLO reviews with engineering/product.
~ Drive roadmap tradeoffs when budgets are at risk.
~ Engineer reliability in: harden clusters (network, identity, policy), optimize node/pod density, ingress (AGIC/Nginx).
~ IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
~ Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.
~ Capacity & performance: load testing, right-sizing, autoscaling.
~ Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover.
~ Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.
~ Be the technical owner on calls.
~ (If applicable) Streaming/ETL reliability: apply SRE practices (SLOs, backpressure, idempotency, replay) to NiFi/Flink/Kafka/Redpanda data flows.

Bachelor’s in CS/Engineering (or equivalent experience).
~ PostgreSQL (must-have): Deep operational expertise including HA/DR, logical/physical replication, performance tuning (indexes, EXPLAIN/ANALYZE, pg_stat_statements), autovacuum strategy, partitioning, backup/restore testing, and connection pooling (pgBouncer). Experience with Azure Database for PostgreSQL – Flexible Server is preferred.
~ Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
~ SLO design and error-budget operations.
~ PowerShell and Python; Pipelines in Azure DevOps or GitHub Actions.
~ Azure Solutions Architect Expert, CKA/CKAD.
ITSM (ServiceNow), on-call tooling (PagerDuty/Opsgenie).
Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.

  • Trivandrum, India Zafin Full time

    Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE. What you’ll do - SLIs/SLOs & contracts: Define customer-centric...


  • Thiruvananthapuram / Trivandrum, India Reflections Info Systems Full time

    Job Description As a Site Reliability Engineer (SRE) you will be responsible for improving the overall reliability of applications by ensuring their availability, performance, and scalability. You should be able to gather the technical requirements from the DevOps team and the operational requirements from the Application Support team. With the Site Reliability...


  • SRE Devops Engineer

    7 days ago


    Thiruvananthapuram / Trivandrum, Chennai, India Kaizen SRA Technologies Private Limited Full time

    Job Description Description We are seeking an experienced SRE DevOps Engineer to join our team in India. The ideal candidate will have a strong background in system reliability and automation, with a passion for improving system performance and ensuring high availability. Responsibilities - Design, implement, and maintain highly available systems and...

  • SRE Devops Engineer

    2 weeks ago


    Thiruvananthapuram / Trivandrum, Chennai, India Kaizen SRA Technologies Private Limited Full time

    Job Description Description We are seeking an experienced SRE DevOps Engineer to join our team in India. The ideal candidate will have a strong background in system reliability and automation, with a passion for improving system performance and ensuring high availability. Responsibilities - Design, implement, and maintain highly available systems and...

  • Senior AI Engineer

    1 week ago


    Trivandrum, India Lexoga Full time

    We’re hiring for a market-leading edge computing startup that’s building AI infrastructure for remote and low-connectivity environments. Their mission is to power real-time, on-premise intelligence across industries like mining, agriculture, energy, and defense. They’re looking for experienced AI engineers who enjoy tackling real-world ML challenges...