
Remote Site Reliability Engineer (DevOps)
4 days ago
Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale.
Error budgeting (policy & tooling):
~ Run the error-budget policy with multi-window, multi-burn-rate alerts; run weekly SLO reviews with engineering/product.
~ Drive roadmap tradeoffs when budgets are at risk.
Engineer reliability in:
~ Harden clusters (network, identity, policy), optimize node/pod density, ingress (AGIC/Nginx).
~ IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
~ Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.
~ Capacity & performance: Load testing, right-sizing, autoscaling.
~ Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover.
~ Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.
~ Be the technical owner on calls.
~ (If applicable) Streaming/ETL reliability: Apply SRE practices (SLOs, backpressure, idempotency, replay) to NiFi/Flink/Kafka/Redpanda data flows.
Bachelor’s in CS/Engineering (or equivalent experience).
~ PostgreSQL (must-have): Deep operational expertise incl. HA/DR, logical/physical replication, performance tuning (indexes, EXPLAIN/ANALYZE, pg_stat_statements), autovacuum strategy, partitioning, backup/restore testing, and connection pooling (PgBouncer). Prefer experience with Azure Database for PostgreSQL – Flexible Server.
~ Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
~ SLO design and error-budget operations.
~ PowerShell and Python; Pipelines in Azure DevOps or GitHub Actions.
~ Azure Solutions Architect Expert, CKA/CKAD.
ITSM (ServiceNow), on-call tooling (PagerDuty/Opsgenie).
Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.
-
Site Reliability Engineer II
7 days ago
Trivandrum, India Zafin Full time
Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE. What you’ll do - SLIs/SLOs & contracts: Define customer-centric...
-
Site Reliability Engineer II
20 hours ago
Trivandrum, India Zafin Full time
Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE. What you’ll do - SLIs/SLOs & contracts: Define customer-centric SLIs/SLOs for...
-
Site Reliability Engineer II
1 week ago
Trivandrum, India Zafin Full time
Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE. What you’ll do - SLIs/SLOs & contracts: Define...
-
Site Reliability Engineer II
7 days ago
Trivandrum, India Zafin Full time
Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE. What you’ll do - SLIs/SLOs & contracts: Define customer-centric...
-
Site Reliability Engineer II
1 week ago
Trivandrum, India Zafin Full time
Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE. What you’ll do - SLIs/SLOs & contracts: Define customer-centric SLIs/SLOs for...
-
Site Reliability Engineer
22 hours ago
Thiruvananthapuram / Trivandrum, India Reflections Info Systems Full time
Job Description: As a Site Reliability Engineer (SRE), you will be responsible for improving the overall reliability of applications by ensuring their availability, performance, and scalability. You should be able to gather the technical requirements from the DevOps team and the operational requirements from the Application Support team. With the Site Reliability...
-
Site Reliability Engineer II
3 days ago
Trivandrum, India Zafin Full time
Senior Site Reliability Engineer (SRE II) Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE. What you’ll do - SLIs/SLOs & contracts: Define customer-centric...
-
SRE DevOps Engineer
7 days ago
Thiruvananthapuram / Trivandrum, Chennai, India Kaizen SRA Technologies Private Limited Full time
Job Description: We are seeking an experienced SRE DevOps Engineer to join our team in India. The ideal candidate will have a strong background in system reliability and automation, with a passion for improving system performance and ensuring high availability. Responsibilities - Design, implement, and maintain highly available systems and...
-
SRE DevOps Engineer
2 weeks ago
Thiruvananthapuram / Trivandrum, Chennai, India Kaizen SRA Technologies Private Limited Full time
Job Description: We are seeking an experienced SRE DevOps Engineer to join our team in India. The ideal candidate will have a strong background in system reliability and automation, with a passion for improving system performance and ensuring high availability. Responsibilities - Design, implement, and maintain highly available systems and...
-
Senior AI Engineer
1 week ago
Trivandrum, India Lexoga Full time
We’re hiring for a market-leading edge computing startup that’s building AI infrastructure for remote and low-connectivity environments. Their mission is to power real-time, on-premise intelligence across industries like mining, agriculture, energy, and defense. They’re looking for experienced AI engineers who enjoy tackling real-world ML challenges...