 
Site Reliability Engineer II
3 weeks ago
Senior Site Reliability Engineer (SRE II)
Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE.
What you’ll do
- SLIs/SLOs & contracts: Define customer-centric SLIs/SLOs for Tier-0/Tier-1 services. Publish, review quarterly, and align teams to them.
- Error budgeting (policy & tooling):
  - Run the error-budget policy with multi-window, multi-burn-rate alerts; clear runbooks and paging thresholds.
  - Gate changes by budget status (freeze/relax rules) wired into CI/CD.
  - Maintain SLO/EB dashboards (Azure Monitor, Grafana/Prometheus, App Insights). Run weekly SLO reviews with engineering/product.
  - Drive roadmap tradeoffs when budgets are at risk; land reliability epics.
- Incidents without drama: Lead SEV1/SEV2, own comms, run blameless postmortems, and make corrective actions stick.
- Engineer reliability in: Multi-AZ/region patterns (active-active/DR), PDBs/Pod Topology Spread, HPA/VPA/KEDA, resilient rollout/rollback.
- AKS at scale: Harden clusters (network, identity, policy), optimize node/pod density, ingress (AGIC/Nginx); mesh optional.
- Observability that works: Metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry. Alert on symptoms, not noise.
- IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
- CI/CD reliability: Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.
- Capacity & performance: Load testing, right-sizing, autoscaling; partner with FinOps to reduce spend without hurting SLOs.
- DR you can trust: Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover.
- Secure by default: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.
- Reduce toil: Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams.
- Customer escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.
- Document to scale: Architectures, runbooks, postmortems, SLIs/SLOs—kept current and discoverable.
- (If applicable) Streaming/ETL reliability: Apply SRE practices (SLOs, backpressure, idempotency, replay) to NiFi/Flink/Kafka/Redpanda data flows.
Minimum qualifications
- Bachelor’s in CS/Engineering (or equivalent experience).
- 12+ years in production ops/platform/SRE, including 5+ years on Azure.
- PostgreSQL (must-have): Deep operational expertise including HA/DR, logical/physical replication, performance tuning (indexes/EXPLAIN/ANALYZE, pg_stat_statements), autovacuum strategy, partitioning, backup/restore testing, and connection pooling (PgBouncer). Experience with Azure Database for PostgreSQL – Flexible Server is preferred.
- Azure core: AKS (must-have); Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
- Observability: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations.
- IaC/automation: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo). Pipelines in Azure DevOps or GitHub Actions.
- Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
- Mentorship and crisp written/verbal communication.
Preferred (nice to have)
- Apache NiFi, Apache Flink, Apache Kafka or Redpanda (self-managed on AKS or managed equivalents); schema management, exactly-once semantics, backpressure, dead-letter/replay patterns.
- Azure Solutions Architect Expert, CKA/CKAD.
- ITSM (ServiceNow), on-call tooling (PagerDuty/Opsgenie).
- Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.
- OpenTelemetry, eBPF tooling, or service mesh.
- Multi-tenant SaaS and cost optimization at scale.