
Reliable Software Engineer
2 days ago
We are seeking a highly skilled Senior Reliability Engineer to join our team. The ideal candidate will be responsible for ensuring the high availability, performance, and efficiency of our SaaS platform on Azure.
This role requires strong leadership skills, as you will define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale.
- Own SLIs/SLOs and their contracts, plus error budgeting (policy and tooling). Run the error-budget policy with multi-window, multi-burn-rate alerts, clear runbooks, and well-defined paging thresholds.
- Gate changes on error-budget status (freeze/relax rules) wired into CI/CD.
- Maintain SLO/error-budget dashboards (Azure Monitor, Grafana/Prometheus, App Insights), and run weekly SLO reviews with engineering and product.
- Drive roadmap tradeoffs when budgets are at risk, and land reliability epics.
- Incidents without drama: Lead SEV1/SEV2 incidents, own communications, run blameless postmortems, and make corrective actions stick.
- Engineer reliability in: Multi-AZ/multi-region patterns (active-active/DR), PodDisruptionBudgets (PDBs) and Pod Topology Spread constraints, HPA/VPA/KEDA autoscaling, and resilient rollout/rollback.
- AKS at scale: Harden clusters (network, identity, policy), optimize node/pod density, and manage ingress (AGIC/NGINX); a service mesh is optional.
- Observability that works: Metrics, traces, and logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, and OpenTelemetry. Alert on symptoms, not noise.
- IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), and policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
- CI/CD reliability: Azure DevOps/GitHub Actions pipelines with canary/blue-green deployments, progressive delivery, automatic rollback, and Key Vault-backed secrets.
- Capacity & performance: Load testing, right-sizing, and autoscaling; partner with FinOps to reduce spend without hurting SLOs.
- DR you can trust: Define RTO/RPO, test backup and restore, run game days and chaos drills, and validate Azure Site Recovery (ASR) and multi-region failover.
- Secure by default: Microsoft Entra ID (formerly Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, and shift-left security checks in CI.
- Reduce toil: Automate recurring ops, build self-service runbooks/ChatOps, and publish golden paths for product teams.
- Customer escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.
- Document to scale: Architectures, runbooks, postmortems, and SLIs/SLOs, kept current and discoverable.
- Bachelor's in CS/Engineering or equivalent experience.
- 12+ years in production operations, platform engineering, or SRE, including 5+ years on Azure.
- PostgreSQL (must-have): Deep operational expertise, including HA/DR, logical/physical replication, performance tuning (indexes, EXPLAIN/ANALYZE, pg_stat_statements), autovacuum strategy, partitioning, backup/restore testing, and connection pooling (PgBouncer).
- Azure core: AKS (must-have), Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, and Service Bus/Event Hubs.
- Observability: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, SLO design, and error-budget operations.
- IaC/automation: Terraform and/or Bicep, PowerShell, Python, GitOps (Flux/Argo), and pipelines in Azure DevOps or GitHub Actions.
- Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
- Mentorship and crisp written/verbal communication.
Requirements: Strong understanding of cloud-based platforms and technologies, including Azure. Hands-on experience with containers and orchestration (Docker, Kubernetes, and Helm) is essential.
Preferred qualifications: Apache NiFi, Apache Flink, Apache Kafka or Redpanda (schema management, exactly-once semantics, backpressure, dead-letter/replay patterns); Azure Solutions Architect Expert, CKA/CKAD; ITSM (ServiceNow) and on-call tooling (PagerDuty/Opsgenie); compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity; OpenTelemetry, eBPF tooling, service mesh; multi-tenant SaaS cost optimization at scale.
-
Software Reliability Engineer
7 days ago
Kannur, Kerala, India | beBeeAutomation | Full time | ₹30,00,000 - ₹45,00,000
Automated Quality Assurance Specialist
The role of an Automated Quality Assurance Specialist is pivotal in ensuring the reliability and quality of software applications. This position plays a crucial part in identifying and rectifying defects, thereby guaranteeing customer satisfaction.
Key Responsibilities: Design, develop, and maintain automated test scripts...
-
Site Reliability Engineer Leader
1 week ago
Kannur, Kerala, India | beBeeReliability | Full time | ₹90,00,000 - ₹1,20,00,000
Job Title: SRE Lead
We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. This position combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems.
Key Responsibilities: Lead efforts to...
-
Lead Site Reliability Engineer
1 week ago
Kannur, Kerala, India | Landmark Group | Full time
Job Title: SRE Lead (Engineering & Reliability)
Job Summary: We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving...
-
Reliability Engineer
1 week ago
Kannur, Kerala, India | beBeeReliability | Full time | ₹45,00,000 - ₹50,00,000
Job Overview: This role is ideal for a professional with expertise in reliability engineering and automation. As part of its digital transformation journey, the organization is investing heavily in infrastructure resilience and reliability to minimize production outages.
Key Responsibilities: Analyze and resolve high-impact production issues across...
-
Reliable Engineering Specialist
4 days ago
Kannur, Kerala, India | beBeeSreengineer | Full time | ₹1,50,00,000 - ₹2,00,00,000
Job Summary: We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. As a key member of our engineering organization, you will play a critical role in ensuring the reliability and scalability of our services.
In this role, you will design, build, and maintain high-performance, scalable, and reliable services that meet the needs of our customers. You...
-
Site Reliability Engineer
10 hours ago
Kannur, Kerala, India | beBeeSite | Full time | ₹30,00,000 - ₹35,00,000
Job Description: We are seeking a skilled professional to play a pivotal role in maintaining the stability and scalability of financial systems. The ideal candidate will design automation tools, implement monitoring systems, enhance incident response protocols, and champion DevOps practices to ensure Finance and Accounting systems operate with consistency and...
-
Reliable Systems Expert
2 days ago
Kannur, Kerala, India | beBeeEngineering | Full time | ₹1,00,00,000 - ₹1,50,00,000
Site Reliability Engineer
Talent500 offers exciting global job opportunities at Global Capability Centres (GCCs) in India. As a Site Reliability Engineer, you play a pivotal role in ensuring our digital backbone runs smoothly for millions of customers. Reliability engineering involves identifying potential system issues early, implementing preventive measures,...
-
Senior Director of Reliability Engineering
4 days ago
Kannur, Kerala, India | beBeeStrategic | Full time | ₹32,50,000 - ₹45,00,000
Job Title: Strategic Site Reliability Engineer Leader
We are seeking a seasoned executive to lead our Site Reliability Engineering (SRE) team, driving the implementation of an 'Automate-first' culture to reduce operational complexity and enhance service quality.
-
Site Reliability Expert
1 day ago
Kannur, Kerala, India | beBeeReliability | Full time | ₹14,34,000 - ₹24,40,000
Job Opportunity: We are looking for a Senior Software Engineer to join our Site Reliability team. As a key member of this team, you will be responsible for architecting, designing, developing, and supporting Internet-scale features and infrastructure.
-
Senior Site Reliability Engineer
2 days ago
Kannur, Kerala, India | beBeeReliability | Full time | ₹2,00,00,000 - ₹2,50,00,000
Job Title: Tech Lead - Reliability Engineer
The successful candidate will ensure the stability, scalability, and operational excellence of financial applications and data services. This role requires strong experience in Site Reliability Engineering, DevOps, or Production Engineering, ideally supporting financial or mission-critical applications.
Key...