
Visionary System Reliability Engineer
5 days ago
Site Reliability Engineer
As a seasoned system reliability engineer, you will be responsible for the availability, latency, and overall performance of our software-as-a-service (SaaS) platform on Azure. Your primary goals will be to define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale.
Key Responsibilities:
- Service Level Indicators (SLIs)/Service Level Objectives (SLOs): Define customer-centric SLIs/SLOs for Tier-0/Tier-1 services. Publish, review quarterly, and align teams with them.
- Error Budgeting:
  - Run the error-budget policy with multi-window, multi-burn-rate alerts; clear runbooks and paging thresholds.
  - Gate changes by budget status (freeze/relax rules) wired into CI/CD.
  - Manage SLO/EB dashboards (Azure Monitor, Grafana/Prometheus, App Insights). Run weekly SLO reviews with engineering/product.
  - Drive roadmap tradeoffs when budgets are at risk; land reliability epics.
- Lead high-impact incidents without drama: Own comms, run blameless postmortems, and make corrective actions stick.
- Engineer reliability in: Multi-AZ/region patterns (active-active/DR), PDBs/Pod Topology Spread, HPA/VPA/KEDA, resilient rollout/rollback.
- Optimize AKS at scale: Harden clusters (network, identity, policy), optimize node/pod density, ingress (AGIC/Nginx); mesh optional.
- Ensure observability that works: Metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry. Alert on symptoms, not noise.
- Maintain infrastructure as code (IaC) & automation: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
- Guarantee CI/CD reliability: Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.
- Partner with FinOps to reduce spend without hurting SLOs: Load testing, right-sizing, autoscaling.
- Define disaster recovery you can trust: Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover.
- Secure by default: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.
- Reduce toil: Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams.
- Own customer escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.
- Document to scale: Architectures, runbooks, postmortems, SLIs/SLOs—kept current and discoverable.
Minimum Qualifications:
- Bachelor's degree in Computer Science or Engineering (or equivalent experience).
- 12+ years in production ops/platform/SRE, including 5+ years on Azure.
- PostgreSQL expertise: Deep operational knowledge incl. HA/DR, logical/physical replication, performance tuning (indexes/EXPLAIN/ANALYZE, pg_stat_statements), autovacuum strategy, partitioning, backup/restore testing, and connection pooling (pgBouncer). Prefer experience with Azure Database for PostgreSQL – Flexible Server.
- Azure core skills: AKS (must-have); Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
- Observability expertise: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations.
- IaC/automation expertise: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo). Pipelines in Azure DevOps or GitHub Actions.
- Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
- Mentorship and crisp written/verbal communication.
Preferred (nice to have):
- Apache NiFi, Apache Flink, Apache Kafka or Redpanda (self-managed on AKS or managed equivalents); schema management, exactly-once semantics, backpressure, dead-letter/replay patterns.
- Azure Solutions Architect Expert, CKA/CKAD.
- ITSM (ServiceNow), on-call tooling (PagerDuty/Opsgenie).
- Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.
- OpenTelemetry, eBPF tooling, or service mesh.
- Multi-tenant SaaS and cost optimization at scale.
-
Reliable Systems Engineer
4 days ago
Varanasi, Uttar Pradesh, India beBeeSiteReliabilityEngineer Full time US$ 80,000 - US$ 1,40,000
Job Overview
We are seeking a highly skilled Site Reliability Engineer to join our team. In this role, you will be responsible for ensuring the reliability and efficiency of our systems.
-
Senior Systems Reliability Engineer
2 weeks ago
Varanasi, Uttar Pradesh, India beBeeReliability Full time US$ 2,00,000 - US$ 2,50,000
Job Title: System Reliability Leader
We are seeking a seasoned reliability expert to oversee the system's dependability, scalability, and performance. This position combines software engineering and systems expertise to develop and maintain high-performing systems.
Key Responsibilities:
Reliability & Performance: Lead efforts to ensure high availability and...
-
Reliable Systems Leader
2 weeks ago
Varanasi, Uttar Pradesh, India beBeeEngineering Full time ₹ 23,00,000 - ₹ 25,50,000
Senior Site Reliability Engineer Role Overview
We are seeking a skilled Senior Site Reliability Engineer to join our team. This is an exciting opportunity for you to leverage your technical expertise and passion for reliability engineering to drive innovation and growth within our organization. As a Senior SRE, you will be responsible for ensuring the...
-
Reliability Engineer
4 days ago
Varanasi, Uttar Pradesh, India beBeeSystem Full time ₹ 30,00,000 - ₹ 50,00,000
We're seeking a seasoned reliability engineer to spearhead system optimization and ensure seamless performance.
About the Role
As a critical member of our infrastructure team, you will be responsible for designing and implementing scalable solutions that meet business demands.
Key Responsibilities:
Develop cloud-based infrastructure using Python and Terraform to...
-
Reliable Systems Engineer and Leader
1 week ago
Varanasi, Uttar Pradesh, India beBeeLeadership Full time ₹ 2,00,00,000 - ₹ 2,50,00,000
Site Reliability Engineering Leader
We are seeking an experienced leader to oversee our Site Reliability Engineering team. As a key member of our leadership team, you will be responsible for defining and driving the SRE strategy, promoting a culture of automation, and developing methodologies for elimination of toil, delay, and redundancy in processes. You will...
-
Reliable Financial System Specialist
6 days ago
Varanasi, Uttar Pradesh, India beBeeSystemResilience Full time ₹ 1,80,00,000 - ₹ 2,16,00,000
Job Summary:
Accounting and Finance teams seek a skilled Engineer to ensure stability, scalability, and operational excellence of financial platforms. This role focuses on delivering highly reliable financial applications and data services that meet demanding requirements for accuracy, compliance, and availability.
Key Responsibilities:
Ensure financial...
-
Reliability Engineering Expert
4 days ago
Varanasi, Uttar Pradesh, India beBeereliability Full time ₹ 10,00,000 - ₹ 20,00,000
Job Overview
We are seeking a highly skilled Reliability Engineering Expert to join our team. With 2-4 years of relevant experience, you will be responsible for ensuring the reliability and performance of our systems. You will have extensive knowledge of Linux and Windows systems, as well as proficiency in setting up alerts, dashboards, and analyzing...
-
Reliability Driven Systems Specialist
1 week ago
Varanasi, Uttar Pradesh, India beBeeArtificial Full time US$ 1,50,000 - US$ 2,50,000
Incident Resilience Engineer
Organizations rely on seamless operations to drive revenue and trust. Conversely, even brief outages can have devastating effects. BugRaid's innovative solution is a pioneering incident copilot – an intelligent system capable of detecting, diagnosing, and resolving complex production incidents. Achieving autonomous reliability...
-
Site Reliability Engineer
4 days ago
Varanasi, Uttar Pradesh, India beBeeSiteReliabilityEngineer Full time ₹ 1,50,00,000 - ₹ 2,00,00,000
Job Opportunity
We are seeking a skilled professional to fill a critical role in our organization. This position will play a vital part in ensuring the stability, scalability, and operational excellence of accounting and finance platforms. The ideal candidate will have a strong background in site reliability engineering and be able to lead and participate in...
-
Site Reliability Engineering Lead
2 weeks ago
Varanasi, Uttar Pradesh, India beBeeSiteReliabilityEngineering Full time ₹ 18,00,000 - ₹ 21,00,000
A highly influential role is available for a seasoned Site Reliability Engineering Leader to oversee the reliability, scalability and performance of critical systems. The successful candidate will have exceptional leadership skills with a strong ability to inspire and motivate teams.