
Systems Engineer of Unwavering Reliability
21 hours ago
The Site Reliability Engineer II is a critical role responsible for the availability, latency, performance, and efficiency of our SaaS platform on Azure.
The successful candidate will define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale.
- SLIs/SLOs & contracts: Define customer-centric SLIs/SLOs for Tier-0/Tier-1 services. Publish, review quarterly, and align teams to them.
- Error budgeting (policy & tooling):
  - Run the error-budget policy with multi-window, multi-burn-rate alerts; clear runbooks and paging thresholds.
  - Gate changes by budget status (freeze/relax rules) wired into CI/CD.
  - Maintain SLO/EB dashboards (Azure Monitor, Grafana/Prometheus, App Insights). Run weekly SLO reviews with engineering/product.
  - Drive roadmap tradeoffs when budgets are at risk; land reliability epics.
- Incidents without drama: Lead SEV1/SEV2, own comms, run blameless postmortems, and make corrective actions stick.
- Engineer reliability in: Multi-AZ/region patterns (active-active/DR), PDBs/Pod Topology Spread, HPA/VPA/KEDA, resilient rollout/rollback.
- AKS at scale: Harden clusters (network, identity, policy); optimize node/pod density and ingress (AGIC/NGINX); service mesh optional.
- Observability that works: Metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry. Alert on symptoms, not noise.
- IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
- CI/CD reliability: Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.
- Capacity & performance: Load testing, right-sizing, autoscaling; partner with FinOps to reduce spend without hurting SLOs.
- DR you can trust: Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover.
- Secure by default: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.
- Reduce toil: Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams.
- Customer escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.
- Document to scale: Architectures, runbooks, postmortems, and SLIs/SLOs, all kept current and discoverable.
- (If applicable) Streaming/ETL reliability: Apply SRE practices (SLOs, backpressure, idempotency, replay) to NiFi/Flink/Kafka/Redpanda data flows.
Requirements
- 12+ years in production ops/platform/SRE, including 5+ years on Azure.
- PostgreSQL (must-have): Deep operational expertise, including HA/DR, logical/physical replication, performance tuning (indexes, EXPLAIN/ANALYZE, pg_stat_statements), autovacuum strategy, partitioning, backup/restore testing, and connection pooling (PgBouncer).
- Azure core: AKS (must-have); Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
- Observability: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations.
- IaC/automation: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo). Pipelines in Azure DevOps or GitHub Actions.
- Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
- Mentorship and crisp written/verbal communication.
Preferred Qualifications
- Apache NiFi, Apache Flink, Apache Kafka or Redpanda (self-managed on AKS or managed equivalents); schema management, exactly-once semantics, backpressure, dead-letter/replay patterns.
- Azure Solutions Architect Expert, CKA/CKAD.
- ITSM (ServiceNow), on-call tooling (PagerDuty/Opsgenie).
- Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.
- OpenTelemetry, eBPF tooling, or service mesh.
- Multi-tenant SaaS and cost optimization at scale.
-
Reliability Engineering Manager
1 week ago
Allahabad, Uttar Pradesh, India beBeeReliability Full time ₹ 15,00,000 - ₹ 25,00,000
Job Title: Reliability Engineering Manager
Job Summary: We are seeking an experienced and dynamic Reliability Engineering Manager to oversee the reliability, scalability, and performance of our critical systems. As a Reliability Engineering Manager, you will play a pivotal role in establishing and implementing reliability practices, leading a team of...
-
System Reliability Specialist
14 hours ago
Allahabad, Uttar Pradesh, India beBeeOperations Full time ₹ 1,80,00,000 - ₹ 2,40,00,000
Senior L2 Operations Engineer Position
The role of a Senior L2 Operations Engineer plays a critical part in maintaining system stability, performance, and reliability through robust observability practices, incident response readiness, and operational excellence. This senior position requires hands-on experience with payment solutions, focusing on platforms...
-
Reliability Solutions Engineer
3 days ago
Allahabad, Uttar Pradesh, India beBeeReliability Full time US$ 1,45,000 - US$ 1,82,500
About Our Technology Hub: We strive to deliver innovative solutions that drive excellence in safety, innovation, reliability, and customer experience. Our team contributes directly to these objectives, delivering high-value solutions that work seamlessly with a global team to create memorable experiences for customers.
Key Responsibilities: Execute incident...
-
Finance System Reliability Specialist
1 day ago
Allahabad, Uttar Pradesh, India beBeeReliability Full time ₹ 1,50,00,000 - ₹ 2,50,00,000
The role of a Site Reliability Engineer in Finance is pivotal in ensuring the stability, scalability and operational excellence of financial systems. The primary objective of this position is to deliver highly reliable financial applications and data services that meet stringent requirements for accuracy, compliance and availability. As a Site Reliability...
-
Allahabad, Uttar Pradesh, India beBeeReliability Full time ₹ 30,00,000 - ₹ 50,00,000
Job Description
We are seeking a highly skilled System Reliability Engineer to join our team. As a key member of our infrastructure group, you will be responsible for ensuring the reliability and scalability of our production environment. You will work closely with our development teams to design, build, and operate production environments for our SaaS...
-
Reliability Engineer Team Lead
1 day ago
Allahabad, Uttar Pradesh, India beBeeManager Full time US$ 72,000 - US$ 1,44,000
Job Description
The position of Technical Manager for Site Reliability Engineering will lead a remote team, ensuring operational excellence and fostering a high-performing team culture. This role reports to the Director of Systems and Security and is responsible for overseeing day-to-day operations, technical mentorship, and strategic alignment with...
-
Reliability Engineering Lead
4 days ago
Allahabad, Uttar Pradesh, India beBeeReliability Full time ₹ 1,20,00,000 - ₹ 2,60,00,000
Job Summary: The Site Reliability Engineering Manager role involves leading the reliability engineering function to ensure infrastructure resiliency and optimal operational performance.
Key Responsibilities: Establish and lead organizational reliability strategies, aligning SLAs, SLOs, and Error Budgets with business goals and customer expectations. Develop...
-
System Reliability Specialist
7 days ago
Allahabad, Uttar Pradesh, India beBeeProductionSupport Full time ₹ 18,00,000 - ₹ 20,00,000
About this role: This Production Support Coordinator position oversees the daily operations and support of production systems, ensuring system reliability, availability, and performance. This includes managing the release process, coordinating with solutions architects, and ensuring seamless integration and deployment of new features and updates. The ideal...
-
System Infrastructure Engineer
7 days ago
Allahabad, Uttar Pradesh, India beBeeReliability Full time ₹ 2,00,00,000 - ₹ 2,50,00,000
Job Title: SRE Lead
We are seeking a seasoned Site Reliability Engineering leader to oversee the reliability, scalability, and performance of our critical systems. This position combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems.
Reliability & Performance: Lead efforts to maintain high...
-
Allahabad, Uttar Pradesh, India beBeeProductReliabilityLeader Full time ₹ 12,00,000 - ₹ 25,20,000
Product Reliability Leader Job Description
We are seeking a seasoned Product Reliability Leader to develop and execute comprehensive strategies for ensuring the reliability of our electrochemical systems. As a key member of our team, you will lead cross-functional programs to drive measurable product improvements, collaborate with design engineering, R&D, and...