
Site Reliability Engineer II
3 days ago
You will own the full lifecycle of services, covering availability, latency, performance, and efficiency. You'll lead high-impact projects, mentor engineers, and eliminate toil at scale. This role reports to the Director of SRE.
- Define customer-centric SLIs/SLOs for Tier-0/Tier-1 services. Publish, review quarterly, and align teams to them.
- Error budgeting (policy & tooling): Define an error-budget policy with multi-window, multi-burn-rate alerts; clear runbooks and paging thresholds. Gate changes by budget status (freeze/relax rules) wired into CI/CD.
- Maintain SLO/EB dashboards (Azure Monitor, Grafana/Prometheus, App Insights). Run weekly SLO reviews with engineering/product.
- Drive roadmap tradeoffs when budgets are at risk; land reliability epics.
- Incidents without drama: Lead SEV1/SEV2, own comms, run blameless postmortems, and make corrective actions stick.
- Engineer reliability in: Multi-AZ/region patterns (active-active/DR), PDBs/Pod Topology Spread, HPA/VPA/KEDA, resilient rollout/rollback.
- Azure Kubernetes Service (AKS) at scale: Harden clusters (network, identity, policy), optimize node/pod density, ingress (AGIC/Nginx); mesh optional.
- Observability that works: Metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry. Alert on symptoms, not noise.
- IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
- CI/CD reliability: Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.
- Capacity & performance: Load testing, right-sizing, autoscaling; partner with FinOps to reduce spend without hurting SLOs.
- Disaster recovery you can trust: Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover.
- Secure by default: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.
- Reduce toil: Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams.
- Customer escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.
- Document to scale: Architectures, runbooks, postmortems, SLIs/SLOs—kept current and discoverable.
- (If applicable) Streaming/ETL reliability: Apply SRE practices (SLOs, backpressure, idempotency, replay) to NiFi/Flink/Kafka/Redpanda data flows.
Bachelor's in CS/Engineering or equivalent experience required.
- 12+ years in production ops/platform/SRE, including 5+ years on Azure.
- PostgreSQL expertise including HA/DR, logical/physical replication, performance tuning, autovacuum strategy, partitioning, backup/restore testing, and connection pooling.
- Azure core technologies like AKS, Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
- Observability tools like Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry.
- IaC/automation skills with Terraform, Bicep, PowerShell, Python, and GitOps.
- Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
- Mentorship and crisp written/verbal communication.
As a Senior Site Reliability Engineer, you'll have opportunities for professional growth and development. Our company fosters a culture of collaboration, innovation, and continuous learning.
- Cross-functional teams with experts in various fields.
- Regular training and upskilling programs.
- Opportunities for career advancement.
We value diversity, equity, and inclusion in our workplace. If you're passionate about technology, reliability, and teamwork, we encourage you to apply.
- Diverse and inclusive work environment.
- Flexible working arrangements.
- Paid time off and holidays.
-
Site Reliability Engineer
2 weeks ago
Anantapur, Andhra Pradesh, India Employ Full time Role: Site Reliability Engineer (SRE), Platform Engineering, or DevOps Engineering roles. Location: Bangalore/Remote. Type: Contract. Work Ex: 4-6 yrs. We're working with an AI product company that's building the next generation of GenAI-powered developer platforms. We're looking for an experienced Site Reliability Engineer to join their Platform Engineering...
-
Site Reliability Engineer
5 days ago
Anantapur, Andhra Pradesh, India beBeeReliability Full time ₹ 2,00,00,000 - ₹ 3,00,00,000 Senior Leadership Role. We are seeking an experienced senior leader to fill the role of Site Reliability Engineer at a Global Financial Services Firm. With 12+ years of experience, you will be responsible for defining and implementing SRE strategies, promoting an "Automate-first" culture in operating services through reduction of toil. Develop Process...
-
Site Reliability Engineer
3 days ago
Anantapur, Andhra Pradesh, India beBeeInfrastructure Full time ₹ 15,00,000 - ₹ 22,00,000 Reliable Infrastructure Specialist. The ideal candidate will have a strong background in Site Reliability Engineering, with experience in DevOps and infrastructure management. This includes expertise in CI/CD pipelines, monitoring, automation, and infrastructure as code. Key Responsibilities: Collaborate with cross-functional teams to identify and resolve complex...
-
Senior Site Reliability Engineer
2 weeks ago
Anantapur, Andhra Pradesh, India beBeeELK Full time US$ 1,50,000 - US$ 2,50,000 Job Title: Senior Site Reliability Engineer. We are looking for a highly skilled Senior Site Reliability Engineer to join our Platform Engineering Practice. The ideal candidate will have extensive expertise in designing, managing, and scaling large-scale observability infrastructure using ELK clusters. Key Responsibilities: Design and manage large-scale ELK...
-
Anantapur, Andhra Pradesh, India beBeeReliability Full time ₹ 10,00,000 - ₹ 15,00,000 Job Title: Ambitious Site Reliability Engineer Lead to Drive High-Performing Systems. About the Role: This senior-level position demands a results-driven professional to spearhead site reliability engineering practices, lead high-performing teams, and drive automation strategies. In this challenging role, you will collaborate with cross-functional teams to build...
-
Principal Site Reliability Leader
3 days ago
Anantapur, Andhra Pradesh, India beBeeReliability Full time ₹ 25,00,000 - ₹ 32,50,000 Job Summary: We are seeking a seasoned Principal Site Reliability Engineer to lead the operational health of our financial platforms. This role is focused on ensuring the stability, scalability, and operational excellence of financial applications and data services that meet demanding requirements for accuracy, compliance, and availability. Operational...
-
Site Reliability Engineer
3 days ago
Anantapur, Andhra Pradesh, India beBeeReliability Full time ₹ 15,00,000 - ₹ 25,00,000 We're looking for a seasoned Site Reliability Engineer to join our dynamic engineering team. As a key player in ensuring system performance and reliability, you will be responsible for enhancing our platform's efficiency, automating manual processes, and collaborating with cross-functional teams. Key Responsibilities: Provide technical leadership and...
-
Systems Reliability Engineer Position
6 days ago
Anantapur, Andhra Pradesh, India beBeeSoftware Full time ₹ 1,50,00,000 - ₹ 2,50,00,000 Reliable Systems Engineer Role. We are seeking a skilled Systems Engineer to join our team. This role involves designing, developing, and supporting various tools, services, and applications to maintain a reliable site environment.
-
Reliable System Operations Expert
3 days ago
Anantapur, Andhra Pradesh, India beBeeReliability Full time ₹ 1,00,00,000 - ₹ 1,50,00,000 Site Reliability Engineer. We are seeking an experienced Site Reliability Engineer to join our team. The ideal candidate will have a strong background in IT, with a focus on system administration and support. Key Responsibilities: Design, develop, and support various tools, services, and applications to maintain a reliable site environment. Monitor, measure, and...
-
Reliable Systems Specialist
3 days ago
Anantapur, Andhra Pradesh, India beBeeSite Full time ₹ 12,76,700 - ₹ 24,93,400 Site Reliability Engineer. The role of Site Reliability Engineer (SRE) is pivotal in ensuring the stability, scalability, and operational excellence of Accounting platforms. You will build automation, implement monitoring, improve incident response, and champion DevOps practices to enable Finance systems to operate with consistency and...