
Reliable Platform Specialist
2 days ago
Job Description:
We are seeking a highly skilled Site Reliability Engineer II to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the availability, latency, performance, and efficiency of our SaaS platform on Azure. You will work closely with cross-functional teams to define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale.
Key Responsibilities:
- Define customer-centric SLIs/SLOs for Tier-0/Tier-1 services and publish quarterly reviews.
- Implement error budget policies with multi-window, multi-burn-rate alerts, clear runbooks, and paging thresholds.
- Develop and maintain SLO and error-budget dashboards using Azure Monitor, Grafana/Prometheus, and App Insights.
- Lead SEV1/SEV2 incidents calmly, own communications, and see corrective actions through to completion.
- Engineer reliability in AKS clusters, including hardening, optimization, and resilience patterns.
- Design and implement observability solutions using metrics, traces, and logs.
- Develop and maintain IaC templates using Terraform and Bicep.
- Partner with FinOps to reduce costs without hurting SLOs.
- Test backup and restore procedures, and validate Azure Site Recovery (ASR) and multi-region failover.
- Ensure secure-by-default practices, including Entra ID, managed identities, and Key Vault rotation.
- Reduce toil through automation, self-service runbooks, and ChatOps.
- Communicate tradeoffs and recovery plans with authority during customer escalations.
- Document architectures, runbooks, postmortems, and SLIs/SLOs so that operational knowledge scales with the team.
Requirements:
- Bachelor's degree in CS/Engineering (or equivalent experience).
- 12+ years in production ops/platform/SRE, including 5+ years on Azure.
- Deep operational expertise in PostgreSQL, including HA/DR, replication, performance tuning, and connection pooling.
- Azure core skills, including AKS, Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, and Service Bus/Event Hubs.
- Observability skills, including Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, and SLO design.
- IaC/automation skills, including Terraform, Bicep, PowerShell, Python, and GitOps.
- Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
- Mentorship and crisp written/verbal communication.
Nice to Have:
- Experience with Apache NiFi, Apache Flink, Apache Kafka, or Redpanda.
- Azure Solutions Architect Expert certification.
- ITSM and on-call tooling experience.
- Compliance/SecOps skills, including SOC 2, ISO 27001, policy-as-code, and workload identity.
- OpenTelemetry and eBPF tooling experience.
- Multi-tenant SaaS and cost optimization experience.
-
Chief Platform Reliability Engineer
2 days ago
Rajahmundry, Andhra Pradesh, India beBeeAutomation Full time ₹ 18,00,000 - ₹ 25,00,000
Infrastructure Architect Specialist
The Role:
We treat infrastructure and operations as software engineering problems. Our mission is to build and evolve software platforms that enable all services to be provisioned and managed safely, reliably, and scalably. This role is responsible for designing and architecting new solutions, finding creative...
-
Infrastructure Reliability Specialist
23 hours ago
Rajahmundry, Andhra Pradesh, India beBeeInfrastructure Full time ₹ 1,50,00,000 - ₹ 2,50,00,000
Job Overview:
We are seeking a highly skilled Infrastructure Reliability Specialist to join our team. This role plays a critical part in ensuring the stability and performance of complex systems.
Key Responsibilities:
- Implement DevOps practices to improve deployment efficiency, monitoring, and automation.
- Collaborate with cross-functional teams to identify and...
-
Infrastructure Reliability Specialist
1 week ago
Rajahmundry, Andhra Pradesh, India beBeeInfrastructure Full time ₹ 18,00,000 - ₹ 26,40,000
Job Title: Infrastructure Reliability Specialist
Key Responsibilities:
- Identify potential issues using monitoring and observability tools.
- Manage incidents by creating reports, communicating with stakeholders, and providing updates.
- Collaborate with cross-functional teams to resolve issues.
Requirements:
- 3+ years of experience in software development or related...
-
Site Reliability Engineer Specialist
2 weeks ago
Rajahmundry, Andhra Pradesh, India beBeeReliability Full time ₹ 70,00,000 - ₹ 1,05,00,000
System Reliability Expert Role
Cutting-edge software development drives personalized product creation for millions of global customers. Modular services comprise the Mass Customization Platform.
As a leading provider of custom marketing solutions, we foster connections between businesses and their customers worldwide. Our expertise lies in personalized...
-
Site Reliability Expert
2 days ago
Rajahmundry, Andhra Pradesh, India beBeeSiteReliability Full time ₹ 15,00,000 - ₹ 20,00,000
Job Title: A highly skilled professional in site reliability engineering is sought to join our team.
- Design and support scalable, reliable, and resilient systems on AWS.
- Contribute to platform engineering projects to create automated solutions for deployment, scaling, and operations.
- Analyze system performance and provide recommendations for optimization...
-
Reliability Leader
5 days ago
Rajahmundry, Andhra Pradesh, India beBeeSre Full time ₹ 1,40,32,000 - ₹ 2,51,57,000
Reliability Engineering Leader
The SRE Manager leads the reliability engineering function, ensuring infrastructure resiliency and optimal operational performance. This role blends technical leadership with team mentorship and cross-functional coordination.
Roles and Responsibilities:
- Establish organizational reliability strategies, aligning SLAs, SLOs, and...
-
Chief Platform Architect
4 days ago
Rajahmundry, Andhra Pradesh, India beBeePlatform Full time ₹ 1,50,00,000 - ₹ 2,00,00,000
Job Overview:
A seasoned leader is sought to spearhead the Platform Engineering function and Advanced Resolution Team. This individual will play a pivotal role in shaping internal developer platforms, ensuring reliability, scalability, and efficiency across infrastructure and CI/CD pipelines.
This technical leader will drive platform engineering capabilities,...
-
Senior Infrastructure Reliability Specialist
1 week ago
Rajahmundry, Andhra Pradesh, India beBeeSite Full time ₹ 2,00,00,000 - ₹ 2,50,00,000
Job Title: SRE Lead (Engineering & Reliability)
We are seeking an experienced and dynamic Site Reliability Engineering (SRE) professional to oversee the reliability, scalability, and performance of our critical systems. As an SRE leader, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving...
-
Site Reliability Specialist
3 days ago
Rajahmundry, Andhra Pradesh, India beBeeQuality Full time ₹ 1,45,87,500 - ₹ 2,42,47,500
Reliable System Engineer Wanted
We are seeking a skilled and experienced system reliability engineer to join our team. The ideal candidate will have a strong background in software engineering, network protocols, and cloud computing.
Key Responsibilities:
-
Senior ITSM Platform Specialist
2 days ago
Rajahmundry, Andhra Pradesh, India beBeeITSM Full time ₹ 1,50,00,000 - ₹ 2,50,00,000
About the Role: A seasoned IT specialist is sought to join our team and take on a pivotal position in overseeing the implementation, configuration, and customization of our IT Service Management (ITSM) platform. This critical role will ensure seamless integration with various applications and platforms, providing top-notch support and ensuring business...