Director of System Reliability Engineering

1 day ago


Cochin, Kerala, India beBeeSystem Full time ₹ 1,50,00,000 - ₹ 2,50,00,000
Job Title: Director of System Reliability Engineering

We are seeking a seasoned and accomplished Director of System Reliability Engineering to lead the reliability, scalability, and performance of our critical systems.

As an expert in Site Reliability Engineering, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving automation, monitoring, and incident response strategies. This position combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems.

  • Maintaining high availability and reliability of critical services is essential for our business operations.
  • You will define and monitor Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to ensure we meet our business requirements.
  • Proactively identifying and resolving performance bottlenecks and system inefficiencies will be key to your success.
  • Establishing and improving incident management processes and on-call rotations will be crucial for minimizing downtime.
  • You will lead incident response and root cause analysis for high-priority outages, driving post-incident reviews and ensuring actionable insights are implemented.
  • Developing and implementing automated solutions to reduce manual operational tasks will help streamline our processes.
  • Enhancing system observability through metrics, logging, and distributed tracing tools (e.g., Prometheus, Grafana, Elastic APM) will provide valuable insights into our system's performance.
  • Optimizing CI/CD pipelines for seamless deployments will ensure that our applications are released quickly and reliably.
Collaboration
  • Partnering with software engineering teams to improve the reliability of applications and infrastructure will be essential for our success.
  • You will work closely with product/engineering teams to design scalable and robust systems.
  • Ensuring seamless integration of monitoring and alerting systems across teams will facilitate effective communication and collaboration.
Leadership & Team Building
  • Managing, mentoring, and growing a team of SREs will require strong leadership skills and the ability to inspire and motivate others.
  • Promoting SRE best practices and fostering a culture of reliability and performance across the organization will be a top priority.
  • Driving performance reviews, skills development, and career progression for team members will ensure their growth and success.
Capacity Planning & Cost Optimization
  • Performing capacity planning and implementing autoscaling solutions to handle traffic spikes will ensure that our systems can scale efficiently.
  • Optimizing infrastructure and cloud costs while maintaining reliability and performance will require careful planning and execution.
Skills & Qualifications

Required Skills:

  • Technical Expertise:
      • Experience with cloud platforms (AWS/Azure/GCP) and Kubernetes.
      • Hands-on knowledge of infrastructure-as-code tools like Terraform/Helm/Ansible.
      • Proficiency in Java.
      • Expertise in distributed systems, databases, and load balancing.
  • Monitoring & Observability:
      • Proficient with tools like Prometheus, Grafana, Elastic APM, or New Relic.
      • Understanding of metrics-driven approaches for system monitoring and alerting.
  • Automation & CI/CD:
      • Hands-on experience with CI/CD pipelines (e.g., Jenkins, Azure Pipelines).
      • Skilled in automation frameworks and tools for infrastructure and application deployments.
  • Incident Management:
      • Proven track record in handling incidents, running post-mortems, and implementing solutions to prevent recurrence.

Leadership & Communication Skills:

  • Strong people management and leadership skills with the ability to inspire and motivate teams.
  • Excellent problem-solving and decision-making skills.
  • Clear and concise communication, with the ability to translate technical concepts for non-technical stakeholders.

Preferred Skills:

  • Experience with database optimization, Kafka, or other messaging systems.
  • Knowledge of autoscaling techniques.
  • Previous experience in an SRE, DevOps, or infrastructure engineering leadership role.
  • Understanding of compliance and security best practices in distributed systems.

Becoming a part of our team offers numerous opportunities:

  • You will drive innovation and improvement in building and scaling reliable systems in a fast-paced environment.
  • Work with cutting-edge technologies and influence the evolution of our infrastructure.
  • Lead a high-impact team and foster a culture of reliability and innovation.
