
High Impact Reliability Engineering Manager
2 weeks ago
We are seeking a seasoned and dynamic Reliability Engineering Manager to oversee the reliability, scalability, and performance of our critical systems.
This position combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems. Our team focuses on building robust systems that scale with our business needs.
- Reliability & Performance:
- Lead efforts to maintain high availability and reliability of critical services.
- Define and monitor Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to ensure business requirements are met.
- Proactively identify and resolve performance bottlenecks and system inefficiencies through data-driven approaches.
- Incident Management & Response:
- Establish and improve incident management processes and on-call rotations.
- Lead incident response and root cause analysis for high-priority outages.
- Drive post-incident reviews and ensure actionable insights are implemented to prevent recurrence.
- Automation & Tooling:
- Develop and implement automated solutions to reduce manual operational tasks and enhance efficiency.
- Enhance system observability through metrics, logging, and distributed tracing tools.
- Optimize Continuous Integration/Continuous Deployment (CI/CD) pipelines for seamless deployments.
- Collaboration:
- Partner with software engineering teams to improve the reliability of applications and infrastructure.
- Work closely with product/engineering teams to design scalable and robust systems that meet business needs.
- Ensure seamless integration of monitoring and alerting systems across teams.
- Leadership & Team Building:
- Manage, mentor, and grow a team of Reliability Engineers.
- Promote Reliability Engineering best practices and foster a culture of reliability and performance across the organization.
- Drive performance reviews, skills development, and career progression for team members.
- Capacity Planning & Cost Optimization:
- Perform capacity planning and implement autoscaling solutions to handle traffic spikes.
- Optimize infrastructure and cloud costs while maintaining reliability and performance.
Required Skills & Qualifications:
- Technical Expertise:
- Experience with cloud platforms (AWS / Azure / GCP) and Kubernetes.
- Hands-on knowledge of infrastructure-as-code tools like Terraform/Helm/Ansible.
- Proficiency in Java.
- Expertise in distributed systems, databases, and load balancing.
- Monitoring & Observability:
- Proficient with tools like Prometheus, Grafana, Elastic APM, or New Relic.
- Understanding of metrics-driven approaches for system monitoring and alerting.
- Automation & CI/CD:
- Hands-on experience with CI/CD pipelines (e.g., Jenkins, Azure Pipelines).
- Skilled in automation frameworks and tools for infrastructure and application deployments.
- Incident Management:
- Proven track record in handling incidents, post-mortems, and implementing solutions to prevent recurrence.
- Leadership & Communication Skills:
- Strong people management and leadership skills with the ability to inspire and motivate teams.
- Excellent problem-solving and decision-making skills.
- Clear and concise communication, with the ability to translate technical concepts for non-technical stakeholders.
Why Join Us?
Be a key driver in building and scaling reliable systems in a fast-paced environment. Work with cutting-edge technologies and influence the evolution of our infrastructure. Lead a high-impact team and foster a culture of reliability and innovation.