Lead / Manager - Site Reliability Engineering (SRE)

1 hour ago


Kolkata metropolitan area, West Bengal, India | CloudHire | Full time

Job Summary

The Technical Manager for Site Reliability Engineering (SRE) will lead a remote team of Site Reliability Engineers, ensuring operational excellence and fostering a high-performing team culture. Reporting to the US-based Director of Systems and Security, this role is responsible for overseeing day-to-day operations, technical mentorship, and strategic alignment with the company's goals. The Technical Manager will act as a bridge between the team and senior leadership, ensuring clear communication, efficient issue resolution, and continuous improvement in service delivery.

Job Category

Technology Solutions

Responsibilities:

● Provide leadership and management to a remote team of Site Reliability Engineers, ensuring alignment with organizational priorities and goals.

● Oversee team operations, including incident management, technical support, and infrastructure maintenance.

● Act as the primary point of escalation for complex technical issues, collaborating with the Director of Systems and Security and with the Quality Assurance and Product teams as needed.

● Ensure the team adheres to established SLAs for issue resolution and maintains high customer satisfaction levels.

● Mentor and develop team members, fostering growth in technical skills, problem-solving abilities, and customer engagement.

● Lead initiatives to improve operational processes, tools, and workflows, driving greater efficiency and reliability.

● Collaborate with cross-functional teams, including Product, Engineering, and Operations, to address customer needs and improve platform performance.

● Facilitate regular team meetings, performance reviews, and one-on-one sessions to ensure clear communication and ongoing development.

● Maintain and report on key performance metrics, providing insights and recommendations to senior leadership.

● Stay informed on industry trends and best practices, ensuring the team is equipped with the latest tools and methodologies.

● Participate in strategic planning and contribute to the continuous improvement of the SRE function.

Qualifications:

● Proven experience managing technical teams, preferably in Site Reliability Engineering, DevOps, or a related field.

● Strong technical background in cloud computing and infrastructure management, particularly with AWS and Linux-based systems.

● Demonstrated ability to lead and mentor teams in remote and distributed environments.

● Excellent written and oral English communication and interpersonal skills, with the ability to engage effectively with both technical and non-technical stakeholders.

● Strong problem-solving and decision-making abilities, with a focus on root cause analysis and long-term solutions.

● Experience with automation tools (Terraform, Ansible, CloudFormation) and CI/CD pipelines.

● Familiarity with incident management practices and tools, as well as ticketing systems.

● High attention to detail and a commitment to operational excellence.

● Bachelor's degree in a technical or quantitative science field, or equivalent work experience.

Preferred Qualifications:

● AWS certification (any level).

● 3+ years of experience leading customer-facing technical teams, with a focus on improving service delivery.

● Knowledge of security best practices and governance in cloud environments.

● Strong understanding of networking concepts and system architecture.

Key Attributes:

● Empathetic leader who values collaboration, transparency, and accountability.

● Proactive mindset with a focus on continuous improvement and innovation.

● Ability to prioritize and manage multiple initiatives in a fast-paced environment.

● Strategic thinker who can align team efforts with broader organizational objectives.

● Passion for enabling team growth and fostering a culture of learning and development.

Job Location: Kolkata





