Site Reliability Engineering Manager



India Netcore Cloud Full time

Job Title: Manager of SRE (Site Reliability Engineering) & Application Support

Location: Thane

Reports to: Sr. VP, Delivery Head

Department: Engineering; Full-Time

About us:

At Netcore, innovation isn’t just a buzzword—it's the core of everything we do. As the pioneering force behind the first and leading AI/ML-powered Customer Engagement and Experience Platform (CEE), we're dedicated to revolutionizing how B2C brands interact with their customers. Our state-of-the-art SaaS products are designed to foster personalized engagement throughout the entire customer journey, creating remarkable digital experiences for businesses of all sizes.

Engineering at Netcore: Dive into a world where your work directly impacts engagement, conversions, revenue, and customer retention. Our engineering team tackles complex challenges that come with scaling high-performance systems. We thrive on versatility and speed, employing advanced tech stacks such as Kafka, Storm, RabbitMQ, Celery, RedisQ, and GoLang, all hosted robustly on AWS and GCP clouds. At Netcore, you're not just solving technical problems—you're setting industry benchmarks.

Job Summary:

We are seeking a seasoned leader for our SRE & Application Support division, overseeing the reliability, scalability, and efficient operation of our martech tools built on open-source frameworks. This role will play a key part in maintaining the operational stability of our products on Netcore Cloud's infrastructure, ensuring 24/7 availability, and driving incident management.

The ideal candidate will combine strong leadership abilities with a deep understanding of site reliability, automation, performance monitoring, and application support, delivering world-class service to our clients and partners.

Key Responsibilities:

SRE Leadership & Strategy:

- Lead the Site Reliability Engineering (SRE) team to design and implement robust systems ensuring uptime, scalability, and security.

- Develop and maintain strategies for high availability, disaster recovery, and capacity planning of all martech tools.

- Advocate for and apply automation principles to eliminate repetitive tasks and improve efficiency.

- Establish and refine Service Level Objectives (SLOs) and Service Level Agreements (SLAs) in collaboration with product and engineering teams.

Application Support:

- Lead the Application Support Team responsible for maintaining the health and performance of customer-facing applications built on the Netcore Cloud platform.

- Develop processes and debugging procedures to ensure quick resolution of technical issues, and serve as an escalation point for critical incidents.

- Ensure all incidents are triaged and handled efficiently, with proper root cause analysis and follow-up post-mortems for critical incidents.

- Manage the implementation of monitoring tools and log management systems to detect, alert, and respond to potential issues proactively.

Collaboration and Cross-Functional Leadership:

- Work closely with the Sales, CSM, Customer Support, Development, QA, and DevOps teams.

- Collaborate with stakeholders to drive a culture of continuous improvement by identifying and eliminating potential risks and issues in the system.

- Be involved in PI (Program Increment) planning to align with product roadmaps, making sure reliability is factored into new feature development.

Team Management & Development:

- Recruit, mentor, and manage the SRE and Application Support Team, fostering a high-performance and collaborative environment.

- Conduct regular performance reviews, provide feedback, and support professional development within the team.

Innovation and Open-Source Contribution:

- Lead initiatives to improve the open-source frameworks utilized in the martech stack, contributing to the open-source community as needed.

- Stay current with emerging technologies, tools, and best practices in site reliability, automation, and application support.

Requirements:

Experience:

- 8+ years of experience in SRE, DevOps, or Application Support roles, with at least 3 years in a leadership position.

- Proven track record of managing systems on open-source frameworks and cloud platforms such as Netcore Cloud or similar.

- Demonstrated expertise in incident management, post-mortem analysis, and improving mean time to recovery (MTTR).

- Strong experience with monitoring tools (Prometheus, Grafana, or similar), logging frameworks, and automation tools (Terraform, Ansible).

Technical Skills:

- Hands-on experience with Linux/Unix environments and cloud services (AWS, GCP, Netcore Cloud).

- Proficiency in scripting and coding (Python, PHP, Golang, Java, or similar languages) for automation purposes.

- Solid understanding of CI/CD pipelines, version control (Git), and alerting and application monitoring tools.

Leadership & Soft Skills:

- Proven leadership skills, with experience in team building, mentorship, and fostering a culture of accountability.

- Strong interpersonal and communication skills, with the ability to interface effectively with technical and non-technical stakeholders.

- Ability to manage multiple projects simultaneously, prioritize tasks, and work under pressure to meet deadlines.

Preferred Qualifications:

- Experience in the martech or digital marketing domain, or with large-scale, customer-facing SaaS applications.

- Certification in SRE, DevOps, or cloud platforms (AWS, GCP).

- Strong application debugging skills and a good understanding of product features.

Why Join Us?

- Be a part of an innovative and forward-thinking organization that values technology and continuous improvement.

- Work with cutting-edge open-source frameworks, cloud technologies, and a SaaS product.

- Leadership opportunities with a direct impact on our customers and product success.

Let's start a conversation and make magic happen together.



