Site Reliability Engineering Manager
2 weeks ago
Job Title: Manager of SRE (Site Reliability Engineering) & Application Support
Location: Thane
Reports to: Sr. VP, Delivery Head
Department: Engineering; Full-Time
About us:
At Netcore, innovation isn’t just a buzzword—it's the core of everything we do. As the pioneering force behind the first and leading AI/ML-powered Customer Engagement and Experience Platform (CEE), we're dedicated to revolutionizing how B2C brands interact with their customers. Our state-of-the-art SaaS products are designed to foster personalized engagement throughout the entire customer journey, creating remarkable digital experiences for businesses of all sizes.
Engineering at Netcore: Dive into a world where your work directly impacts engagement, conversions, revenue, and customer retention. Our engineering team tackles complex challenges that come with scaling high-performance systems. We thrive on versatility and speed, employing advanced tech stacks such as Kafka, Storm, RabbitMQ, Celery, RedisQ, and GoLang, all hosted robustly on AWS and GCP clouds. At Netcore, you're not just solving technical problems—you're setting industry benchmarks.
Job Summary:
We are seeking a seasoned leader for our SRE & Application Support division, overseeing the reliability, scalability, and efficient operation of our martech tools built on open-source frameworks. This role will play a key part in maintaining the operational stability of our products on Netcore Cloud's infrastructure, ensuring 24/7 availability, and driving incident management.
The ideal candidate will combine strong leadership abilities with a deep understanding of site reliability, automation, performance monitoring, and application support, delivering world-class service to our clients and partners.
Key Responsibilities:
SRE Leadership & Strategy:
- Lead the Site Reliability Engineering (SRE) team to design and implement robust systems ensuring uptime, scalability, and security.
- Develop and maintain strategies for high availability, disaster recovery, and capacity planning of all martech tools.
- Advocate and apply the principles of automation to eliminate repetitive tasks and improve efficiency.
- Establish and refine Service Level Objectives (SLOs) and Service Level Agreements (SLAs) in collaboration with product and engineering teams.
Application Support:
- Oversee and lead the Application Support Team responsible for maintaining the health and performance of customer-facing applications built on the NetcoreCloud platform.
- Develop processes and debugging procedures to ensure quick resolution of technical issues, and serve as an escalation point for critical incidents.
- Ensure all incidents are triaged and handled efficiently, with proper root cause analysis and follow-up post-mortems for critical incidents.
- Manage the implementation of monitoring tools and log management systems to detect, alert, and respond to potential issues proactively.
Collaboration and Cross-Functional Leadership:
- Work closely with the Sales, CSM, Customer Support, Development, QA, and DevOps teams.
- Collaborate with stakeholders to drive a culture of continuous improvement by identifying and eliminating potential risks and issues in the system.
- Be involved in PI (Program Increment) planning to align with product roadmaps, making sure reliability is factored into new feature development.
Team Management & Development:
- Recruit, mentor, and manage the SRE and Application Support Team, fostering a high-performance and collaborative environment.
- Conduct regular performance reviews, provide feedback, and support professional development within the team.
Innovation and Open-Source Contribution:
- Lead initiatives to improve the open-source frameworks utilized in the martech stack, contributing to the open-source community as needed.
- Stay current with emerging technologies, tools, and best practices in site reliability, automation, and application support.
Requirements:
Experience:
- 8+ years of experience in SRE, DevOps, or Application Support roles, with at least 3 years in a leadership position.
- Proven track record of managing systems on open-source frameworks and cloud platforms such as NetcoreCloud or similar.
- Demonstrated expertise in incident management, post-mortem analysis, and improving mean time to recovery (MTTR).
- Strong experience with monitoring tools (Prometheus, Grafana, or similar), logging frameworks, and automation tools (Terraform, Ansible).
Technical Skills:
- Hands-on experience with Linux/Unix environments and cloud services (AWS, GCP, NetcoreCloud).
- Proficiency in scripting and coding (Python, PHP, Golang, Java, or similar languages) for automation purposes.
- Solid understanding of CI/CD pipelines, version control (Git), and alerting and application monitoring tools.
Leadership & Soft Skills:
- Proven leadership skills, with experience in team building, mentorship, and fostering a culture of accountability.
- Strong interpersonal and communication skills, with the ability to interface effectively with technical and non-technical stakeholders.
- Ability to manage multiple projects simultaneously, prioritize tasks, and work under pressure to meet deadlines.
Preferred Qualifications:
- Experience in the martech or digital marketing domain, or with large-scale, customer-facing SaaS applications.
- Certification in SRE, DevOps, or cloud platforms (AWS, GCP).
- Strong application debugging skills and the ability to understand product features.
Why Join Us?
- Be a part of an innovative and forward-thinking organization that values technology and continuous improvement.
- Work with cutting-edge open-source frameworks, cloud technologies, and a SaaS product.
- Leadership opportunities with a direct impact on our customers and product success.
Let's start a conversation and make magic happen together.