Site Reliability Engineering Manager

3 weeks ago

Bangalore Division, India Epsilon Full time

About Business Unit: SaaSOps leads post-production support and the overall experience of Epsilon PeopleCloud products for our global clients. This function is responsible for product support, incident management, managed operations and the automation of processes. The team has successfully incubated and mainstreamed Site Reliability Engineering (SRE) as a practice to ensure reliable product operations on a global scale. The team is also actively leading the adoption of AI in operations (AIOps) and recently launched AI-driven self-service capabilities to enhance operational efficiency and improve client experiences.

About the Role

This is a senior individual contributor (IC) role responsible for driving strong operations engineering practices in SaaS product operations.
The role will drive incident triage practices, implement effective monitoring and observability tools, and help build SRE competence in the team.
The role will work closely with the product operations team to deep dive into production issues, identify their root causes, and work with the concerned teams on permanent fixes for recurring issues.
The role will identify automation opportunities to streamline repetitive tasks.
The role will contribute to the evolution of the AIOps strategy by identifying use cases and developing AI / agentic autonomous solutions.

What you’ll need

15+ years of hands-on experience in SRE.
The candidate will be a hands-on technology leader with proven experience working as an SRE leader in a SaaS product setup.
The candidate should have a deep understanding of monitoring tools (New Relic, Prometheus) and observability practices.
Prior experience working with ServiceNow, JIRA, Bitbucket and Confluence is required.
The candidate should be proficient at designing effective Ops dashboards, especially for peak traffic events in a SaaS environment.
The candidate should have prior experience handling communications with leadership across an organization for peak traffic events.
The ideal candidate should have a strong full stack engineering background with Cloud Engineering, L1-L3 Operations and AI / Gen AI experience.
Must have strong development skills (at least two of Python, Java, C#), strong DB skills (RDBMS, NoSQL, Cloud DBs), and experience with containers / orchestration and cloud infrastructure.
Highly proficient in at least one hyperscaler cloud (AWS, GCP, Azure).
Demonstrated real-world experience deploying traditional ML and Gen AI use cases in production.
Experience working closely with Engineering and Operations teams; must have strong DevOps, incident management, release management and change management experience.
Prior experience with at least one AIOps solution preferred.
Must have proven skills in collaboration and getting things done.
ITIL certification and experience working in an ITIL environment will be a plus.

Epsilon is a global data, technology and services company that powers the marketing and advertising ecosystem. For decades, we’ve provided marketers from the world’s leading brands the data, technology and services they need to engage consumers with 1 View, 1 Vision and 1 Voice. 1 View of their universe of potential buyers. 1 Vision for engaging each individual. And 1 Voice to harmonize engagement across paid, owned and earned channels. Epsilon’s comprehensive portfolio of capabilities across our suite of digital media, messaging and loyalty solutions bridges the divide between marketing and advertising technology. We process 400+ billion consumer actions every single day using advanced AI and hold many patents of proprietary technology, including real-time modeling languages and consumer privacy advancements. Thanks to the work of every employee, Epsilon has been consistently recognized as industry-leading by Forrester, Adweek and the MRC. Epsilon is a global company with more than 9,000 employees around the world.
Additional Information

Epsilon has a core set of 5 values that define our culture and guide us to bring value for our clients, our people and consumers. We are seeking candidates that align with our values, demonstrate them and make them meaningful in their day-to-day work:

Act with integrity. We are transparent and have the courage to do the right thing.
Work together to win together. We believe collaboration is the catalyst that unlocks our full potential.
Innovate with purpose. We shape the market with big ideas that drive big outcomes.
Respect all voices. We embrace differences and foster a culture of connection and belonging.
Empower with accountability. We trust each other to own and deliver on common goals.

Because You Matter

YOUniverse. A work-world with you at the heart of it.

At Epsilon, we believe people make the place. And everything we do is designed with you in mind. That’s why our work-world, aptly named ‘YOUniverse’, is passionate about crafting a nurturing environment that elevates your growth, wellbeing and work-life harmony. So, come be part of a people-centric workspace where care for you is at the core of all we do. Take a trip to YOUniverse and explore our outstanding benefits.

Epsilon is an Equal Opportunity Employer. Epsilon is committed to promoting diversity, inclusion, and equal employment opportunities by using reasonable efforts to attract, recruit, engage and retain qualified individuals of all ethnicities and backgrounds, including, but not limited to, women, people of color, LGBTQ individuals, people with disabilities and any other underrepresented groups, traits or characteristics.

Site Reliability Engineer

3 weeks ago

Bangalore Division, India JRD Systems Full time

Site Reliability Engineer (Windows / Cloud / Automation) Job Summary: We are seeking an experienced Site Reliability Engineer with a strong background in managing Windows infrastructure and cloud environments. The ideal candidate will be responsible for designing, implementing, automating, and maintaining scalable infrastructure solutions across AWS, Azure,...
Site Reliability Engineer

2 weeks ago

Bangalore Division, India Andor Tech Full time

Hiring!! 🏢 About AndorTech: AndorTech is a global IT services and consulting firm founded in 2009, headquartered in Bangalore. The company specializes in software engineering, AI-enabled IT services, application support, analytics, and test automation. With a presence across India, the USA, Europe, and the UAE, AndorTech partners with Global Capability...
Site Reliability Engineer

1 week ago

Bangalore Division, India Karix Full time

Role: Site Reliability Engineer Location: Bangalore (WFO) About the role: We are seeking an experienced professional Site Reliability Engineer who acts as a bridge between development and IT operations, taking operational tasks to ensure the efficient functioning of Service platforms. They are responsible for monitoring, automating, and improving the...
Site Reliability Engineer

3 weeks ago

Bangalore Division, India Groww Full time

About Groww We are a passionate group of people focused on making financial services accessible to every Indian through a multi-product platform. Each day, we help millions of customers take charge of their financial journey. Customer obsession is in our DNA. Every product, every design, every algorithm down to the tiniest detail is executed keeping the...
Site Reliability Engineering Manager

4 weeks ago

Bangalore, India CloudHire Full time

Job Summary The Technical Manager for Site Reliability Engineering (SRE) will lead a remote team of Site Reliability Engineers, ensuring operational excellence and fostering a high-performing team culture. Reporting to the US-based Director of Systems and Security, this role is responsible for overseeing day-to-day operations, technical mentorship, and...
Site Reliability Engineer

3 weeks ago

Pune Division, India Synechron Full time

We have an immediate opportunity for an SRE (Senior Site Reliability Engineer) with 5 to 9 years of experience. Synechron – Pune Job Role: SRE (Senior Site Reliability Engineer) Job Location: Pune About Synechron We began life in 2001 as a small, self-funded team of technology specialists. Since then, we’ve grown our organization to 14,500+ people, across 58 offices, in 21...
Site Reliability Engineer

4 days ago

Bangalore, India Pagos Consultants Full time

We are looking for experienced site reliability engineers to join a founding team of startup-minded individuals that will lay the groundwork for our new fintech offering. This team will play a pivotal role in spearheading innovation. As such, you will have the opportunity to shape the early architecture and design of the system and set the trajectory for its...
Site Reliability Engineer

2 weeks ago

Bangalore, India super Full time

Site Reliability Engineer (SRE) Level 3 Overview: A Site Reliability Engineer (SRE) Level 3 is a senior technical leadership role focused on designing, implementing, and maintaining large-scale, complex, and highly reliable systems. This role emphasizes a blend of software and systems engineering to ensure the availability, latency, performance, and capacity...
Site Reliability Engineer

1 week ago

Bangalore Division, India PhonePe Full time

SRE We are looking for engineers who are passionate about reliability, performance, and efficiency, and with experience in building tools, services, and automation to manage and improve production services. Systems internals/security, Linux, Network, and Monitoring work to improve the reliability and performance of the next generation of distributed systems...
Site Reliability Engineer

21 hours ago

Bangalore, India Enterprise Minds, Inc Full time

Senior Site Reliability Engineer (GCP | Terraform | Ansible | SRE | On-Call) We are looking for a high-impact Site Reliability Engineer (SRE) who will play a key role in ensuring the reliability, availability, and scalability of our production systems on Google Cloud Platform (GCP). If you thrive in fast-paced environments, excel in incident management, and...
