Senior Application Reliability Engineering Manager

3 weeks ago


Pune, Maharashtra, India TripleLift Full time

About TripleLift :

We're TripleLift, an advertising platform on a mission to elevate digital advertising through beautiful creative, quality publishers, actionable data and smart targeting.

Through over 1 trillion monthly ad transactions, we help publishers and platforms monetize their businesses.

Our technology is where the world's leading brands find audiences across online video, connected television, display and native ads.

Brand and enterprise customers choose us because of our innovative solutions, premium formats, and supportive experts dedicated to maximizing their performance.

As part of the Vista Equity Partners portfolio, we are NMSDC certified, qualify for diverse spending goals and are committed to economic inclusion.

Find out how TripleLift raises up the programmatic ecosystem at

Summary :

The Sr. Engineering Manager, Site Reliability Engineering (SRE) is responsible for the reliability, scalability, and performance of mission-critical cloud and on-prem services that support millions of Marriott customers globally.

This role involves overseeing incident management, driving automation efforts, and working closely with cross-functional teams to ensure alignment between SRE strategy and business objectives.

This role partners closely with Product teams, Applications teams, Infrastructure, and the broader Applications and Infrastructure Delivery teams to develop key metrics and KPIs that improve application stability, availability, and performance.

The ideal candidate will bring strong communication skills, collaborating with key stakeholders across the company to optimize cloud infrastructure and uphold the highest standards of operational excellence in a dynamic, fast-paced environment.

Job Responsibilities :

- Ensure the reliability, availability, and performance of mission-critical cloud services, implementing best practices for monitoring, alerting, and incident management.

- Oversee the management of high-severity incidents, driving quick resolution and post-incident analysis to identify root causes and prevent recurrence.

- Drive the automation of operational processes and ensure systems can scale effectively to support growing user demand, optimizing cloud and on-prem infrastructure and resource usage.

- Develop and execute the SRE strategy aligned with business goals, and communicate service health, reliability, and performance metrics to senior leadership and stakeholders.

- Drive Applications Performance Management and Monitoring.

- Assess application architectures to identify key monitoring points.

- Identify Key Performance Indicators, apply monitoring, and report out on compliance.

- Gather information to develop reporting metrics and KPIs.

- Ensure that all applications adhere to appropriate monitoring standards based on their technology/business process.

- Determine forums and cadence to provide regular monitoring updates.

Skill and Experience :

- 8-10 years' experience in information technology process and/or technical project management including :

- 4+ years of experience as a Site Reliability Engineer (SRE), building and managing highly available, mission-critical systems, with 2+ years of experience on public cloud, preferably AWS.

- 4+ years of project lead or management experience, preferably in SRE areas.

- Proven automation and programming experience in one or more of the following languages: Java, Python, Go, Perl, Bash.

- Deep understanding of SRE practices such as Service Level Objectives, Error Budgets, Toil Management, Observability & Monitoring, Blameless Postmortems, Incident Response Process, Capacity Planning.

- Strong working knowledge of modern, continuous development techniques and pipelines (Agile, Kanban, Jira, CI/CD, Jenkins, Git, Artifactory).

- Production-level expertise with container orchestration engines such as Kubernetes.

- Experience with deploying, monitoring, and troubleshooting large-scale, distributed applications in cloud environments such as AWS.

- Familiarity with security frameworks such as ISO 27001, SOC 2, PCI DSS, and/or HIPAA.

- Experience working with SaaS, IaaS, and PaaS offerings.

- Ability to work with global teams located in US and India.

- 6+ years' experience in a technical discipline role, including planning, implementing, and evaluating processes, systems, and/or initiatives.

- Broad technical acumen across multiple disciplines and applications, with a solid understanding of current technologies.

- Experience applying measurement processes/methods for assessing program outputs and outcomes or progress toward goals and objectives.

- Extremely high level of analytical ability with complex problems.

- Ability to work across organizational boundaries, to help lead and influence change.

- Ability to command the process across all levels to ensure customer focus, including being assertive and self-starting.

- Demonstrated leadership experience in influencing and garnering alignment from external organizations.

- Ability to align change management strategies with projects.

- Skilled in conceptualizing creative solutions, documenting them, and presenting/selling them to senior management.

- Very high level of interpersonal skills to work effectively with others, motivate employees, and elicit work output in a team environment.

Education and Certifications :

- Undergraduate degree in Computer Science or related technical field or equivalent experience/certification.

(ref:hirist.tech)
