Service Reliability Engineer
1 day ago
Overview
The Service Reliability Engineer role sits within the MUFG Retirement Solutions Technology Delivery function and is accountable for ensuring the reliability and scalability of data services through proactive monitoring, automation, and incident resolution.
This role focuses on maintaining system uptime and reducing unplanned downtime for critical systems.
Key Accountabilities and Main Responsibilities
Strategic Focus
- System Monitoring: Implement and maintain monitoring frameworks to ensure real-time visibility of system performance.
- Incident Resolution: Lead resolution efforts for critical incidents, ensuring minimal downtime.
- Automation: Develop and implement automation strategies to improve system reliability and reduce manual interventions.
- Collaboration: Work with cross-functional teams to identify and address potential system issues proactively.
- Performance Optimisation: Analyse performance data to drive continuous system improvement, with a proactive focus on optimising cloud ROI.
Operational Management
- Perform regular real-time monitoring of the data systems to identify performance issues.
- Drive incident meetings to identify issues and provide resolutions, thereby ensuring the reliability and scalability of data services.
- Develop and implement automation strategies to improve system reliability and reduce manual interventions.
- Propose, document and implement changes to policies or procedures in line with technological advancements.
- Assist in the development, maintenance, implementation and amendment of SLAs.
- Monitor logged jobs for trends or irregular activity that could indicate potential IT issues, and escalate appropriately.
- Provide knowledge, training and information support to enable self-service.
- Set procedures and processes in line with standards within the IT Desktop environment.
- Perform quality checks on the work carried out and audit the resulting observations.
- Provide regular updates to leadership on status of all tasks, projects and improvements including issue and risk mitigation solutions, in agreed timeframes.
- Ensure that all requests for assistance from stakeholders are handled promptly and effectively and, if necessary, escalated to the appropriate level.
- Drive the onboarding and rollout of technology services according to the pre-defined roadmap.
- Apply best practices like regular system monitoring, performance optimisation, and collaboration for system reliability and uptime.
Governance & Risk
- Adhere to MUFG's standards, policies, and procedures.
- Ensure adherence to the governance framework set up by the domain and provide accurate metrics accordingly.
- Manage risks, dependencies and issues associated with technology delivery.
- Adhere to regulatory guidance and standards (e.g. CPS230, CPG235 and GDPR).
- Review IT processes and procedures to ensure they are efficient and simple for the business and meet the control objectives set out by GS007, ISO27001 and other financial industry regulations.
The above list of key accountabilities is not exhaustive and may change from time to time based on business needs.
Experience & Personal Attributes
Experience
- 7-10 years' overall experience, including a minimum of 5-7 years in data platform engineering and cloud migration in large, complex organisations.
- Strong experience in automation, cloud computing, and data governance.
- Preference for experience working with onshore teams and key stakeholders, including on migrations, and in driving global team collaboration and efficiency.
- Expertise in driving complex technical transformations, decommissions and realising business outcomes through data and analytics.
- Experience in implementing frameworks and policies with the ability to measure the outcomes.
- Ability to govern IT End User Computing in a way that drives transparency, operational stability, financial sustainability and productivity.
- Ability to identify and mitigate security risks and ensure sound IT security design and delivery.
- Proficiency in Snowflake, SQL, and DataOps frameworks with understanding of data management and processing.
- Strong analytical skills, with the ability to analyse performance data to drive continuous system improvement.
- Strong troubleshooting skills for resolving critical incidents and ensuring minimal downtime.
- Experience in data observability automation to scale data monitoring.
Personal Attributes
- Effective communication and interpersonal skills to engage with people at all levels of the organisation and build strong relationships and trust with global stakeholders.
- A good problem-solver and effective decision-maker, focused on overcoming challenges.
- Strong business acumen and a passion for current, new and emerging technologies, and for rolling them out across the business to improve customer experience.
- Strong presentation development skills and the ability to present to, and engage, a wide variety of audiences.
- Ability to prioritise, organise and plan, and to meet demanding deadlines.
- Ability to make timely decisions based on the information, experience and skills available.
- Ability to recognise, lead and implement continuous service improvement opportunities.
-
Service Reliability Engineer
2 weeks ago
India, Thomson Reuters, Full time
We are seeking an experienced, passionate and motivated individual to join our team as a Service Reliability Engineer. As part of this role, you will play a pivotal role in driving the reliability, availability and performance of our web applications by implementing best practices for system design, software development, automated testing and continuous...
-
Service Reliability Engineer
2 weeks ago
Building No Sector & A, Gurugram, India, BT Group, Full time, ₹ 9,00,000 - ₹ 12,00,000 per year
Service Reliability Engineer. Why this job matters: The Site Reliability Engineering Associate 3 assists with a range of routine activities in the service performance, reliability and availability that internal and external customers expect. What you'll be doing: 1. Assists with routine activities in the implementation of new software development life...
-
Application Support Engineer
1 day ago
India, NA, Mumbai, India, Maharashtra, Reliability Engineering, Full time, ₹ 4,00,000 - ₹ 8,00,000 per year
Key Responsibilities
Monitoring & Incident Management
- Monitor RPA bots in production to ensure stability and availability.
- Investigate, analyse, and resolve bot failures, errors, and exceptions within defined SLAs.
- Provide first- and second-level support for RPA processes (depending on role scope).
Problem Resolution & Root Cause Analysis
- ...
-
Principal Service Reliability Engineer
6 days ago
Pune, India, Amadeus, Full time
Job Title: Principal Service Reliability Engineer
Common Accountabilities
- Proficient in technical knowledge to ensure the team performs at a high level. Is recognised as a leader in own area and may formally train Specialists/Senior Specialists.
- Understands how main business drivers may impact on own area. Can assess complex problems with...
-
Application Support Engineer
2 days ago
NA, Maharashtra, Mumbai, India, Reliability Engineering, Full time, ₹ 40,00,000 - ₹ 1,20,00,000 per year
- Hands-on experience of APM (Application Performance Monitoring)
- Experience with APM monitoring tools such as Dynatrace, AppDynamics, Grafana, UIM
- Experience with ticketing tools such as ServiceNow, Jira, ITSM
- Good knowledge of SQL and Linux, and a basic understanding of Java
- Basic understanding of the Incident Management process and lifecycle
- Basic understanding of...
-
Principal Service Reliability Engineer
4 days ago
Hyderabad, India, Oracle, Full time
Key Responsibilities
- End-to-end service ownership: design for telemetry, security, resiliency, scalability, and performance; lead sizing/architecture; drive service health reviews and process simplification.
- Incident management and prevention: lead postmortems/RCAs, coordinate fixes, define repair items, and implement data-driven prevention...
-
Reliability Engineer
4 days ago
Hyderabad, India, Cyient, Full time
We are seeking a highly analytical and detail-oriented Reliability Engineer with specialised experience in Weibull analysis and aircraft reliability data. The ideal candidate will play a critical role in enhancing the safety, performance, and cost-effectiveness of our aircraft fleet by analysing failure data, predicting component life, and...
-
Site Reliability Engineer
6 days ago
India, Akamai Technologies, Full time
Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We...
-
Site Reliability Engineer
1 day ago
India, Photon Group, Full time, ₹ 8,00,000 - ₹ 12,00,000 per year
Description: The SRE Engineer is responsible for ensuring website uptime, optimising performance, and maintaining the security of the production application. This role involves monitoring site reliability, addressing technical issues, automating maintenance tasks, and collaborating with cross-functional teams to meet business objectives. Responsibilities: Run the...
-
Site Reliability Engineer
3 days ago
India, Akamai, Full time, ₹ 8,00,000 - ₹ 24,00,000 per year
Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We specialise in creating solutions that...