
Site Reliability Engineer
1 day ago
Experience required: 5 to 8 years.
Role and Responsibilities
Reporting to Engineering, the Site Reliability Engineer will play a critical role in driving innovation and growth for the Banking Solutions, Payments and Capital Markets business. In this role, the candidate will have the opportunity to make a lasting impact on the company's transformation journey, drive customer-centric innovation and automation, and position the organization as a leader in the competitive banking, payments and investment landscape. Specifically, the Site Reliability Engineer will be responsible for the following:
· Design and maintain monitoring solutions and alerting mechanisms for infrastructure, application performance, and user experience metrics, enabling proactive issue detection and mitigation.
· Implement automation tools and processes to automate routine tasks, scale infrastructure, and ensure seamless deployments, updates, and rollbacks with minimal user impact.
· Ensure the reliability, availability, and performance of applications and services, focusing on minimizing downtime, optimizing response times, and maintaining high availability for users.
· Lead incident response, including identification, triage, resolution, and post-incident analysis, to prevent recurrence and improve system resilience.
· Conduct capacity planning, performance tuning, and resource optimization for environments, collaborating with development and operations teams to meet scalability and performance goals.
· Collaborate with security teams to implement security best practices, perform vulnerability assessments, and ensure compliance with security standards and regulatory requirements for applications.
· Manage deployment pipelines, release processes, and configuration management for app deployments, ensuring consistency, reliability, and version control across environments.
· Identify areas for improvement in reliability, performance, and efficiency through data analysis, root cause analysis, and trend analysis, and drive initiatives to enhance system reliability and operational efficiency.
· Create and maintain documentation, runbooks, and knowledge base articles for operational procedures, troubleshooting guides, and best practices, and promote knowledge sharing within the team.
· Develop and test disaster recovery plans, backup strategies, and failover mechanisms for app services, ensuring business continuity and data integrity in case of failures or disasters.
· Collaborate with development, QA, DevOps, and product teams to ensure alignment on reliability goals, performance metrics, release schedules, and incident response processes.
· Participate in on-call rotations and provide 24/7 support for critical incidents, troubleshoot issues, and coordinate with teams for resolution, escalation, and follow-up actions as per defined SLAs.
Professional Qualifications
· Proficient in development technologies, architectures, and platforms (web, API) to understand system complexities and performance considerations.
· Experience in cloud platforms (e.g., AWS, Azure, Google Cloud) and infrastructure as code (IaC) tools for managing app infrastructure and deployments.
· Knowledge of monitoring tools (e.g., Prometheus, Grafana, DataDog, New Relic) and logging frameworks (e.g., Splunk, SumoLogic, ELK Stack) for real-time visibility into system health, performance metrics, and user experience.
· Experience in incident management, including incident response, triage, root cause analysis (RCA), and post-mortem reviews to prevent recurring issues.
· Strong troubleshooting skills to diagnose complex technical issues in app environments, infrastructure, networking, and performance bottlenecks.
· Proficiency in scripting languages (e.g., Python, Bash) and automation tools (e.g., Terraform, Ansible) for automating routine tasks, deployments, and infrastructure management.
· Experience in implementing continuous integration/continuous deployment (CI/CD) pipelines for apps using tools like Jenkins, GitLab CI/CD, or Azure DevOps.
· Expertise in setting up monitoring solutions, configuring alerts, and creating dashboards to monitor system performance, application metrics, and user experience.
· Familiarity with APM (Application Performance Monitoring) tools to analyze app performance, identify bottlenecks, and optimize resource utilization.
· Familiarity with RUM (Real User Monitoring) for tracking and analyzing user interaction and system performance.
· Commitment to continuous learning, staying updated with industry trends, new technologies, and best practices in app reliability, performance, and operations.
· Adaptability to evolving requirements, technologies, and business needs, with a focus on driving continuous improvement and operational excellence.
Personal Characteristics
· Demonstrates judgment and flexibility: thinks through issues and develops solutions that take the broader context into account, and deals positively with shifting demands on time, changing priorities, and rapidly changing environments.
· Takes an ownership approach to engineering and product outcomes.
· Action-oriented self-starter who can set strategy and drive execution with a "roll up the sleeves" approach.
· Excellent interpersonal communication, negotiation and influencing skills to work effectively with all stakeholders (internal & external), making information-based decisions.
· Penchant for excellence, both personally and professionally, demonstrated by intellectual curiosity, record of accomplishment, and reputation; shows strong attention to detail and implementation of best practices with an inclination for continuous improvement.
· Ability to quickly establish strong credibility with employees, business partners and external resources.
· Embodies and delivers the firm's values and culture towards colleagues, clients, and communities:
o Win as one team
o Lead with integrity
o Be the change
Benefits
Talent Worx is an emerging recruitment firm. We are hiring for our client, which is advancing the way the world pays, banks, and invests. With decades of expertise, the client provides financial technology solutions to financial institutions, businesses, and developers.