
Site Reliability Engineer
7 days ago
Role and Responsibilities:
Reporting to Engineering, the Site Reliability Engineer will play a critical role in driving innovation and growth for the Banking Solutions, Payments, and Capital Markets business. The role offers the opportunity to make a lasting impact on the company's transformation journey, drive customer-centric innovation and automation, and help position the organization as a leader in the competitive banking, payments, and investment landscape.
Specifically, the Site Reliability Engineer will be responsible for the following:
- Design and maintain monitoring solutions and alerting mechanisms for infrastructure, application performance, and user experience metrics, enabling proactive issue detection and mitigation.
- Implement tools and processes to automate routine tasks, scale infrastructure, and enable seamless deployments, updates, and rollbacks with minimal user impact.
- Ensure the reliability, availability, and performance of applications and services, with a focus on minimizing downtime and optimizing response times.
- Lead incident response, including identification, triage, resolution, and post-incident analysis, to prevent recurrence and improve system resilience.
- Conduct capacity planning, performance tuning, and resource optimization across environments, collaborating with development and operations teams to meet scalability and performance goals.
- Collaborate with security teams to implement security best practices, perform vulnerability assessments, and ensure compliance with security standards and regulatory requirements for applications.
- Manage deployment pipelines, release processes, and configuration management for app deployments, ensuring consistency, reliability, and version control across environments.
- Identify opportunities to improve reliability, performance, and efficiency through data, root cause, and trend analysis, and drive the resulting improvement initiatives.
- Create and maintain documentation, runbooks, and knowledge base articles for operational procedures, troubleshooting guides, and best practices, and promote knowledge sharing within the team.
- Develop and test disaster recovery plans, backup strategies, and failover mechanisms for app services, ensuring business continuity and data integrity in case of failures or disasters.
- Collaborate with development, QA, DevOps, and product teams to ensure alignment on reliability goals, performance metrics, release schedules, and incident response processes.
- Participate in on-call rotations, providing 24/7 support for critical incidents: troubleshoot issues and coordinate with teams on resolution, escalation, and follow-up actions in line with defined SLAs.
Professional Qualifications:
- Proficiency in development technologies, architectures, and platforms (web, API) sufficient to understand system complexities and performance considerations.
- Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and infrastructure as code (IaC) tools for managing application infrastructure and deployments.
- Knowledge of monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic) and logging platforms (e.g., Splunk, Sumo Logic, ELK Stack) for real-time visibility into system health, performance metrics, and user experience.
- Experience in incident management, including incident response, triage, root cause analysis (RCA), and post-mortem reviews to prevent recurring issues.
- Strong troubleshooting skills for diagnosing complex technical issues across application environments, infrastructure, and networking, including performance bottlenecks.
- Proficiency in scripting languages (e.g., Python, Bash) and automation tools (e.g., Terraform, Ansible) for automating routine tasks, deployments, and infrastructure management.
- Experience implementing continuous integration/continuous deployment (CI/CD) pipelines for applications using tools such as Jenkins, GitLab CI/CD, or Azure DevOps.
- Expertise in setting up monitoring solutions, configuring alerts, and creating dashboards to monitor system performance, application metrics, and user experience.
- Familiarity with APM (Application Performance Monitoring) tools to analyze app performance, identify bottlenecks, and optimize resource utilization.
- Familiarity with RUM (Real User Monitoring) for tracking and analyzing user interaction and system performance.
- Commitment to continuous learning, staying updated with industry trends, new technologies, and best practices in app reliability, performance, and operations.
- Adaptability to evolving requirements, technologies, and business needs, with a focus on driving continuous improvement and operational excellence.
Personal Characteristics:
- Demonstrates judgment and flexibility; develops solutions that take the broader context into account; deals positively with shifting demands on time, changing priorities, and rapidly changing environments.
- Takes an ownership approach to engineering and product outcomes.
- Action-oriented self-starter who can set strategy and drive execution with a roll-up-the-sleeves approach.
- Excellent interpersonal, communication, negotiation, and influencing skills for working effectively with all stakeholders (internal and external) and making information-based decisions.
- Penchant for excellence, both personally and professionally, demonstrated by intellectual curiosity, record of accomplishment, and reputation; shows strong attention to detail and implementation of best practices with an inclination for continuous improvement.
- Ability to quickly establish strong credibility with employees, business partners and external resources.
- Embodies the firm's values and culture toward colleagues, clients, and communities:
o Win as one team
o Lead with integrity
o Be the change
-
Specialist - Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India | Accelya Group | Full time | US$ 90,000 - US$ 1,20,000 per year
For more than 40 years, Accelya has been the industry's partner for change, simplifying airline financial and commercial processes and empowering the air transport community to take better control of the future. Whether partnering with IATA on industry-wide initiatives or enabling digital transformation to simplify airline processes, Accelya drives the...
-
Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India | beBeeCloudReliability | Full time | ₹ 25,00,000 - ₹ 35,00,000
Job Title: DevOps/Site Reliability Engineer
We are seeking a skilled DevOps/Site Reliability Engineer to optimize our infrastructure and improve the overall quality of our software solutions. Engage in process development, implementation, and measurement for Continuous Integration and Delivery, Site Reliability Engineering, and automation of deployment and...
-
Site Reliability Engineer
4 hours ago
Pune, Maharashtra, India | ENGEL | Full time | ₹ 6,00,000 - ₹ 18,00,000 per year
Company Description
ENGEL is a global leader in the production of injection moulding machines and their automation. The company produces systems that manufacture plastic parts used in various industries such as automotive, packaging, and consumer goods. With nine production plants worldwide and subsidiaries and representatives in over 85 countries, ENGEL...
-
Site Reliability Engineer
6 days ago
Pune, Maharashtra, India | Ather Energy | Full time | ₹ 15,00,000 - ₹ 28,00,000 per year
You'll be our: Site Reliability Engineer
You'll be based at: Pune Zonal Office
You'll be aligned with: Cloud and Data Platform Lead / Cloud Architect
You'll be a member of: Cloud and Data Platform Team
Ather's fleet of smart scooters is growing rapidly, and so is the volume of data they generate. Our Vehicle Data Platform (VDP) is the core of this ecosystem, and...
-
Site Reliability Engineer
4 weeks ago
Pune, Maharashtra, India | Synechron | Full time
We have an immediate opportunity for a Site Reliability Engineer with 5 to 9 years of experience.
Synechron – Pune
Job Role: SRE (Senior Site Reliability Engineer)
Job Location: Pune
About Synechron
We began life in 2001 as a small, self-funded team of technology specialists. Since then, we've grown our organization to 14,500+ people, across 58 offices, in 21 countries, in key...
-
SRE (Site Reliability Engineer)
7 days ago
Pune, Maharashtra, India | Apex One | Full time | ₹ 7,00,000 - ₹ 12,00,000 per year
Job Overview
We are looking for a detail-oriented and experienced Site Reliability Engineer to join our team. The Site Reliability Engineer will be responsible for creating and implementing scalable software solutions in order to meet system and application performance goals. You will also be responsible for troubleshooting system errors and resolving any...
-
Site Reliability Engineering Expert
4 weeks ago
Pune, Maharashtra, India | Fiserv | Full time
Site Reliability Engineering Expert (Architect)
Exp. Range: 9 to 12 Years
Location: Pune
Job Description: What does a successful Site Reliability Engineer (SRE) Expert do at Fiserv? The site reliability engineer blends the principles of software engineering with the discipline of operations to create high-performing and reliable software systems....
-
Site Reliability Engineer
1 week ago
Pune, Maharashtra, India | SailPoint | Full time | US$ 1,25,000 - US$ 1,75,000 per year
SailPoint is the leader in identity security for the cloud enterprise. Our identity security solutions secure and enable thousands of companies worldwide, giving our customers unmatched visibility into the entirety of their digital workforce, ensuring workers have the right access to do their job – no more, no less. IdentityNow is SailPoint's Identity...
-
Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India | Reveille Technologies | Full time
Job Summary:
We are seeking a skilled and proactive Site Reliability Engineer (SRE) with a strong DevOps mindset and hands-on experience in application troubleshooting. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our applications and infrastructure. This role requires a blend of software engineering,...
-
Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India | Barclays | Full time | US$ 90,000 - US$ 1,20,000 per year
Join us as a Site Reliability Engineer - Linux & KDB – AVP at Barclays. We are seeking a highly skilled and motivated KDB Site Reliability Engineer (SRE) to manage and enhance our KDB infrastructure estate. This role is ideal for someone with a strong background in Linux systems, shell scripting, and hands-on experience in financial services. You will be...