Urgent Search Site Reliability Engineer Lead

3 weeks ago

Hyderabad, Telangana, India Bank of America Full time

About Us

At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. Responsible Growth is how we run our company and how we deliver for our clients, teammates, communities, and shareholders every day. One of the keys to driving Responsible Growth is being a great place to work for our teammates around the world. We're devoted to being a diverse and inclusive workplace for everyone. We hire individuals with a broad range of backgrounds and experiences and invest heavily in our teammates and their families by offering competitive benefits to support their physical, emotional, and financial well-being. Bank of America believes both in the importance of working together and offering flexibility to our employees. We use a multi-faceted approach for flexibility, depending on the various roles in our organization. Working at Bank of America will give you a great career with opportunities to learn, grow, and make an impact, along with the power to make a difference. Join us!

Global Business Services

Global Business Services delivers Technology and Operations capabilities to Lines of Business and Staff Support Functions of Bank of America through a centrally managed, globally integrated delivery model and globally resilient operations. Global Business Services is recognized for flawless execution, sound risk management, operational resiliency, operational excellence, and innovation. In India, we are present in five locations and operate as BA Continuum India Private Limited (BACI), a non-banking subsidiary of Bank of America Corporation and the operating company for India operations of Global Business Services.

Process Overview

The Infrastructure Automation Services Operations SRE team is responsible for providing seamless production operations support for Enterprise Automation and Orchestration toolsets and for managing the availability, stability, and maintenance of their underlying infrastructure. The team is also responsible for driving SRE best practices to provide a highly integrated, ubiquitously available, data-driven environment that ensures delivery of desired business outcomes, focused on reducing risk and exploiting opportunities via automation, orchestration, advanced analytics, transparency, and observability.

The Infrastructure Automation Services Operations SRE team is seeking a Site Reliability Engineer. This role requires a strong IT professional focused on establishing and improving monitoring to measure end-to-end performance and end-user availability of systems via a suite of common monitoring tools. The engineer will interface with business partners and operations teams to develop business and technical monitoring requirements. As part of this role, the person will primarily be responsible for supporting production or operations of critical applications. They will ensure each application's operational readiness by evaluating its performance, reliability, scale, resiliency, and observability. They will be responsible for identifying issues in production, triaging identified issues, and partnering with other engineers on the team to identify the root cause. The candidate should possess strong analytical ability in solving IT problems, working towards automation and the elimination of system and/or process bottlenecks.

Responsibilities

As part of the SRE team, perform full-stack triaging of alerts and engage other engineers to identify the root cause of application performance and stability issues.
Work with stakeholders such as product owners to define service level objectives (SLOs) for application features and services.
Track performance against SLOs in partnership with development teams or other stakeholders, and ensure systems continue to meet SLOs over time.
Design and develop dashboards and reports to communicate key metrics.
Identify opportunities to improve alerting posture, and create or update alerts accordingly.
Work closely with the Engineering team to understand application architecture, perform single-point-of-failure analysis, and create scenarios for testing the resiliency of the application.
Create or derive the NFR workload model and ensure performance and resiliency are considered early in the SDLC.
Execute performance and chaos tests, and analyze the results using APM and other tools to identify performance and stability issues.
Document findings and analysis results, and communicate and present them to stakeholders.
Perform analytics on previous incidents to understand root causes, and use automation to reduce the probability and/or impact of problem recurrence.
Demonstrate proficiency with DevOps tools, JIRA, ServiceNow, and MS Project, and perform tasks using these tools.

Requirements

Education: B.E. / B.Tech / M.E. / M.Tech / MCA / M.Sc. IT / Computer Science
Certifications, if any: NA
Experience Range: 8 to 10 years of information technology experience, with 5 years working on a DevOps, SRE, or performance engineering team

Foundational Skills

8 years of information technology experience, with 5 years working on a DevOps, SRE, or performance engineering team.
Experienced in triaging production issues using APM tools such as Dynatrace, AppDynamics, or New Relic, and log aggregation tools such as Splunk, ELK, etc.
Strong experience in Java and front-end development (UI and UX, React JS, Angular).
Experience with Apache Tomcat, middleware, Java RESTful services frameworks, and MuleSoft is a plus.
Strong Python, UNIX, Wintel, Perl, and shell scripting.
Strong experience working with CI/CD tools: Bitbucket, JFrog Artifactory, Jenkins, Terraform, Packer, Ansible.
Knowledge of cloud, container, and Kubernetes technologies.
Experience with SRE concepts like SLIs, SLOs, and error budgets, and working with developers to track and improve them on a continuous basis.
Must be able to provide oral and written discussion of analytical findings using narrative and graphic forms.
Must be able to use qualitative and quantitative analytical skills to assess the effectiveness of operations.
Identifying symptoms for process improvement.
Analytical, investigation, and organization skills.
Communication skills, including being able to craft content for executive-level presentations.

Desired Skills

Great soft skills; people and communication skills are essential.
Good proficiency in system, network, security, and database operations, protocols, and industry-standard technologies.
Experience with tools such as Tanium, Artifactory, and BMC TrueSight Orchestration.
Experience with command-line interfaces (CLI), third-party APIs, and integration.
Experience in server administration with Red Hat Enterprise Linux and Windows Server.
Good understanding of developing fault-tolerant solutions, and knowledge of horizontal scaling and resiliency (HA).
Ability to juggle competing priorities and adapt to changes in project scope.
College degree or higher, or equivalent work experience.

Work Timings: IST, 9-hour shifts, two shifts. Shift 1: 6:30 AM - 3:30 PM. Shift 2: 1:30 PM - 10:30 PM.
Weekend Support: Yes (rotational)
Job Location: Hyderabad, Chennai, Gandhinagar (GIFT), Mumbai

Senior Site Reliability Engineer

4 weeks ago

Hyderabad, Telangana, India Options Executive Search Private Limited Full time

Job Title: SRE Lead Engineer. Location: Hyderabad, India. We are seeking a DevOps / SRE Lead Engineer to architect and scale our client's multi-tenant SaaS platform with AI/ML at the core. Our client, a fast-growing AI-powered SaaS company in the FinTech space, is looking for a Site Reliability Engineering (SRE) Lead Engineer to join their dynamic team....
Senior Site Reliability Engineer

5 days ago

Hyderabad, Telangana, India Options Executive Search Private Limited Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Job Title: SRE Lead Engineer. Location: Hyderabad, India. We are seeking a DevOps / SRE Lead Engineer to architect and scale our client's multi-tenant SaaS platform with AI/ML at the core. Our client, a fast-growing AI-powered SaaS company in the FinTech space, is looking for a Site Reliability Engineering (SRE) Lead Engineer to join their dynamic team. This is an...
Site Reliability Engineer

1 day ago

Hyderabad, Telangana, India Vipany Global Solutions Full time ₹ 10,00,000 - ₹ 25,00,000 per year

Job Description: We are seeking a highly skilled Site Reliability Engineer (SRE) with deep expertise in AWS and Windows Server environments. The ideal candidate will be responsible for ensuring the reliability, availability, and performance of our cloud infrastructure and overseeing the execution of various projects in line with business objectives. Key...
Lead - Site Reliability Engineer

4 weeks ago

Hyderabad, Telangana, India VXI Global Solutions Full time

We are looking for a Lead - Site Reliability Engineer with 8+ years of experience to design, implement, and manage robust observability solutions across our cloud infrastructure and applications. The ideal candidate will have hands-on experience with Prometheus, Grafana, Google Cloud Monitoring, and OpenTelemetry, along with exposure to SolarWinds. You...
Site Reliability Engineer

1 week ago

Hyderabad, Telangana, India Talent Worx Full time ₹ 15,00,000 - ₹ 25,00,000 per year

Site Reliability Engineer (SRE). At Talent Worx, we are looking for a dedicated Site Reliability Engineer (SRE) to join our team. This role involves maintaining high availability and reliability of our services through the application of software engineering practices and systems administration skills. The ideal candidate will bridge the gap between...
Lead - Site Reliability Engineer

3 days ago

Hyderabad, Telangana, India VXI Global Solutions Full time ₹ 20,00,000 - ₹ 25,00,000 per year

We are looking for a Lead - Site Reliability Engineer with 8+ years of experience to design, implement, and manage robust observability solutions across our cloud infrastructure and applications. The ideal candidate will have hands-on experience with Prometheus, Grafana, Google Cloud Monitoring, and OpenTelemetry, along with exposure to SolarWinds. You should...
Site Reliability Engineering

5 days ago

Hyderabad, Telangana, India Acesoft Labs Full time ₹ 20,00,000 - ₹ 25,00,000 per year

Job Title: Site Reliability Engineering (SRE) Manager. Location: Hyderabad. Employment Type: Full-Time. Work Model: 3 days from office (hybrid). Summary: The SRE Manager at TechBlocks India will lead the reliability engineering function, ensuring infrastructure resiliency and optimal operational performance. This hybrid role blends...
Site Reliability Engineer

1 day ago

Hyderabad, Telangana, India Jigya Software Services Full time ₹ 1,50,000 - ₹ 28,00,000 per year

Job Title: Senior Site Reliability Engineer (SRE) - AWS/Kubernetes. Location: Hyderabad - Onsite. Job Type: Full-Time. About the Role: We are looking for a highly skilled and motivated Site Reliability Engineer to design, build, and maintain our high-performance, scalable cloud infrastructure. You will play a critical role in ensuring the reliability, performance, and...
Site Reliability Engineer

1 day ago

Hyderabad, Telangana, India SS&C TECHNOLOGIES Full time ₹ 5,00,000 - ₹ 12,00,000 per year

Site Reliability Engineer (PA2025Q3JB087) As a leading financial services and healthcare technology company based on revenue, SS&C is headquartered in Windsor, Connecticut, and has 27,000 employees in 35 countries. Some 20,000 financial services and healthcare organizations, from the world's largest companies to small and mid-market firms, rely on SS&C for...
Site Reliability Engineering

5 days ago

Hyderabad, Telangana, India TECHBLOCKS Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Job Title: Site Reliability Engineering (SRE) Manager. Location: Hyderabad. Employment Type: Full-Time. Work Model: 3 days from office (hybrid). Summary: The SRE Manager at TechBlocks India will lead the reliability engineering function, ensuring infrastructure resiliency and optimal operational performance. This hybrid role blends technical leadership with team...
