Site Reliability Engineer

4 weeks ago

Bangalore, Karnataka, India Visa Full time

Company Description

Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure payments network, enabling individuals, businesses, and economies to thrive while driven by a common purpose - to uplift everyone, everywhere by being the best way to pay and be paid. Make an impact with a purpose-driven industry leader. Join us today and experience Life at Visa.

We are seeking an accomplished Site Reliability Engineer (SRE) - Sr. Consultant to join our dynamic Observability team. In this senior role, you will provide technical leadership in developing and maintaining reliable, secure, and cost-effective observability solutions that support our global operations. As the Sr. Consultant SRE, you will serve as the strategic bridge between development and operations, ensuring all systems and services are efficient, highly available, resilient, and scalable. You will collaborate closely with software engineers, system administrators, and cross-functional stakeholders to drive automation, optimize performance, and enable seamless application delivery.

You will take end-to-end ownership of critical observability initiatives, with a strong focus on availability, performance, security, and reliability. You will lead the design and implementation of robust monitoring, alerting, and automation frameworks to minimize incidents and accelerate incident resolution. Your leadership will be instrumental in guiding and mentoring the team, ensuring best practices are consistently adopted and operational excellence is maintained.

Key responsibilities include driving continuous improvement across processes, tools, and technologies, leading root cause analysis, and developing preventive measures for production incidents. You will champion a culture of collaboration, innovation, and proactive problem-solving, supporting engineering teams with the technical expertise needed to meet demanding requirements. As an integral member and leader within our Agile Scrum teams, your technical acumen, leadership skills, and ability to mentor others will be central to delivering impactful, high-quality results.

Responsibilities

- Lead SRE and DevOps operations during APAC hours, ensuring alignment with project objectives, delivery timelines, SLAs, and OLAs.
- Act as the primary escalation point for complex technical issues and incidents, driving resolution and communicating status to leadership and stakeholders.
- Provide strategic input and recommendations on SRE and DevOps initiatives to management, supporting roadmap planning and resource allocation.
- Coordinate and manage relationships with multiple stakeholders, both internal and external, across various technology domains.
- Analyze production defects, perform in-depth root cause analysis across code, data, and infrastructure, and champion the implementation of long-term preventative solutions.
- Mentor, guide, and inspire team members through technical leadership, code reviews, pairing, and ongoing knowledge sharing.
- Lead security and compliance efforts by ensuring timely application of security patches and hotfixes, and adherence to cybersecurity best practices.
- Oversee the design, deployment, and continuous improvement of monitoring, alerting, and logging instrumentation, ensuring comprehensive observability.
- Architect and drive the development of automation frameworks to optimize operational efficiency, eliminate manual toil, and streamline system integration.
- Manage and support observability platforms, including Splunk, ClickHouse, Grafana, Prometheus, M3DB, OpenTelemetry, Fluent Bit, ElasticSearch, OpenSearch, and CloudWatch.
- Collaborate with development and product teams to design and implement scalable monitoring solutions and support the creation of reliable environments across the SDLC.
- Promote and enforce DevOps and SRE best practices, fostering a culture of automation, reliability, and continuous improvement across the organization.
- Design, implement, and maintain robust CI/CD pipelines, enabling rapid, reliable, and automated software delivery.
- Administer, optimize, and scale cloud infrastructure (AWS, GCP) to ensure high availability, performance, and security.
- Lead the adoption and management of infrastructure as code practices using tools such as Terraform, Ansible, or CloudFormation.
- Continuously monitor and analyze system health, proactively identifying and mitigating risks to reliability and performance.
- Oversee deployment and management of containerization and orchestration solutions (Docker, Kubernetes) for modern application delivery.
- Drive incident management processes, including leading post-incident reviews, facilitating blameless postmortems, and implementing actionable improvements.
- Create, maintain, and improve detailed documentation for infrastructure, processes, runbooks, and standard operating procedures.
- Provide advanced technical support and troubleshooting, guiding team members through complex infrastructure and deployment issues.
- Identify, propose, and implement opportunities for process, tooling, and workflow automation to drive operational excellence.
- Lead disaster recovery planning, capacity management, and business continuity initiatives in collaboration with cross-functional teams.
- Evaluate, recommend, and drive the adoption of new technologies, tools, and practices that enhance reliability, scalability, and observability.
- Present technical strategies, incident findings, and project updates to executive leadership and cross-functional stakeholders.
- Foster an inclusive and collaborative team environment, supporting professional growth and the continuous development of SRE best practices.

Visa's Observability ecosystem includes over 2,000 platform nodes utilizing approximately 15 different tools for logging, monitoring, and tracing, alongside 80,000 client agents. The system handles daily log ingestion exceeding 100TB and oversees hundreds of critical applications, supporting vital alerts, dashboards, and reports. To maintain this high level of performance and reliability, we need a Site Reliability Engineer - Sr. Consultant with comprehensive knowledge and practical experience. This position requires an I6 5-level engineer who can operate independently with minimal supervision.

About Visa's PRE Observability Team

Visa's Product Reliability Engineering (PRE) Observability team partners with Product Development as well as Operations Infrastructure teams to build and manage innovative, reliable, scalable, secure, and cost-effective observability platform solutions. We are looking for talented Senior Site Reliability Engineers to join our driven team, with a focus on maximizing system availability, performance, security, and reliability. This dynamic role requires technical leadership, strong problem-solving skills, and expertise in coding, testing, and debugging.

This is a hybrid position. Expectation of days in office will be confirmed by your hiring manager.

Qualifications

Basic Qualifications

- Bachelor's degree with 10-14 years of relevant professional experience.

Preferred Qualifications

- Extensive hands-on experience with observability tools such as Splunk, ClickHouse, Grafana, Prometheus, M3DB, OpenTelemetry, Fluent Bit, ElasticSearch, OpenSearch, and CloudWatch.
- Proven ability to set up and manage exporters (e.g., Node Exporter, Cert Exporter, and others) for metrics collection.
- Deep experience with containerization and orchestration platforms, including Docker and Kubernetes.
- Strong background in CI/CD pipeline management using tools such as GitHub and Ansible.
- Proficiency with Infrastructure as Code (IaC) technologies such as Terraform and configuration management practices like GitOps.
- Advanced scripting skills in Python and/or Shell within Linux environments; experience with Unix scripting.
- Working knowledge of query languages such as PromQL, MS SQL, or Splunk SPL is highly desirable.
- Cloud certifications in AWS or GCP are a significant advantage.
- Demonstrated ability to analyze complex technical problems and solutions, and to communicate effectively at the appropriate level of detail with both technical and non-technical stakeholders.
- Exceptional communication, collaboration, and leadership skills, with a proven track record of leading and mentoring technical teams.
- Strong organizational and problem-solving abilities, with an aptitude for driving process improvements and operational excellence.

Additional Information

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.

Site Reliability Engineer

3 weeks ago

Bangalore, Karnataka, India NatWest Group Full time

Join us as a Site Reliability Engineer. In this key role, you'll support the improvement of non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services. You'll enjoy significant stakeholder interaction, working in...
Senior Site Reliability Engineer

4 weeks ago

Bangalore, Karnataka, India Akamai Full time

Job Category: Site Reliability. Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed content delivery challenges? Join our critical Platform and Reliability Engineering Team. The Platform Reliability Engineering team defines, measures, and optimizes key performance indicators for Akamai's global network. This...
Site Reliability Engineer

2 weeks ago

Bangalore, India IntraEdge Full time

Job Title: Site Reliability Engineer (SRE) – Production Support Location: Bengaluru Job Summary: We are looking for a skilled Site Reliability Engineer (SRE) with strong experience in production support, DevOps practices, and cloud infrastructure management. The ideal candidate will be responsible for maintaining the reliability, performance, and...
Site reliability engineer

4 weeks ago

Bangalore, India ViewSonic Full time

Job Requirements: Bachelor's degree in Computer Science, Engineering, or a related field. 3+ years of experience in a relevant role, such as Site Reliability Engineer, DevOps Engineer, or similar, is preferred but not mandatory. Basic understanding of AWS solutions including EC2, S3, CloudWatch, Lambda, and RDS. Interest and understanding of Platform...
Site Reliability Engineer

1 week ago

Bangalore, India CodeKarma Full time

Site Reliability Engineer (Multi-Cloud Deployments) Location: Bangalore / Remote Experience: 4–10 years Type: Full-time (6-month probation) About CodeKarma CodeKarma is redefining how engineering teams understand and evolve complex systems — bringing production context directly into the developer’s workflow. Our platform runs both as SaaS and as sub-
Site reliability engineer

4 weeks ago

Bangalore, India Synechron Full time

We have an immediate opportunity for an SRE (Senior Site Reliability Engineer), 5 to 9 years. Synechron – Bangalore Job Role: SRE (Senior Site Reliability Engineer) Job Location: Bangalore Notice Period: Within 30 days About Synechron We began life in 2001 as a small, self-funded team of technology specialists. Since then, we’ve grown our organization to...
Site Reliability Engineer

4 weeks ago

Bangalore, India Synechron Full time

We have an immediate opportunity for an SRE (Senior Site Reliability Engineer), 5 to 9 years. Synechron – Bangalore Job Role: SRE (Senior Site Reliability Engineer) Job Location: Bangalore Notice Period: Within 30 days About Synechron We began life in 2001 as a small, self-funded team of technology specialists. Since then, we’ve grown our organization to...
Site Reliability Engineer

5 days ago

Bangalore district, India IntraEdge Full time

Job Title: Site Reliability Engineer (SRE) – Production Support Location: Bengaluru Job Summary: We are looking for a skilled Site Reliability Engineer (SRE) with strong experience in production support, DevOps practices, and cloud infrastructure management. The ideal candidate will be responsible for maintaining the reliability, performance, and...
Site Reliability Engineer

2 weeks ago

Bangalore, India Resource Algorithm Full time

Senior SRE (Engineering & Reliability) Job Summary: We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As a Senior SRE, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving...
Site Reliability Engineer

2 weeks ago

Bangalore, India Trantor Full time

Job Title - Site Reliability Engineer. Role - Contract (9 months, extendable). Exp - 5+ years. Loc - Bangalore (Hybrid). Notice - Immediate joiner only. Duties: Responsible for maintaining and scaling production services and servers across multiple datacenters for complex and data-intensive cloud services. Improve scalability, service reliability, capacity, and performance...

