
Site Reliability Engineer
3 days ago
We are seeking an accomplished Site Reliability Engineer (SRE) Sr Consultant to join our dynamic Observability team. In this senior role, you will provide technical leadership in developing and maintaining reliable, secure, and cost-effective observability solutions that support our global operations.
As the Sr. Consultant SRE, you will serve as the strategic bridge between development and operations, ensuring all systems and services are efficient, highly available, resilient, and scalable. You will collaborate closely with software engineers, system administrators, and cross-functional stakeholders to drive automation, optimize performance, and enable seamless application delivery.
You will take end-to-end ownership of critical observability initiatives, with a strong focus on availability, performance, security, and reliability. You will lead the design and implementation of robust monitoring, alerting, and automation frameworks to minimize incidents and accelerate incident resolution. Your leadership will be instrumental in guiding and mentoring the team, ensuring best practices are consistently adopted and operational excellence is maintained.
Key responsibilities include driving continuous improvement across processes, tools, and technologies, leading root cause analysis, and developing preventive measures for production incidents. You will champion a culture of collaboration, innovation, and proactive problem-solving, supporting engineering teams with the technical expertise needed to meet demanding requirements.
As an integral member and leader within our Agile Scrum teams, your technical acumen, leadership skills, and ability to mentor others will be central to delivering impactful, high-quality results.
Responsibilities
- Lead SRE and DevOps operations during APAC hours, ensuring alignment with project objectives, delivery timelines, SLAs, and OLAs.
- Act as the primary escalation point for complex technical issues and incidents, driving resolution and communicating status to leadership and stakeholders.
- Provide strategic input and recommendations on SRE and DevOps initiatives to management, supporting roadmap planning and resource allocation.
- Coordinate and manage relationships with multiple stakeholders, both internal and external, across various technology domains.
- Analyze production defects, perform in-depth root cause analysis across code, data, and infrastructure, and champion the implementation of long-term preventative solutions.
- Mentor, guide, and inspire team members through technical leadership, code reviews, pairing, and ongoing knowledge sharing.
- Lead security and compliance efforts by ensuring timely application of security patches, hotfixes, and adherence to cybersecurity best practices.
- Oversee the design, deployment, and continuous improvement of monitoring, alerting, and logging instrumentation, ensuring comprehensive observability.
- Architect and drive the development of automation frameworks to optimize operational efficiency, eliminate manual toil, and streamline system integration.
- Manage and support observability platforms, including Splunk, ClickHouse, Grafana, Prometheus, M3DB, OpenTelemetry, Fluent Bit, ElasticSearch, OpenSearch, and CloudWatch.
- Collaborate with development and product teams to design and implement scalable monitoring solutions and support the creation of reliable environments across the SDLC.
- Promote and enforce DevOps and SRE best practices, fostering a culture of automation, reliability, and continuous improvement across the organization.
- Design, implement, and maintain robust CI/CD pipelines, enabling rapid, reliable, and automated software delivery.
- Administer, optimize, and scale cloud infrastructure (AWS, GCP) to ensure high availability, performance, and security.
- Lead the adoption and management of infrastructure as code practices using tools such as Terraform, Ansible, or CloudFormation.
- Continuously monitor and analyze system health, proactively identifying and mitigating risks to reliability and performance.
- Oversee deployment and management of containerization and orchestration solutions (Docker, Kubernetes) for modern application delivery.
- Drive incident management processes, including leading post-incident reviews, facilitating blameless postmortems, and implementing actionable improvements.
- Create, maintain, and improve detailed documentation for infrastructure, processes, runbooks, and standard operating procedures.
- Provide advanced technical support and troubleshooting, guiding team members through complex infrastructure and deployment issues.
- Identify, propose, and implement opportunities for process, tooling, and workflow automation to drive operational excellence.
- Lead disaster recovery planning, capacity management, and business continuity initiatives in collaboration with cross-functional teams.
- Evaluate, recommend, and drive the adoption of new technologies, tools, and practices that enhance reliability, scalability, and observability.
- Present technical strategies, incident findings, and project updates to executive leadership and cross-functional stakeholders.
- Foster an inclusive and collaborative team environment, supporting professional growth and the continuous development of SRE best practices.
Visa's Observability ecosystem includes over 2,000 platform nodes, utilizing approximately 15 different tools for logging, monitoring, and tracing, alongside 80,000 client agents. The system handles daily log ingestion exceeding 100TB and oversees hundreds of critical applications, supporting vital alerts, dashboards, and reports. To maintain this high level of performance and reliability, we need a Site Reliability Engineer Sr Consultant with comprehensive knowledge and practical experience. This position requires an I6.5-level engineer who can operate independently with minimal supervision.
About Visa's PRE Observability Team
Visa's Product Reliability Engineering (PRE) Observability team partners with Product Development as well as Operations & Infrastructure teams to build and manage innovative, reliable, scalable, secure, and cost-effective observability platform solutions. We are looking for talented Senior Site Reliability Engineers to join our driven team, with a focus on maximizing system availability, performance, security, and reliability. This dynamic role requires technical leadership, strong problem-solving skills, and expertise in coding, testing, and debugging.
This is a hybrid position. Expectation of days in office will be confirmed by your hiring manager.
Locations
Job Location: Bangalore, India
-
Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India Programming Full time ₹ 10,00,000 - ₹ 25,00,000 per year
Role - Site Reliability Engineering. Location - Bengaluru. Years of Experience - 4+ Years. Professional & Technical Skills: Must-Have Skills: Proficiency in Site Reliability Engineering. Good-to-Have Skills: Experience with cloud service providers such as AWS, Azure, or Google Cloud. Strong understanding of CI/CD tools and practices. Experience with container...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India FOSS United Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Site Reliability Engineer at ZEISS India. Posted on September 11, 2025. ZEISS India, Kadubeesanahalli, Bengaluru. Full Time. Job Description: ZEISS in India is headquartered in Bengaluru and present in the fields of Industrial Quality Solutions, Research Microscopy Solutions, Medical Technology, Vision Care and Sports...
-
Site Reliability Engineer
4 weeks ago
Bengaluru, Karnataka, India ViewSonic Full time
Job Requirements: Bachelor's degree in Computer Science, Engineering, or a related field. 3+ years of experience in a relevant role, such as Site Reliability Engineer, DevOps Engineer, or similar, is preferred but not mandatory. Basic understanding of AWS solutions including EC2, S3, CloudWatch, Lambda, and RDS. Interest and understanding of Platform...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India ViewSonic Full time ₹ 15,00,000 - ₹ 25,00,000 per year
Job Requirements: Bachelor's degree in Computer Science, Engineering, or a related field. 3+ years of experience in a relevant role, such as Site Reliability Engineer, DevOps Engineer, or similar, is preferred but not mandatory. Basic understanding of AWS solutions including EC2, S3, CloudWatch, Lambda, and RDS. Interest and understanding of Platform Engineering...
-
Site Reliability Engineer
1 week ago
Bengaluru, Karnataka, India NatWest Group Full time ₹ 15,00,000 - ₹ 25,00,000 per year
Site Reliability Engineer, AVP. Join us as a Site Reliability Engineer. You'll manage the provision of stable, resilient, reliable applications with the end goal of minimising disruption to Customer & Colleague Journeys (CCJ). We'll look to you to identify and automate manual tasks and implement observability solutions, ensuring a thorough understanding of...
-
Site Reliability Engineer
5 days ago
Bengaluru, Karnataka, India NatWest Group Full time ₹ 9,00,000 - ₹ 12,00,000 per year
Site Reliability Engineer. Join us as a Site Reliability Engineer. In this key role, you'll support the improvement of non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services. You'll enjoy significant...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India Funic Tech Full time ₹ 20,00,000 - ₹ 25,00,000 per year
Job Title: Site Reliability Engineer (SRE). Experience Required: 7 Years. Location: Bangalore / Chennai. Employment Type: Full-Time. Work Mode: Onsite. Role Overview: We are seeking a highly skilled Site Reliability Engineer (SRE) with 7 years of experience to ensure the reliability, scalability, and performance of our systems. The ideal candidate will bring...
-
Site Reliability Engineer
3 days ago
Bengaluru, Karnataka, India PROGRESS SOFTWARE Full time ₹ 6,00,000 - ₹ 12,00,000 per year
Job Description: Site Reliability Engineer. Hybrid. Hyderabad, India; Bengaluru, India. DevOps. Job Summary: We are Progress (Nasdaq: PRGS) - the trusted provider of software that enables our customers to develop, deploy and manage responsible, AI-powered applications and experience with agility and ease. We're proud to have a diverse, global team...
-
Site Reliability Engineer
5 days ago
Bengaluru, Karnataka, India NatWest Group Full time ₹ 20,00,000 - ₹ 25,00,000 per year
Site Reliability Engineer, VP. Join us as a Site Reliability Engineer. In this key role, you'll improve, drive, and embed non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services. You'll enjoy significant...
-
Site Reliability Engineer
5 days ago
Bengaluru, Karnataka, India Zetamicron Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Job Title: Site Reliability Engineer (SRE). About the Role: We are seeking a highly skilled and proactive Site Reliability Engineer (SRE) to ensure the stability, scalability, and reliability of our platform. The ideal candidate will have strong experience in managing production environments, automating operational processes, and enhancing system performance...