Principal Site Reliability Engineer

6 days ago


Bangalore City, India Zscaler Full time
Job Description:
We are seeking an experienced SRE - AI/ML Engineer with a strong background in machine learning, AI-driven solutions, and cloud/SRE operations. In this role, you will design, implement, and optimize machine learning models to enhance system observability, proactive monitoring, and root cause analysis. You will collaborate with various teams to integrate AI/ML capabilities into our tooling, monitoring platforms, and decision-making processes to build a smarter, more resilient infrastructure.

Key Responsibilities

- AI/ML Model Design and Implementation: Design, implement, and manage machine learning models that predict system failures, performance degradation, and anomalies, enabling proactive and intelligent monitoring.
- AI-Driven Observability: Integrate AI/ML capabilities into existing observability tools (e.g., Prometheus, Grafana, ELK Stack) to provide actionable insights, enhance anomaly detection, and enable automated responses.
- Root Cause Analysis Tools: Develop AI-driven tools for intelligent debugging and root cause analysis that analyze large volumes of logs, metrics, and traces to identify issues quickly.
- Generative AI Integration: Implement generative AI models (e.g., GPT, BERT) for content generation, anomaly detection, and scenario simulations to enhance system reliability.
- Chatbots and Conversational AI: Design and build conversational AI interfaces and chatbots that provide automated insights, support, and interactions for internal teams and users.
- Collaboration with Data Engineering: Work closely with data engineering teams to collect, preprocess, and manage telemetry data for model training, validation, and real-time monitoring.
- Model Deployment and Monitoring: Deploy machine learning models into production environments and ensure their performance, accuracy, and reliability through continuous monitoring.
- Automation and Innovation: Identify opportunities to automate incident detection and resolution using AI/ML technologies, reducing downtime and improving system resilience.
- Continuous Learning and Improvement: Stay current with the latest trends in AI/ML and apply them to enhance observability and monitoring solutions.
- Documentation and Knowledge Sharing: Maintain comprehensive documentation of AI/ML models, tools, and best practices to support knowledge sharing and compliance.

Required Skills and Qualifications

- Experience: 8+ years of experience in SRE, AI/ML engineering, or cloud SRE/DevOps, with a focus on machine learning and proactive, intelligent operations.
- AI/ML Expertise: Proficiency in developing, training, and deploying machine learning models using frameworks such as TensorFlow, PyTorch, or scikit-learn.
- NLP and Generative AI: Experience with natural language processing (NLP) techniques and generative AI models (e.g., GPT, BERT) for content generation and chatbot development.
- Programming: Strong programming skills in Python, Java, or Scala for AI/ML development and integration.
- Observability Tools: Familiarity with observability tools such as Prometheus, Grafana, and the ELK Stack, and experience enhancing them with AI/ML capabilities.
- Data Processing: Knowledge of big data technologies (e.g., Apache Kafka, Apache Spark) and real-time data processing concepts.
- Cloud Platforms: Experience with cloud platforms such as AWS, Azure, or Google Cloud for deploying and managing AI/ML solutions.
- Analytical Skills: Strong analytical skills to interpret complex data, identify patterns, and solve technical challenges.
- Collaboration: Excellent verbal and written communication skills for effective collaboration with cross-functional teams and stakeholders.

Why Zscaler?

- Work with cutting-edge technology in a dynamic and innovative environment.
- Enjoy a collaborative and inclusive company culture.
- Benefit from a competitive compensation and benefits package.
- Take advantage of our commitment to continuous learning and professional development.
- Join the world’s leading Software-as-a-Service Security Platform.
- Contribute to detecting 100 million threats daily across 185+ countries.
- Experience our exceptional workplace, with a Glassdoor rating of 4.7/5.0 and 98% CEO approval.

  • Bangalore, India Cisco Full time

    Principal Site Reliability Engineer, CloudOps at Cisco ThousandEyes, Bangalore, India. Who We Are: The name ThousandEyes was born from two big ideas: the power to see things not ordinarily possible and the ability to collect insights from a multitude of vantage points. As the...


  • Bangalore, India Oracle Full time

    Job title: Principal Site Reliability Engineering. Job Description: Building off our cloud momentum, Oracle has formed a new organization, Oracle Health Applications & Infrastructure. This team will focus on product development and product strategy for Oracle Health while building out a complete platform supporting modernized, automated...


  • Bangalore City, India Zensar Technologies Full time

    About the Role: Site Reliability Engineer. Experience: 5-8 years. Location: Bangalore. Required Skills: Must-have skills: High level of experience using cloud log management and monitoring data platforms (Dynatrace, Azure Monitor). Hands-on experience with Azure Bicep. Experience working with Infrastructure as Code and containerization tools (Terraform, Docker, Kubernetes,...


  • Bangalore City, India Zensar Technologies Full time

    About the Role: Site Reliability Engineer. Experience: 5-8 years. Location: Bangalore. Required Skills: Must-have skills: High level of experience using cloud log management and monitoring data platforms (Dynatrace, Azure Monitor). Hands-on experience with Azure Bicep. Experience working with Infrastructure as Code and containerization tools (Terraform, Docker,...


  • Bangalore City, India BayOne Solutions Full time

    Responsibilities: Ensure the reliability, availability, and performance of the customer’s production systems. Monitor system health, identify issues, and implement solutions to prevent and resolve incidents. For responding to incidents, perform root cause analysis and work with functional teams and third-party vendors to implement corrective actions. For monitoring,...


  • Bangalore, India CAPCO Full time

    Principal Consultant - Site Reliability Engineer at Capco India, Bengaluru. Sr SRE Consultant. Joining Capco means joining an organisation that is committed to an inclusive working environment where you’re encouraged to #BeYourselfAtWork. We celebrate individuality and recognize that diversity and inclusion, in all forms, is critical to...


  • Bangalore, India CAPCO Full time

    Principal Consultant - Site Reliability Engineer at Capco India, Bengaluru. Sr SRE Consultant. Joining Capco means joining an organisation that is committed to an inclusive working environment where you’re encouraged to #BeYourselfAtWork. We celebrate individuality and recognize that diversity and inclusion, in all forms, is critical to success...


  • Bangalore City, India Wipro Full time

    Urgent Requirement for SRE (Site Reliability Engineer) with Wipro - Bangalore, Hyderabad, Chennai & Pune! Mandatory Skills Required: Proven experience (5+ years) working as an SRE with a specific focus on Microsoft Azure Cloud services and OCI. Deep understanding of cloud services, including Docker and Kubernetes Service API and tooling in Azure and OCI....


  • Bangalore City, India NetConnectGlobal Full time

    Location: Bangalore. Experience: 5-10 years. Job Description: We are seeking a skilled and experienced Site Reliability Engineer (SRE) to join our team responsible for maintaining and improving our 24/7 Supercomputer as a Service (SaaS) infrastructure. The ideal candidate will have a strong background in high-performance computing (HPC) environments, system...


  • Bangalore City, India Tyson Foods India Full time

    Job Description – Lead Site Reliability Engineer (Cloud Engineering): The role of Lead Site Reliability Engineer in the Data & Analytics organization is to lead efforts to ensure the reliability, scalability, and performance of our cloud-based systems on platforms such as GCP/AWS. The role will play a crucial part in designing and implementing robust, scalable...


  • Bangalore City, India Saarthee Full time

    Company Description. About Saarthee: Saarthee is a global data, analytics, technology and consulting firm unlike any other, where our passion for helping others fuels our approach and our products and solutions. We are a one-stop shop for all things data and analytics. Unlike other analytics consulting firms that are technology or platform specific,...


  • Bangalore, India Cricbuzz.com Full time

    Site Reliability Engineer: We are looking for a highly skilled and motivated Web Server Site Reliability Engineer to join our team. As a Web Server Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our web server infrastructure and CDN services. Experience: 3-5 years. Responsibilities: ● Design,...


  • Bangalore, India Cricbuzz.com Full time

    Site Reliability Engineer: We are looking for a highly skilled and motivated Web Server Site Reliability Engineer to join our team. As a Web Server Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our web server infrastructure and CDN services. Experience: 4-5 years. Responsibilities: ● Design,...


  • Bangalore, India tsworks Full time

    Who We Are: tsworks Technologies India Private Limited (a subsidiary of The Software Works, Inc., USA) is a technology product and services company. Our mission is to provide domain expertise, innovative solutions, and thought leadership to empower businesses to thrive in a digital world. We value our employees, take pride in providing best value in customer...


  • Bangalore, India Integra Connect Full time

    About Integra Connect: Integra Connect delivers a comprehensive, integrated suite of cloud-based technologies and services that enable specialty groups to optimize clinical and financial performance as reimbursement shifts to value-based models. Connected by the IntegraCloud platform, the company’s core applications span population health including care...


  • Bangalore, India Microsoft Full time

    Overview: Looking to join an exciting industry and organization at the forefront of the next tech industry transformation? Are you ready to join a team of the world’s best technical experts to enable the success of Microsoft solutions for our commercial & enterprise customers? We are seeking to build out the team of next generation Site Reliability...

