Lead Site Reliability Engineer

4 weeks ago


TwinPacs Sdn Bhd | Hyderabad, Telangana, India | Full time

We have an exclusive opening for the role described below:

Lead Site Reliability Engineer (SRE) | Hyderabad (Work from Office) | 8-12 years

Scope of Work:

- Architect, design, and deploy end-to-end infrastructure solutions for a multi-tenant microservices-based SaaS application with a focus on AI/ML model integration.

- Ensure system reliability, scalability, performance, and security, with a particular focus on improving AI/ML processing pipelines and workflows.

- Use Terraform to provision environments on demand in AWS, optimized for AI/ML workloads.

- Implement and refine monitoring and alerting systems across application, network, and OS layers to support AI model operations and data processing.

- Diagnose, support, and resolve production issues and alerts, participating in a 24/7 on-call rotation to maintain seamless AI/ML service operations.

Experience Required:

- 8+ years of experience in Site Reliability Engineering (SRE) and DevOps roles, with a track record of managing large-scale enterprise SaaS services in production, including at least 1 year in AI/ML infrastructure.

- Demonstrated expertise with AWS public cloud technologies, including extensive experience deploying and managing large-scale container clusters on Amazon EKS.

- Skilled in Infrastructure as Code (IaC) using Terraform, and container technologies such as Docker and Kubernetes.

- Proficient in scripting and programming for automation (Python, Bash, etc.), with strong Linux OS and networking fundamentals relevant to AI/ML workloads.

- Experience establishing monitoring systems that ensure high availability, performance, and security integrity, using tools such as the ELK Stack and CloudWatch, tailored for AI/ML monitoring.

- Hands-on experience managing microservices-based SaaS products, including RESTful web services, SSO integration (Okta, Auth0), and cloud databases and data stores such as Amazon RDS, MySQL, and Elasticsearch, especially in AI/ML deployments.

- Proficient in backup and disaster recovery strategies for AI/ML data stores such as RDS and Elasticsearch.

- AWS Certified Solutions Architect certification is strongly preferred.

- Self-driven, proactive, and adaptable enough to thrive in an early-stage startup environment, with a keen interest in integrating AI/ML technologies into modern SaaS solutions.

(ref:hirist.tech)