
Site Reliability Engineer
17 hours ago
We have an open requirement for "SRE Lead Engineer".
Client: MNC
Product-based US company
Role & responsibilities
Responsibilities:
- Architect, design, and deploy end-to-end infrastructure solutions for a multi-tenant
microservices-based SaaS application with a focus on AI/ML model integration.
- Ensure system reliability, scalability, performance, and security, specifically enhancing
AI/ML processing pipelines and workflows.
- Utilize Terraform scripting for on-demand environment provisioning within the AWS
cloud, optimized for AI/ML workloads.
- Implement and refine monitoring and alerting systems across application, network, and
OS layers to support AI model operations and data processing.
- Diagnose, support, and resolve production issues and alerts, participating in a 24/7
on-call rotation to maintain seamless AI/ML service operations.
Qualifications:
- 8+ years of experience in Site Reliability Engineering (SRE) and DevOps roles with a
track record of managing large-scale enterprise SaaS services in production, including
1+ year in AI/ML infrastructure.
- Demonstrated expertise with AWS public cloud technologies, including extensive
experience in deploying and managing large-scale container clusters using AWS EKS.
Skilled in Infrastructure as Code (IaC) using Terraform, and container technologies such
as Docker and Kubernetes.
- Proficient in scripting and programming for automation (Python, Bash, etc.), with strong
Linux OS and networking fundamentals relevant to AI/ML workloads.
Job Description:
- Experience in establishing monitoring systems to ensure high availability, performance,
and security integrity, using tools like ELK Stack, CloudWatch, and others tailored for
AI/ML monitoring.
- Hands-on experience managing microservices architecture SaaS products, enabling
RESTful web services, SSO integration (Okta, Auth0), and utilizing cloud databases like
EC2-RDS, MySQL, and Elasticsearch, especially in AI/ML deployments.
- Proficient in backup and disaster recovery strategies specific to AI/ML data resources
like RDS and Elasticsearch.
- AWS Certified Solutions Architect is strongly preferred.
- Self-driven, proactive, and adaptable to thrive in an early-stage startup environment, with
a keen interest in integrating AI/ML technologies into modern SaaS solutions.
Preferred candidate profile
Interested candidates, please share your profiles to .AI
Notice period: Immediate to 30 days
Location: Hyderabad
-
Site Reliability Engineer
7 days ago
Hyderabad, Telangana, India Talent Worx Full time ₹ 9,00,000 - ₹ 12,00,000 per year Site Reliability Engineer (SRE) At Talent Worx, we are looking for a dedicated Site Reliability Engineer (SRE) to join our team. This role involves maintaining high availability and reliability of our services through the application of software engineering practices and systems administration skills. The ideal candidate will bridge the gap between...
-
Site Reliability Engineer
16 hours ago
Hyderabad, Telangana, India Talent Worx Full time US$ 1,20,000 - US$ 2,00,000 per year Talent Worx is seeking a talented SRE (Site Reliability Engineer) to enhance our technology team. In this role, you will be pivotal in ensuring the reliability, performance, and availability of our applications and services. Your work will involve both software engineering and systems operations as you strive to improve customer experiences and operational...
-
Site Reliability Engineer
4 weeks ago
Hyderabad, Telangana, India Talent Worx Full time Talent Worx is seeking a talented SRE (Site Reliability Engineer) to enhance our technology team. In this role, you will be pivotal in ensuring the reliability, performance, and availability of our applications and services. Your work will involve both software engineering and systems operations as you strive to improve customer experiences and operational...
-
Site Reliability Engineer
1 week ago
Hyderabad, Telangana, India IntraEdge Full time Site Reliability Engineer Experience: 7+ Years Location: Hyderabad Hybrid 4-day office and 1 Day remote Skills for Principal: Strong leadership and people management skills. Exceptional technical proficiency in Pearson's technology stack. Advanced project management capabilities. Excellent communication and collaboration skills. Adept at risk assessment and crisis...
-
Site Reliability Engineer
6 days ago
Hyderabad, Telangana, India IntraEdge Full time Site Reliability Engineer Experience: 7+ Years Location: Hyderabad Hybrid 4-day office and 1 Day remote Skills for Principal: - Strong leadership and people management skills. - Exceptional technical proficiency in Pearson's technology stack. - Advanced project management capabilities. - Excellent communication and collaboration skills. - Adept at risk assessment and...
-
Site Reliability Engineer
3 days ago
Hyderabad, Telangana, India IntraEdge Full timeSite Reliability Engineer Experience: 7+ Years Location: Hyderabad Skills for Principal: ~ Strong leadership and people management skills. ~ Exceptional technical proficiency in Pearson's technology stack. ~ Advanced project management capabilities. ~ Excellent communication and collaboration skills. ~ Adept at risk assessment and crisis management. ~...
-
Site Reliability Engineer
18 hours ago
Hyderabad, Telangana, India IntraEdge Full time US$ 1,20,000 - US$ 2,00,000 per year Site Reliability Engineer Experience: 7+ Years Location: Hyderabad Skills for Principal: Strong leadership and people management skills. Exceptional technical proficiency in Pearson's technology stack. Advanced project management capabilities. Excellent communication and collaboration skills. Adept at risk assessment and crisis management. Strategic thinking with a...
-
Site Reliability Engineer
4 days ago
Hyderabad, Telangana, India IntraEdge Full timeSite Reliability Engineer Experience: 7+ Years Location: Hyderabad Hybrid 4-day office and 1 Day remote Skills for Principal: Strong leadership and people management skills. Exceptional technical proficiency in Pearson's technology stack. Advanced project management capabilities. Excellent communication and collaboration skills. Adept at risk assessment...
-
Site Reliability Engineer
1 day ago
Hyderabad, Telangana, India ServiceNow Full time Site Reliability Engineer (SRE) Experience: 6+ Years About the Role: We are seeking a seasoned SRE to ensure the reliability, availability, and performance of our critical services. You will combine software engineering with systems administration to create scalable and highly reliable software systems. Responsibilities: - Design, build, and maintain...
-
SRE (Site Reliability Engineer)
7 days ago
Hyderabad, Telangana, India Talent Worx Full time ₹ 15,00,000 - ₹ 20,00,000 per year SRE (Site Reliability Engineer) Talent Worx is seeking a talented SRE (Site Reliability Engineer) to enhance our technology team. In this role, you will be pivotal in ensuring the reliability, performance, and availability of our applications and services. Your work will involve both software engineering and systems operations as you strive to improve...