
MLOps Site Reliability Engineer
2 weeks ago
Description
We are seeking a highly skilled and motivated MLOps Site Reliability Engineer (SRE) to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our machine learning infrastructure. You will work closely with data scientists, machine learning engineers, and software developers to build and maintain robust and efficient systems that support our machine learning workflows. This position offers an exciting opportunity to work on cutting-edge technologies and make a significant impact on our organization's success.
Responsibilities:
Design, implement, and maintain scalable and reliable machine learning infrastructure.
Collaborate with data scientists and machine learning engineers to deploy and manage machine learning models in production.
Develop and maintain CI/CD pipelines for machine learning workflows.
Monitor and optimize the performance of machine learning systems and infrastructure.
Implement and manage automated testing and validation processes for machine learning models.
Ensure the security and compliance of machine learning systems and data.
Troubleshoot and resolve issues related to machine learning infrastructure and workflows.
Document processes, procedures, and best practices for machine learning operations.
Stay up to date with the latest developments in MLOps and related technologies.
Required Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related field.
Proven experience as a Site Reliability Engineer (SRE) or in a similar role.
Strong knowledge of machine learning concepts and workflows.
Proficiency in programming languages such as Python, Java, or Go.
Experience with cloud platforms such as AWS, Azure, or Google Cloud.
Familiarity with containerization technologies like Docker and Kubernetes.
Experience with CI/CD tools such as Jenkins, GitLab CI, or CircleCI.
Strong problem-solving skills and the ability to troubleshoot complex issues.
Excellent communication and collaboration skills.
Preferred Qualifications:
Master's degree in Computer Science, Engineering, or a related field.
Experience with machine learning frameworks such as TensorFlow, PyTorch, or scikit-learn.
Knowledge of data engineering and data pipeline tools such as Apache Spark, Apache Kafka, or Apache Airflow.
Experience with monitoring and logging tools such as Prometheus, Grafana, or the ELK Stack.
Familiarity with infrastructure as code (IaC) tools such as Terraform or Ansible.
Experience with automated testing frameworks for machine learning models.
Knowledge of security best practices for machine learning systems and data.
Minimum Qualifications:
Master's or Bachelor's degree and 2 years of related work experience
Be aware of potentially fraudulent job postings or suspicious recruiting activity by people posing as KLA employees. KLA never asks for any financial compensation to be considered for an interview, to become an employee, or for equipment. Further, KLA does not work with any recruiters or third parties who charge such fees, either directly or on behalf of KLA. Please make sure that the job posting you are responding to is legitimate. KLA follows a recruiting process that involves multiple interviews with our hiring managers, either in person or by video conference. If you are concerned that a communication, interview, offer of employment, or employee is not legitimate, please send an email to to confirm the person you are communicating with is an employee. We take your privacy very seriously and handle your information confidentially.
-
MLOps Site Reliability Engineer
2 weeks ago
Chennai, India
KLA
Full time
We are seeking a highly skilled and motivated MLOps Site Reliability Engineer (SRE) to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our machine learning infrastructure. You will work closely with data scientists, machine learning engineers, and...
-
MLOps Site Reliability Engineer
4 weeks ago
Chennai, India
KLA
Full time
Company Overview
KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents...
-
MLOps Site Reliability Engineer
4 days ago
Chennai, Tamil Nadu, India
KLA
Full time
₹10,00,000 - ₹25,00,000 per year
Company Overview
KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents...
-
Immediate Start: MLOps Site Reliability Engineer
19 hours ago
Chennai, Tamil Nadu, India
KLA Corporation
Full time
Company Overview
KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems...
-
Site Reliability Engineer
2 days ago
Chennai, Tamil Nadu, India
Elgebra
Full time
₹6,00,000 - ₹18,00,000 per year
Hiring: Site Reliability Engineer – 7+ Years
Location: Bangalore / Chennai
Payroll: Elgebra
Client: Qincline
Joining: Immediate to 15 Days
Role Overview: We are looking for an experienced Site Reliability Engineer (SRE) with over 6 years of expertise to join our team. The ideal candidate will have strong technical skills, a problem-solving mindset, and the...
-
Site Reliability Engineer
4 weeks ago
Chennai, India
Zyoin Group
Full time
Position: Site Reliability Engineer (SRE)
Experience: 4 – 10 Years
Location: Chennai (Hybrid – 2 days in office)
Role Overview: We are seeking a Site Reliability Engineer (SRE) responsible for leading reliability practices, ensuring scalable systems, and collaborating with development teams to maintain highly available services.
Key...
-
Site Reliability Engineer
3 weeks ago
Chennai, India
Zyoin Group
Full time
Job Description
Exp: 4-10 Years
Location: Chennai
Work Mode: Hybrid (2 days in office)
We are looking for a Site Reliability Engineer (SRE) who will lead the Site Reliability Engineering (SRE) side of each of our products. This position is responsible for making technical decisions, collaborating with development teams and platform engineers, and building and...