MLOps Site Reliability Engineer
2 weeks ago
Company Overview
KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays. The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA places an above-average focus on innovation, investing 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world's leading technology providers to accelerate the delivery of tomorrow's electronic devices. Life here is exciting and our teams thrive on tackling really hard problems. There is never a dull moment with us.
Group/Division
With over 40 years of semiconductor process control experience, chipmakers around the globe rely on KLA to ensure that their fabs ramp next-generation devices to volume production quickly and cost-effectively. Enabling the movement towards advanced chip design, KLA's Global Products Group (GPG), which is responsible for creating all of KLA's metrology and inspection products, is looking for the best and the brightest research scientists, software engineers, application development engineers, and senior product technology process engineers. Central Engineering is KLA's largest engineering organization, comprising 9 Centers-of-Excellence (CoE) in various disciplines applied across all product groups in the company. These CoE include Handling & Automation, Precision Motion Control, Sensors & Image Acquisition, Platform Design, and Packaging Engineering, among others. Talent includes over 500 engineers across global centers in Israel, China, India, and the US. Each CoE contributes not just talent and deliverables per discipline toward product programs, but also subject matter expertise, best practices, roadmaps, specialized facilities, apparatus, models, and analytics. These differentiate KLA not only in WHAT we do, but also in HOW we do it.
Job Description/Preferred Qualifications
We are seeking a highly skilled and motivated MLOps Site Reliability Engineer (SRE) to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our machine learning infrastructure. You will work closely with data scientists, machine learning engineers, and software developers to build and maintain robust and efficient systems that support our machine learning workflows. This position offers an exciting opportunity to work on cutting-edge technologies and make a significant impact on our organization's success.
Responsibilities:
- Design, implement, and maintain scalable and reliable machine learning infrastructure.
- Collaborate with data scientists and machine learning engineers to deploy and manage machine learning models in production.
- Develop and maintain CI/CD pipelines for machine learning workflows.
- Monitor and optimize the performance of machine learning systems and infrastructure.
- Implement and manage automated testing and validation processes for machine learning models.
- Ensure the security and compliance of machine learning systems and data.
- Troubleshoot and resolve issues related to machine learning infrastructure and workflows.
- Document processes, procedures, and best practices for machine learning operations.
- Stay up-to-date with the latest developments in MLOps and related technologies.
Required Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven experience as a Site Reliability Engineer (SRE) or in a similar role.
- Strong knowledge of machine learning concepts and workflows.
- Proficiency in programming languages such as Python, Java, or Go.
- Experience with cloud platforms such as AWS, Azure, or Google Cloud.
- Familiarity with containerization technologies like Docker and Kubernetes.
- Experience with CI/CD tools such as Jenkins, GitLab CI, or CircleCI.
- Strong problem-solving skills and the ability to troubleshoot complex issues.
- Excellent communication and collaboration skills.
Preferred Qualifications:
- Master's degree in Computer Science, Engineering, or a related field.
- Experience with machine learning frameworks such as TensorFlow, PyTorch, or Scikit-learn.
- Knowledge of data engineering and data pipeline tools such as Apache Spark, Apache Kafka, or Airflow.
- Experience with monitoring and logging tools such as Prometheus, Grafana, or ELK stack.
- Familiarity with infrastructure as code (IaC) tools like Terraform or Ansible.
- Experience with automated testing frameworks for machine learning models.
- Knowledge of security best practices for machine learning systems and data.
Minimum Qualifications
Master's / Bachelor's Level Degree and related work experience of 2 years
We offer a competitive, family friendly total rewards package. We design our programs to reflect our commitment to an inclusive environment, while ensuring we provide benefits that meet the diverse needs of our employees.
KLA is proud to be an equal opportunity employer.
Be aware of potentially fraudulent job postings or suspicious recruiting activity by persons that are currently posing as KLA employees. KLA never asks for any financial compensation to be considered for an interview, to become an employee, or for equipment. Further, KLA does not work with any recruiters or third parties who charge such fees either directly or on behalf of KLA. Please ensure that you have searched KLA's Careers website for legitimate job postings. KLA follows a recruiting process that involves multiple interviews in person or on video conferencing with our hiring managers. If you are concerned that a communication, an interview, an offer of employment, or that an employee is not legitimate, please send an email to to confirm the person you are communicating with is an employee. We take your privacy very seriously and confidentially handle your information.
-
Cloud Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India Ford Global Career Site Full time ₹ 15,00,000 - ₹ 25,00,000 per year. Be at the Forefront of Mobility's Future: Join Ford as a Site Reliability Engineer. Enterprise Technology is the engine driving the future of transportation, and we're looking for a talented Site Reliability Engineer (SRE) to help us redefine mobility. In this role, you'll leverage cutting-edge technology to enhance customer experiences, improve lives, and...
-
MLOPS Engineer
6 days ago
Chennai, Tamil Nadu, India Cognizant Technology Solutions Full time ₹ 20,00,000 - ₹ 25,00,000 per year. Role: MLOps Engineer. Location: Hyderabad. Mode of Interview: In Person, Hyderabad. Date: 11th Oct 2025 (Saturday). Responsibilities: Model Deployment, Model Monitoring, Model Retraining; Deployment pipeline, Inference pipeline, Monitoring pipeline, Retraining pipeline; Drift Detection, Data Drift, Model Drift; Experiment Tracking; MLOps Architecture; REST API...
-
MLOps Engineer
2 days ago
Chennai, Tamil Nadu, India Unicorn Workforce Full time ₹ 20,00,000 - ₹ 25,00,000 per year. Job Title: MLOps Engineer. Location: Chennai/Hyderabad. Experience: 5 – 12 Years. Notice Period: Immediate to 30 Days. Job Description: We are seeking an experienced MLOps Engineer to design, implement, and maintain scalable machine learning pipelines and infrastructure. The role requires a strong understanding of ML lifecycle management, CI/CD for ML models, and cloud...
-
Site Reliability Engineer
1 week ago
Chennai, Tamil Nadu, India NatWest Group Full time ₹ 12,00,000 - ₹ 36,00,000 per year. Join us as a Site Reliability Engineer. In this key role, you'll support the improvement of non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services. You'll enjoy significant...
-
Site Reliability Engineer
4 weeks ago
Chennai, Tamil Nadu, India Concord Full time. SRE Sr. Engineers (Individual Contributors). Key Attributes: Strong SRE (Site Reliability Engineering) experience; DevOps skills – CI/CD, monitoring, automation, infrastructure as code, etc.; Excellent troubleshooting and debugging skills (infrastructure + application level); Perseverance – must push through complex/challenging issues without giving up; Able to...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India Elgebra Full time ₹ 12,00,000 - ₹ 36,00,000 per year. Role Overview: We are seeking a highly experienced and technically proficient Site Reliability Engineer (SRE) to join our team in support of our client, Qincline. The ideal candidate will have 7 or more years of dedicated experience in Site Reliability Engineering or a closely related discipline. This pivotal role requires a strong focus on ensuring the...
-
Site Reliability Engineer
1 week ago
Chennai, Tamil Nadu, India Grootan Technologies Full time ₹ 15,00,000 - ₹ 25,00,000 per year. About the Role: We are seeking a skilled Site Reliability Engineer (SRE) with 4 to 5 years of hands-on experience to join our engineering team. In this role, you will be responsible for building and maintaining reliable, scalable, and secure infrastructure to support our applications. You will leverage your expertise in automation, cloud platforms, and...
-
Site Reliability Engineer
1 week ago
Chennai, Tamil Nadu, India NatWest Group Full time ₹ 9,00,000 - ₹ 12,00,000 per year. Join us as a Site Reliability Engineer. In this key role, you'll support the improvement of non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services. You'll enjoy significant stakeholder interaction, working in...
-
MLOps Engineer
1 week ago
Chennai, Tamil Nadu, India iXceed Solutions Full time ₹ 1,04,000 - ₹ 1,30,878 per year. Role - MLOps Engineer. Location - Chennai (Onsite). Type - Permanent. Job Description: • University Degree in Computer Science, Information Technology, or related field • 5+ years of experience in the Machine Learning Operations role • Design the data pipelines and engineering infrastructure to support our clients' enterprise machine learning systems at scale •...
-
Site Reliability Engineer
1 week ago
Chennai, Tamil Nadu, India Compunnel Full time ₹ 2,00,000 - ₹ 4,00,000 per year. Job Title: Site Reliability Engineer (SRE). Work Location: Chennai (Work from Office). Compensation: 30 LPA. Interview Process: Final Round Face-to-Face Discussion followed by Virtual round of interview. Requirements: 5-8 years of experience as an SRE/DevOps Engineer/Backend Engineer with SRE focus. Strong Python scripting and automation skills. Proven API integration...