MLOps Site Reliability Engineer
1 day ago
Company Overview
KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays. The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA focuses more than average on innovation and we invest 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world's leading technology providers to accelerate the delivery of tomorrow's electronic devices. Life here is exciting and our teams thrive on tackling really hard problems. There is never a dull moment with us.

Group/Division
With over 40 years of semiconductor process control experience, chipmakers around the globe rely on KLA to ensure that their fabs ramp next-generation devices to volume production quickly and cost-effectively. Enabling the movement towards advanced chip design, KLA's Global Products Group (GPG), which is responsible for creating all of KLA's metrology and inspection products, is looking for the best and the brightest research scientists, software engineers, application development engineers, and senior product technology process engineers. Central Engineering is KLA's largest engineering organization, comprised of 9 Centers-of-Excellence (CoE) in various disciplines applied across all product groups in the company. These CoE include Handling & Automation, Precision Motion Control, Sensors & Image Acquisition, Platform Design, and Packaging Engineering, among others.
Talent includes over 500 engineers across global centers in Israel, China, India, and the US. Each CoE contributes not just talent and deliverables per discipline toward product programs, but also subject matter expertise, best practices, roadmaps, specialized facilities, apparatus, models, and analytics. These differentiate KLA not only in WHAT we do, but also in HOW we do it.

Job Description/Preferred Qualifications
We are seeking a highly skilled and motivated MLOps Site Reliability Engineer (SRE) to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our machine learning infrastructure. You will work closely with data scientists, machine learning engineers, and software developers to build and maintain robust and efficient systems that support our machine learning workflows. This position offers an exciting opportunity to work on cutting-edge technologies and make a significant impact on our organization's success.

Responsibilities:
Design, implement, and maintain scalable and reliable machine learning infrastructure.
Collaborate with data scientists and machine learning engineers to deploy and manage machine learning models in production.
Develop and maintain CI/CD pipelines for machine learning workflows.
Monitor and optimize the performance of machine learning systems and infrastructure.
Implement and manage automated testing and validation processes for machine learning models.
Ensure the security and compliance of machine learning systems and data.
Troubleshoot and resolve issues related to machine learning infrastructure and workflows.
Document processes, procedures, and best practices for machine learning operations.
Stay up-to-date with the latest developments in MLOps and related technologies.

Required Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related field.
Proven experience as a Site Reliability Engineer (SRE) or in a similar role.
Strong knowledge of machine learning concepts and workflows.
Proficiency in programming languages such as Python, Java, or Go.
Experience with cloud platforms such as AWS, Azure, or Google Cloud.
Familiarity with containerization technologies like Docker and Kubernetes.
Experience with CI/CD tools such as Jenkins, GitLab CI, or CircleCI.
Strong problem-solving skills and the ability to troubleshoot complex issues.
Excellent communication and collaboration skills.

Preferred Qualifications:
Master's degree in Computer Science, Engineering, or a related field.
Experience with machine learning frameworks such as TensorFlow, PyTorch, or Scikit-learn.
Knowledge of data engineering and data pipeline tools such as Apache Spark, Apache Kafka, or Airflow.
Experience with monitoring and logging tools such as Prometheus, Grafana, or ELK stack.
Familiarity with infrastructure as code (IaC) tools like Terraform or Ansible.
Experience with automated testing frameworks for machine learning models.
Knowledge of security best practices for machine learning systems and data.

Minimum Qualifications
Master's / Bachelor's Level Degree and related work experience of 2 years

We offer a competitive, family-friendly total rewards package. We design our programs to reflect our commitment to an inclusive environment, while ensuring we provide benefits that meet the diverse needs of our employees. KLA is proud to be an equal opportunity employer.

Be aware of potentially fraudulent job postings or suspicious recruiting activity by persons that are currently posing as KLA employees. KLA never asks for any financial compensation to be considered for an interview, to become an employee, or for equipment. Further, KLA does not work with any recruiters or third parties who charge such fees either directly or on behalf of KLA. Please ensure that you have searched KLA's Careers website for legitimate job postings. KLA follows a recruiting process that involves multiple interviews in person or on video conferencing with our hiring managers.
If you are concerned that a communication, an interview, an offer of employment, or an employee is not legitimate, please send an email to confirm that the person you are communicating with is an employee. We take your privacy very seriously and handle your information confidentially.
-
MLOps Site Reliability Engineer
3 days ago
IND-Tamil Nadu-Chennai-KLA, India KLA Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Company Overview: KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents...
-
Site Reliability Engineer
1 day ago
Tamil Nadu, India Datum Technologies Group Full time
Job Title: Site Reliability Engineer (SRE) – Azure & AI. Experience: 7+ years. Work Mode: Hybrid. Work Location: Chennai/Mumbai/Gurgaon. Job Summary: We are looking for an experienced Site Reliability Engineer (SRE) with strong expertise in Microsoft Azure, AI infrastructure, and automation. The ideal candidate will have a solid background in managing cloud...
-
AWS Site Reliability Engineer
1 week ago
Tamil Nadu, India HTC Global Services Full time
HTC – A brief profile: Established in 1990, HTC Inc., a company with headquarters in Troy, Michigan, is a leading global Information Technology solution and BPO provider. HTC assists clients across multiple industry verticals, offering turnkey project lifecycle in e-business, data warehousing, embedded systems, ECM, SCM, CRM, and ERP solutions. HTC Inc....
-
Site Reliability Engineer Trainer
1 week ago
Medavakkam, Chennai, Tamil Nadu, India Intellion Technologies Pvt Ltd Full time ₹ 2,40,000 - ₹ 18,00,000 per year
Job Title: Site Reliability Engineer Trainer (Part-Time / Freelance). Job Description: We are looking for an experienced Site Reliability Engineer (SRE) Trainer for a part-time freelance role. The trainer will be responsible for delivering practical and interactive sessions to learners, covering key concepts and hands-on aspects of Site Reliability...
-
Senior MLOps
3 weeks ago
Chennai, Tamil Nadu, India NTT DATA Full time
Req ID: 343712. NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now. We are currently seeking a Senior MLOps AIOps Platform Engineer to join our team in Chennai, Tamil Nadu (IN-TN), India (IN). Job Summary: We are seeking a Senior...
-
Site Reliability Engineer
1 week ago
Chennai, Tamil Nadu, India Insent Full time ₹ 6,00,000 - ₹ 18,00,000 per year
We are looking to hire a site reliability engineer for our super fast-growing team. As a site reliability engineer, you will be responsible for deploying, supporting, monitoring and troubleshooting large-scale microservice-based systems, and for documenting the IT infrastructure, policies and procedures. About Insent: Insent is a super fast-growing, enterprise...
-
Site Reliability Engineer
1 week ago
Chennai, Tamil Nadu, India Quvia Full time ₹ 9,00,000 - ₹ 12,00,000 per year
About the role: We are seeking a highly skilled Site Reliability Engineer (SRE) to join our dynamic team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our satellite communication systems, which leverage AI and ML for automation and optimization. You will play a key role in maintaining the infrastructure,...
-
Staff Site Reliability Engineer
1 day ago
Tamil Nadu, India Poshmark Full time
We’re looking for an experienced Site Reliability Engineer to fill the mission-critical role of ensuring that our complex, web-scale systems are healthy, monitored, automated, and designed to scale. You will use your background as an operations generalist to work closely with our development teams from the early stages of design all the way through...
-
Senior Site Reliability Engineer
2 days ago
Chennai, Tamil Nadu, India Miratech Full time
Company Description: Miratech helps visionaries change the world. We are a global IT services and consulting company that brings together enterprise and start-up innovation. Today we support digital transformation for some of the world's largest enterprises. By partnering with both large and small players, we stay at the leading edge of technology, remain nimble...