MLOps Observability Engineer

2 weeks ago


Hyderabad, India Intraedge Technologies Ltd. Full time

The MLOps Observability Engineer will design, implement, and maintain comprehensive monitoring, logging, and tracing solutions for our entire ML platform and production models. This includes building automated systems to detect model decay, data drift, and infrastructure performance issues, ensuring that our AI/ML applications remain reliable and scalable and continue to deliver business value.

II. Key Responsibilities

A. MLOps Monitoring and Model Health

- Design and Implement Model Observability : Define and implement Model Performance Monitoring metrics (e.g., accuracy, precision, recall, RMSE, AUC) and business-impact metrics for deployed ML models.
- Data and Concept Drift Detection : Build and automate data quality and validation checks to continuously monitor for data drift, concept drift, and data integrity issues that could degrade model performance.
- Explainability and Fairness Monitoring : Implement tools and techniques for model interpretability and model explainability (XAI), tracking feature importance, and monitoring for potential bias or fairness issues in production.
- Alerting and Triage : Establish clear, actionable alerting thresholds for model and infrastructure degradation, integrating with incident management workflows for quick triage and resolution.

B. Observability Platform and Infrastructure

- Telemetry Pipeline Development : Design, deploy, and manage robust observability pipelines to collect, aggregate, and route the three pillars of observability (metrics, logs, and traces) from the ML platform and inference services.
- Dashboarding and Visualization : Create insightful, real-time dashboards (SLIs/SLOs) to provide a clear, unified view of the ML system's health, from infrastructure load to model prediction quality.
- Infrastructure-as-Code (IaC) for Observability : Use IaC tools to provision and manage the monitoring and logging infrastructure across cloud environments.
- Cost Optimization : Monitor telemetry data costs, implementing smart sampling and retention policies to ensure efficient use of observability tools.

C. Automation and CI/CD

- Automated Retraining Triggers : Integrate observability signals (such as performance-drop or data-drift alerts) to automatically trigger the ML pipeline (CI/CD) for model retraining, testing, and redeployment.
- Reproducibility and Auditing : Ensure that model monitoring and all MLOps processes are fully reproducible, traceable, and compliant with governance and regulatory standards.
- Collaboration and Consultation : Work closely with Data Scientists and ML Engineers to instrument new models for observability from the ground up, educating them on best practices for monitoring and logging.

III. Technical Skills and Qualifications

A. Programming and Scripting

- Expert : Proficiency in Python is required, including libraries for data manipulation and ML (e.g., NumPy, Pandas, Scikit-learn).
- Strong shell scripting (Bash/Zsh); experience with other languages such as Go or Java is a plus.

B. MLOps Tools and Frameworks

- ML Frameworks : Familiarity with TensorFlow, PyTorch, or Scikit-learn to understand how models are built and served.
- MLOps Platforms/Tools : Hands-on experience with MLflow, Kubeflow, Data Version Control (DVC), or comparable solutions for experiment tracking and model registry.
- Orchestration : Experience with pipeline orchestration tools like Airflow, Kubeflow Pipelines, or Argo Workflows.

C. Cloud and Containerization

- Cloud Platforms : Deep working knowledge of one or more major cloud providers (AWS, GCP, or Azure) and their ML services (e.g., AWS SageMaker, Google AI Platform, Azure ML).
- Containerization & Orchestration : Expertise with Docker and Kubernetes for deploying and managing production ML services.
- Infrastructure as Code (IaC) : Proficiency with Terraform or Ansible for infrastructure automation.

D. Monitoring and Observability Stack

- Metrics & Time-Series : Expertise with Prometheus and Grafana for collecting, querying, and visualizing time-series data.
- Logging & Tracing : Experience with centralized logging solutions (ELK Stack/Elasticsearch, Loki, Splunk) and distributed tracing tools (Jaeger, Zipkin, OpenTelemetry).
- Model Monitoring Tools : Experience with specialized model performance monitoring tools like Evidently AI, Seldon Core, or similar internal/commercial tools.

IV. Education and Experience

- Education : Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a related technical field.
- Experience : 3+ years of experience in an MLOps, DevOps, SRE, or Observability-focused engineering role, with at least 1-2 years dedicated to production ML systems.
- Soft Skills : Excellent problem-solving and analytical skills, and strong communication for collaborating effectively with cross-functional teams (Data Science, Software Engineering, Product).

(ref:hirist.tech)



  • Hyderabad, Telangana, India INTRAEDGE TECHNOLOGIES PRIVATE LIMITED Full time ₹ 6,00,000 - ₹ 12,00,000 per year

    The MLOps Observability Engineer will design, implement, and maintain the comprehensive monitoring, logging, and tracing solutions for our entire ML platform and production models. This includes building automated systems to detect model decay, data drift, and infrastructure performance issues, ensuring that our AI/ML applications are reliable, scalable, and...

  • MLOps Engineer

    1 week ago


    Hyderabad, India Mastech Digital Full time

    Description : Position Title : ML Ops Engineer 4. Complete onsite. Full-Time role. Shift Timings : Regular. Address : Spire T110, Hyderabad Knowledge City, Madhapur, Hyderabad, Telangana, India, 500081. Job Description : Roles & Responsibilities : - Define the long-term vision and strategy for MLOps initiatives : Set the direction for the organization's MLOps,...

  • MLOps Engineer

    1 week ago


    Hyderabad, Telangana, India Mancer Consulting Services Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Role – MLOps Engineer Job Purpose: The Staff MLOps Engineer plays a pivotal role in shaping our MLOps practice within ITG by building and enhancing a scalable, reliable, and cutting-edge Machine Learning Operations (MLOps) platform. This role combines deep cloud architecture expertise with advanced AI/ML knowledge to develop solutions that streamline...

  • MLOps Engineer

    3 weeks ago


    Hyderabad, India Costco IT Full time

    About Costco Wholesale Costco Wholesale is a multi-billion-dollar global retailer with warehouse club operations in eleven countries. They provide a wide selection of quality merchandise, plus the convenience of specialty departments and exclusive member services, all designed to make shopping a pleasurable experience for their members. About Costco...

  • MLOps Engineer

    2 weeks ago


    Hyderabad, India TECHSOPHY Full time

    Job Opportunity : MLOps Engineer (3+ Years). Location : Hyderabad. At Techsophy, we are driving transformation for global enterprises with cutting-edge AI and automation. We are looking for an MLOps Engineer (3+ years experience) who can bridge the gap between Machine Learning and DevOps, building scalable ML pipelines and ensuring models thrive in real-world...

  • MLOps Engineer

    5 days ago


    Hyderabad, Telangana, India Spydra Full time ₹ 15,00,000 - ₹ 25,00,000 per year

    About The Role: We're looking for an MLOps Engineer to build and operate reliable, secure, and scalable ML/LLM infrastructure, from data ingestion and training pipelines to model serving, monitoring, and continuous improvement. You'll partner with Data Science, Platform, and Security teams to ship models to production with strong SLAs, observability, and cost...

  • MLOps Engineer

    2 weeks ago


    Hyderabad, India Mancer Consulting Services Full time

    Role – MLOps Engineer Job Purpose: The Staff MLOps Engineer plays a pivotal role in shaping our MLOps practice within ITG by building and enhancing a scalable, reliable, and cutting-edge Machine Learning Operations (MLOps) platform. This role combines deep cloud architecture expertise with advanced AI/ML knowledge to develop solutions that streamline...

  • MLOps Engineer

    3 weeks ago


    Hyderabad, India Mancer Consulting Services Full time

    Role – MLOps Engineer Job Purpose: The Staff MLOps Engineer plays a pivotal role in shaping our MLOps practice within ITG by building and enhancing a scalable, reliable, and cutting-edge Machine Learning Operations (MLOps) platform. This role combines deep cloud architecture expertise with advanced AI/ML knowledge to develop solutions that streamline...

  • MLOps Engineer

    2 weeks ago


    Hyderabad, India Mancer Consulting Services Full time

    Role – MLOps Engineer Job Purpose: The Staff MLOps Engineer plays a pivotal role in shaping our MLOps practice within ITG by building and enhancing a scalable, reliable, and cutting-edge Machine Learning Operations (MLOps) platform. This role combines deep cloud architecture expertise with advanced AI/ML knowledge to develop solutions that streamline...

  • MLOps Engineer

    1 week ago


    Hyderabad, Telangana, India Weekday AI Full time ₹ 15,00,000 - ₹ 25,00,000 per year

    This role is for one of Weekday's clients. Min Experience: 5 years. Location: Hyderabad. JobType: full-time. Requirements: At Techsophy, we are driving transformation for global enterprises with cutting-edge AI and automation. We are seeking an MLOps Engineer (with 5+ years of experience) who can bridge the gap between Machine Learning and DevOps, building scalable...