MLOps Observability Engineer

3 days ago


Hyderabad, Telangana, India | INTRAEDGE TECHNOLOGIES PRIVATE LIMITED | Full time | ₹ 6,00,000 - ₹ 12,00,000 per year

The MLOps Observability Engineer will design, implement, and maintain comprehensive monitoring, logging, and tracing solutions for our entire ML platform and production models. This includes building automated systems to detect model decay, data drift, and infrastructure performance issues, so that our AI/ML applications remain reliable and scalable and continue to deliver business value.

II. Key Responsibilities :

A. MLOps Monitoring and Model Health

- Design and Implement Model Observability : Define and implement model performance monitoring metrics (e.g., accuracy, precision, recall, RMSE, AUC) and business-impact metrics for deployed ML models.

- Data and Concept Drift Detection : Build and automate data quality and validation checks to continuously monitor for data drift, concept drift, and data integrity issues that could degrade model performance.

- Explainability and Fairness Monitoring : Implement tools and techniques for model interpretability and model explainability (XAI), tracking feature importance, and monitoring for potential bias or fairness issues in production.

- Alerting and Triage : Establish clear, actionable alerting thresholds for model and infrastructure degradation, integrating with incident management workflows for quick triage and resolution.

B. Observability Platform and Infrastructure

- Telemetry Pipeline Development : Design, deploy, and manage robust observability pipelines to collect, aggregate, and route the three pillars of observability (metrics, logs, and traces) from the ML platform and inference services.

- Dashboarding and Visualization : Create insightful and real-time dashboards (SLIs/SLOs) to provide a clear, unified view of the ML system's health, from infrastructure load to model prediction quality.

- Infrastructure-as-Code (IaC) for Observability : Use IaC tools to provision and manage the monitoring and logging infrastructure across cloud environments.

- Cost Optimization : Monitor telemetry data costs, implementing smart sampling and retention policies to ensure efficient use of observability tools.

C. Automation and CI/CD

- Automated Retraining Triggers : Use observability signals (such as performance-drop or data-drift alerts) to automatically trigger the ML CI/CD pipeline for model retraining, testing, and redeployment.

- Reproducibility and Auditing : Ensure that model monitoring and all MLOps processes are fully reproducible, traceable, and adhere to governance and regulatory standards.

- Collaboration and Consultation : Work closely with Data Scientists and ML Engineers to instrument new models for observability from the ground up, educating them on best practices for monitoring and logging.

III. Technical Skills and Qualifications

A. Programming and Scripting

- Expert proficiency in Python is required, including libraries for data manipulation and ML (e.g., NumPy, Pandas, Scikit-learn).

- Strong shell scripting skills (Bash/Zsh) are expected; experience with other languages such as Go or Java is a plus.

B. MLOps Tools and Frameworks

- ML Frameworks : Familiarity with TensorFlow, PyTorch, or Scikit-learn to understand how models are built and served.

- MLOps Platforms/Tools : Hands-on experience with MLflow, Kubeflow, Data Version Control (DVC), or comparable solutions for experiment tracking and model registry.

- Orchestration : Experience with pipeline orchestration tools like Airflow, Kubeflow Pipelines, or Argo Workflows.

C. Cloud and Containerization

- Cloud Platforms : Deep working knowledge of one or more major cloud providers (AWS, GCP, or Azure) and their ML services (e.g., AWS SageMaker, Google AI Platform, Azure ML).

- Containerization & Orchestration : Expertise with Docker and Kubernetes for deploying and managing production ML services.

- Infrastructure as Code (IaC) : Proficiency with Terraform or Ansible for infrastructure automation.

D. Monitoring and Observability Stack

- Metrics & Time-Series : Expertise with Prometheus and Grafana for collecting, querying, and visualizing time-series data.

- Logging & Tracing : Experience with centralized logging solutions (ELK Stack/Elasticsearch, Loki, Splunk) and distributed tracing tools (Jaeger, Zipkin, OpenTelemetry).

- Model Monitoring Tools : Experience with specialized model performance monitoring tools like Evidently AI, Seldon Core, or similar internal/commercial tools.

IV. Education and Experience :

Education : Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a related technical field.

Experience : 3 years of experience in an MLOps, DevOps, SRE, or Observability-focused engineering role, with at least 1-2 years dedicated to production ML systems.

Soft Skills : Excellent problem-solving and analytical skills, and strong communication skills for collaborating effectively with cross-functional teams (Data Science, Software Engineering, Product).

