Cloud Operations Engineer for AI/ML Systems

7 days ago


Hyderabad, Telangana, India beBee Careers Full time

Job Description:

We are seeking an experienced Cloud Operations Engineer to join our team. This role will focus on designing and implementing event monitoring, correlation, and incident response systems for AI/ML applications.

Key Responsibilities:

  • Event Monitoring & Detection
    • Implement and maintain event monitoring for data pipelines, ML model workflows, real-time data streams, and AI services.
    • Set up real-time alerts for SLO/SLI breaches, data delays, model failures, prediction accuracy drops, and drift detection.
    • Integrate observability tools such as Azure Monitor, Datadog, AppDynamics, ELK, and Grafana to capture and visualize events.
  • Event Correlation & Analysis
    • Design and implement event correlation logic to reduce alert noise.
    • Leverage rule-based logic or ML-based anomaly detection to group related events from multiple sources.
    • Build intelligent dashboards providing event-driven insights into AI/ML system performance and health.
  • Event-Driven Incident Response
    • Automate and coordinate incident response workflows based on critical events.
    • Reduce MTTR using tools like ServiceNow Event Management and other automation workflows.
    • Lead Post Event Analysis (PEA) sessions for high-severity incidents and implement improvements.
  • Proactive Observability Engineering
    • Collaborate with ML/Data Engineering teams to embed custom telemetry for feature stores, jobs, and streaming and batch pipelines.
    • Ensure full-stack observability via logs, traces, metrics, and alerting.
    • Continuously optimize alert thresholds, runbooks, and incident automation.
  • Cross-Team Communication & Operational Readiness
    • Act as the liaison for event, alert, and incident communication between platform engineering, data science, and business stakeholders.
    • Participate in Operational Readiness Reviews (ORRs) for new workloads.
    • Maintain up-to-date Standard Operating Procedures (SOPs) for triaging, classification, and escalation.
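
As an illustration of the rule-based event correlation named under "Event Correlation & Analysis", here is a minimal sketch in Python. It is not taken from the employer's stack: the `Event` fields, the `correlate` function, the source and service labels, and the 300-second window are all hypothetical choices made for the example. The only rule applied is that events for the same service arriving within a fixed time window of each other are grouped into one incident candidate.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """A simplified alert event (hypothetical schema for illustration)."""
    source: str       # monitoring tool that raised the event
    service: str      # affected service or pipeline
    timestamp: float  # seconds since epoch


def correlate(events: List[Event], window_s: float = 300.0) -> List[List[Event]]:
    """Group events per service when each arrives within window_s
    seconds of the previous event for that same service."""
    groups: List[List[Event]] = []
    open_group: Dict[str, List[Event]] = {}  # service -> most recent group
    for ev in sorted(events, key=lambda e: e.timestamp):
        group = open_group.get(ev.service)
        if group and ev.timestamp - group[-1].timestamp <= window_s:
            group.append(ev)  # same burst: fold into the existing group
        else:
            group = [ev]      # gap too large or first event: start a new group
            groups.append(group)
            open_group[ev.service] = group
    return groups


events = [
    Event("datadog", "feature-store", 0.0),
    Event("azure-monitor", "feature-store", 120.0),
    Event("elk", "model-api", 130.0),
    Event("datadog", "feature-store", 1000.0),
]
groups = correlate(events)
```

With this sample input, the two early `feature-store` events collapse into one group, while the `model-api` event and the late `feature-store` event each form their own. A production system would typically add further keys (environment, severity, topology) or replace the fixed window with anomaly detection.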

  • AI and ML Engineer

    16 hours ago


    Hyderabad, Telangana, India beBee Careers Full time

    Job Description: We are seeking a highly skilled AI and Machine Learning (ML) engineer to join our team. As an AI/ML engineer, you will design, develop, train, and evaluate Large Language Models (LLMs) and Generative AI models for various applications. You will implement and optimize Retrieval-Augmented Generation (RAG) pipelines to enhance the performance and...


  • Hyderabad, Telangana, India beBee Careers Full time

    About the Role: We are looking for a highly experienced Product Architect to lead the development of cutting-edge AI applications. As a key member of our team, you will collaborate with AI researchers, product managers, and software architects to design and implement innovative AI-driven solutions. Responsibilities: Design and develop AI systems that leverage...

  • AI/ML Engineer

    4 weeks ago


    Hyderabad, Telangana, India Zen Technologies Ltd. Full time

    Job Description: We are seeking a highly skilled and motivated AI/ML Engineer to design, develop, and deploy state-of-the-art machine learning models and AI-driven systems. The ideal candidate will have a strong background in machine learning frameworks, programming, and data analysis, coupled with the ability to work in a collaborative,...


  • Hyderabad, Telangana, India beBee Careers Full time

    About the Role: We are seeking a skilled professional to lead the implementation of AI/ML automation solutions across our enterprise. The ideal candidate will have extensive experience in AI/ML operations, cloud-based platforms, and data analytics. Develop and implement AIOps strategies to automate IT operations and enhance system performance. Collaborate with...


  • Hyderabad, Telangana, India Blumetra Solutions Full time

    Job Summary: We are looking for a highly skilled AI Operations Engineer who specializes in Vector Databases and Prompt Engineering to join our AI and ML engineering team. The ideal candidate will be responsible for managing the deployment, optimization, and operational stability of AI/ML systems while ensuring integration and scalability of...


  • Hyderabad, Telangana, India beBee Careers Full time

    AI/ML Engineer Job Description: We are seeking an experienced AI/ML Engineer to join our team. In this role, you will design, develop, and deploy artificial intelligence and machine learning models to drive business outcomes. Duties and Responsibilities: Design and develop AI/ML models using various techniques such as supervised and unsupervised learning,...

  • AI/ML Engineer

    13 hours ago


    Hyderabad, Telangana, India Liveconnections Full time

    Key Responsibilities:
    - Develop and deploy machine learning models using Python-based AI/ML frameworks (e.g., Keras, PyTorch, TensorFlow).
    - Leverage cloud services such as Amazon SageMaker, AWS EC2, AWS S3, and Azure Machine Learning for model training, deployment, and scaling.
    - Integrate machine learning workflows into CI/CD pipelines to enable automated...

  • AI/ML Engineer

    3 weeks ago


    Hyderabad, Telangana, India Tata Consultancy Services Full time

    TCS Hiring for AI/ML Engineer role
    Experience: 3 to 8 years
    Location: Hyderabad
    Expectations from the role:
    • The team attempts to run a new unsupported model on VAIML.
    • In many cases, it does not execute successfully due to a lack of support for some operators or other technical reasons.
    • The team performs debugging to identify the issues.
    • The team...

  • AI/ML Engineer

    7 days ago


    Hyderabad, Telangana, India Tata Consultancy Services Full time

    TCS Hiring for AI/ML Engineer role
    Experience: 3 to 8 years
    Location: Hyderabad
    Expectations from the role:
    • The team attempts to run a new unsupported model on VAIML.
    • In many cases, it does not execute successfully due to a lack of support for some operators or other technical reasons.
    • The team performs debugging to identify the issues.
    • The...


  • Hyderabad, Telangana, India beBee Careers Full time

    About the Role: We are seeking a highly skilled Cloud Architect and AI/ML Specialist to join our team. As a key member, you will play a critical role in designing and implementing innovative solutions that leverage cloud platforms and AI/ML services. Key Responsibilities: Develop and support an internal community of ML-related subject matter experts. Design...