Observability Engineer

2 days ago


Hyderabad, Telangana, India Jobhedge Consultancy Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Job Description : AI-Driven Observability Engineer

Experience : 10 Years

About the Role :

We are seeking a highly skilled AI-Driven Observability Engineer to design, implement, and maintain end-to-end observability solutions for infrastructure and applications. You will play a key role in ensuring the reliability, performance, and scalability of our distributed systems by developing monitoring, logging, and tracing capabilities. The ideal candidate will have expertise in ETL, Data Science, and Machine Learning, along with hands-on experience in OpenTelemetry, Splunk, and Kafka.

Key Responsibilities :

- Design & Develop Observability Solutions: Build and enhance telemetry pipelines for logs, metrics, and traces using industry-standard tools (Kafka, OpenTelemetry, Splunk).

- Instrument Applications: Implement observability best practices in infrastructure, applications and platforms.

- Design and implement machine learning models to analyze logs, metrics, and traces for anomaly detection, predictive failure analysis, and root cause analysis.

- Monitor & Analyze System Performance: Build real-time dashboards and alerts to track system health, detect anomalies, and support live troubleshooting.

- Work with Event-Driven Architectures: Integrate observability with messaging systems like Kafka, RabbitMQ, or Pulsar for real-time monitoring.

- Collaborate Across Teams: Work closely with SREs, DevOps, and development teams to improve system reliability and incident response.

- Security & Compliance: Ensure observability data is securely stored and compliant with relevant regulations (GDPR, HIPAA, etc.).

- Optimize Performance: Conduct root cause analysis and improve system observability to reduce downtime and improve response times.

Required Skills & Experience :

- Data Science & Machine Learning experience: Hands-on proficiency in Python, TensorFlow, PyTorch, Scikit-learn, Pandas, NumPy.

- Extensive knowledge of ETL techniques: Data extraction, transformation, and loading using Apache Airflow, Apache NiFi, Spark, or similar tools.

- Observability Stack: Hands-on experience with Prometheus, Grafana, ELK Stack, Loki, OpenTelemetry, Jaeger, or Zipkin.

- Experience with Time-Series Analysis, Predictive Analytics and AI-driven Observability.

- Cloud & Infrastructure: Experience with AWS, Azure, or GCP observability services (e.g., CloudWatch, Azure Monitor).

- Distributed Systems & Microservices: Understanding of Kubernetes, Docker, and Service Mesh technologies (Istio, Linkerd).

- Event-Driven Architectures: Experience with Kafka, RabbitMQ, or other message brokers.

- Database & Storage: Familiarity with time-series databases (InfluxDB, VictoriaMetrics) and NoSQL/SQL databases.

Preferred Qualifications :

- Experience in AIOps, intelligent observability, or anomaly detection.

- Knowledge of Chaos Engineering for resilience testing.

- Certifications in AWS, Azure, Kubernetes, or Observability tools.

- Knowledge of data engineering and big data technologies like Hadoop, Spark and Flink.

- Experience with machine learning models for predictive observability.

Why Join Us ?

- Work on cutting-edge observability solutions in a high-scale production environment.

- Opportunity to automate infrastructure monitoring and enhance system resilience.

- Collaborate with cross-functional teams to improve reliability engineering.

- Competitive salary, benefits, and growth opportunities in a fast-paced environment.



  • Hyderabad, Telangana, India INTRAEDGE TECHNOLOGIES PRIVATE LIMITED Full time ₹ 6,00,000 - ₹ 12,00,000 per year

    The MLOps Observability Engineer will design, implement, and maintain the comprehensive monitoring, logging, and tracing solutions for our entire ML platform and production models. This includes building automated systems to detect model decay, data drift, and infrastructure performance issues, ensuring that our AI/ML applications are reliable, scalable, and...


  • Hyderabad, Telangana, India Guhatek Consulting Services Full time ₹ 1,50,000 - ₹ 28,00,000 per year

    Design and scale observability solutions (monitoring, logging, tracing), optimize alerting and incident response, automate with scripting, and collaborate with teams to ensure system reliability and performance.


  • Hyderabad, Telangana, India Providence Global Center Full time ₹ 15,00,000 - ₹ 25,00,000 per year

    Design, build, and maintain data ingestion and transformation pipelines that aggregate observability data from multiple systems (infrastructure, applications, APIs, and cloud services). Integrate diverse telemetry sources into enterprise observability platforms (e.g., Splunk, Datadog, Prometheus, Grafana, Elastic Stack, New Relic, or Dynatrace). Develop custom...


  • Hyderabad, Telangana, India Amgen Full time ₹ 40,00,000 - ₹ 1,20,00,000 per year

    Career Category: Information Systems. Job Description: Join Amgen's Mission of Serving Patients. At Amgen, if you feel like you're part of something bigger, it's because you are. Our shared mission, to serve patients living with serious illnesses, drives all that we do. Since 1980, we've helped pioneer the world of biotech in our fight against the world's toughest...


  • Hyderabad, Telangana, India Amgen Inc Full time ₹ 6,00,000 - ₹ 18,00,000 per year

    *What you will do* In this vital role you will support the day-to-day operation and maintenance of global observability service. Administer, monitor, and support the monitoring and observability services. Maintain and support the tools to enable global application and infrastructure monitoring. Monitor systems and proactively address performance...


  • Hyderabad, Telangana, India Jade Global Software Pvt Ltd Full time ₹ 12,00,000 - ₹ 24,00,000 per year

    Job Title: Senior Site Reliability Engineer (SRE) – Datadog Observability. Experience Required: 8+ years overall in SRE and Infrastructure Operations with minimum 3+ years hands-on experience in Datadog. Location: Hyderabad...


  • Hyderabad, Telangana, India Experian Full time ₹ 20,00,000 - ₹ 25,00,000 per year

    Company Description: Experian is a global data and technology company, powering opportunities for people and businesses around the world. We help to redefine lending practices, uncover and prevent fraud, simplify healthcare, create marketing solutions, and gain deeper insights into the automotive market, all using our unique combination of data, analytics and...


  • Hyderabad, Telangana, India Experian Full time ₹ 6,00,000 - ₹ 12,00,000 per year

    Company Description: Experian is a global data and technology company, powering opportunities for people and businesses around the world. We help to redefine lending practices, uncover and prevent fraud, simplify healthcare, create marketing solutions, and gain deeper insights into the automotive market, all using our unique combination of data, analytics and...


  • Hyderabad, Telangana, India Jade Global Full time ₹ 1,00,00,000 - ₹ 2,00,00,000 per year

    Job Title: Senior Site Reliability Engineer (SRE) – Datadog Observability. Experience Required: 8+ years overall in SRE and Infrastructure Operations with minimum 3+ years hands-on experience in Datadog. Location: Hyderabad preferable but open for Pune and remote. Job Summary: We are seeking an experienced Site Reliability Engineer (SRE) to lead end-to-end SRE...


  • Hyderabad, Telangana, India Jade Global Full time ₹ 12,00,000 - ₹ 24,00,000 per year

    Job Title: Senior Site Reliability Engineer (SRE) – Datadog Observability. Experience Required: 8+ years overall in SRE and Infrastructure Operations with minimum 3+ years hands-on experience in Datadog. Location: Hyderabad preferable but open for Pune and remote. Job Summary: We are seeking an...