Observability Engineer

4 weeks ago


Gurugram, Hyderabad, Pune, India Whisk Software Private Limited Full time
Job Description

Job Title: Observability Engineer

Location: Hyderabad, Pune, Gurgaon

Experience: 8 - 14 Years

Notice Period: Immediate to 15 Days

Skills: ITIL, ITSM, Sumo Logic, APM Tools

Role and Responsibilities:

- Develop and implement comprehensive observability strategies to monitor IT systems and applications.
- Utilize ITSM/ITIL frameworks to align observability practices with organizational processes and standards.
- Deploy and manage APM tools to monitor the performance and health of applications.
- Analyse APM data to identify performance bottlenecks and optimize application performance.
- Configure and maintain Sumo Logic for log management and analytics.
- Use Sumo Logic to collect, process, and analyse logs, metrics, and traces to gain insights into system behaviour.
- Collaborate with IT teams to identify, investigate, and resolve incidents and problems.
- Conduct Root Cause Analysis (RCA) to prevent recurrence of issues and improve system reliability.
- Analyse data to detect anomalies, trends, and potential issues before they impact users.
- Create and maintain dashboards to visualize key performance indicators (KPIs) and system health metrics.
- Generate reports to communicate findings and recommendations to stakeholders.
- Implement best practices from ITIL/ITSM to enhance the efficiency and effectiveness of observability processes.
- Work closely with development, operations, and security teams to ensure comprehensive observability coverage.

Required Skills:

- 8+ years of experience, with thorough knowledge of ITIL/ITSM processes.
- Good exposure to APM tools available in the market, such as Dynatrace and Datadog.
- Expertise in Sumo Logic.
- Design and implement alerting mechanisms, using monitoring, alerting, and logging tools to detect potential issues.
- Reduce alert noise.
- Conduct periodic reviews to increase observability coverage across applications.
- Periodically review metrics to identify anomalies.
- Update dashboards periodically as required.

  • Pune, Maharashtra, India Growel Softech Pvt. Ltd. Full time

    Observability Tools: Experience with open-source observability tools such as Grafana, Prometheus, Mimir, Loki, Fluentd, OpenTelemetry, and Tempo. Experience designing, implementing, and managing observability platforms to monitor the performance and reliability of distributed... Observability: Exposure to AI/ML-based observability tools and techniques,...


  • Pune, Maharashtra, India Sarvaha Systems Full time

    Sarvaha would like to welcome a skilled Observability Engineer with a minimum of 3 years of experience to contribute to designing, deploying, and scaling our monitoring and logging infrastructure on Kubernetes. In this role, you will play a key part in enabling end-to-end visibility across cloud environments by processing petabyte-scale data, helping...


  • Hyderabad, Telangana, India beBeeObservability Full time ₹ 1,04,000 - ₹ 1,30,878

    Job Title: Observability Engineering Specialist. Job Summary: We are seeking an Observability Engineering Specialist to join our team. The ideal candidate will have a strong background in DevOps, Observability, and related technologies. Key Responsibilities: Design, implement, and maintain observability solutions for distributed systems. Develop...


  • Hyderabad, Telangana, India beBeeObservability Full time ₹ 20,00,000 - ₹ 24,00,000

    Job Opportunity: We are seeking a highly skilled Observability Engineer with expertise in designing and implementing scalable observability platforms, driving adoption of APM tooling, and embedding synthetic monitoring into modern service architectures. The ideal candidate will have hands-on experience in building secure and resilient synthetic monitoring...


  • Pune, Hyderabad / Secunderabad, Telangana, Gurgaon / Gurugram, India beBeeObservability Full time ₹ 15,00,000 - ₹ 20,00,000

    Job Title: Observability Engineer. We are seeking a seasoned Observability professional to join our team. The ideal candidate will have a strong background in ITIL/ITSM and extensive experience with APM tools. Key Responsibilities: Develop and implement comprehensive observability strategies to monitor IT systems and applications. Utilize ITSM/ITIL frameworks to...


  • Hyderabad, Telangana, India beBeeObservability Full time US$ 1,50,000 - US$ 2,00,000

    Site Reliability Engineer - Observability Expert. We are seeking a highly skilled Site Reliability Engineer to join our team. As an Observability expert, you will design and develop next-generation observability platforms that enable our clients to monitor and improve their complex IT systems. The ideal candidate will have a strong background in software...


  • Hyderabad / Secunderabad, Telangana, India beBeeObservability Full time ₹ 1,04,000 - ₹ 1,30,878

    Job Title: Sr Observability Engineer. We are seeking an experienced Observability Engineer to join our team. This is a key role in the development and implementation of our observability strategy. Key Responsibilities: Design, implement and maintain observability systems for monitoring and notification technologies. Collaborate with cross-functional teams to...

  • Observability/AIOps

    4 days ago


    Hyderabad, Telangana, India IntraEdge Full time

    L2- Observability/AIOps (5 to 8 yrs exp). Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures internally critical and externally visible systems have reliability and uptime appropriate to users' needs and a fast...

  • Observability/AIOps

    2 days ago


    Hyderabad, Telangana, India IntraEdge Full time

    L2- Observability/AIOps (5 to 8 yrs exp). Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures internally critical and externally visible systems have reliability and uptime appropriate to users' needs and a fast...