SRE Observability Engineer

23 hours ago


Hyderabad, India TerraGiG Full time

We are looking for an SRE Observability Engineer.

About the Role:
Duration: Permanent
Location: Hyderabad
Timings: Full Time (As per company timings)
Notice Period: (Immediate Joiner - Only)
Experience: 6-10 Years

JD:
Position: SRE Observability Engineer
Exp: 5+ to 10 Years
Location: Hyderabad
Mandatory Skills: Observability, Grafana, and writing queries using Prometheus and Loki.

Job Description:
We are seeking a highly experienced and driven Senior Observability Engineer to lead the design, development, and maintenance of observability solutions across our infrastructure, applications, and services. As a Senior Observability Engineer, you will be at the forefront of implementing cutting-edge monitoring, logging, and tracing solutions that ensure the reliability, performance, and availability of our complex, distributed systems. You will collaborate with cross-functional teams, including Development, Infrastructure Engineers, DevOps, and SREs, to optimize system observability and improve our incident response capabilities.

Key Responsibilities:
- Lead the Design & Implementation of observability solutions, including monitoring, logging, and tracing for both cloud and on-premises environments.
- Drive the Development and maintenance of advanced monitoring tools such as Prometheus, Grafana, Datadog, New Relic, and AppDynamics.
- Implement Distributed Tracing frameworks like OpenTelemetry, Jaeger, or Zipkin, and enhance application performance diagnostics and troubleshooting.
- Optimize Log Management and analysis strategies using tools like Elasticsearch, Splunk, Loki, and Fluentd, ensuring efficient log processing and insights.
- Develop Advanced Alerting and anomaly detection strategies to proactively identify system issues, minimizing downtime and improving Mean Time to Recovery (MTTR).
- Collaborate with Development & SRE Teams to enhance observability in CI/CD pipelines, microservices architectures, and across various platform environments.
- Automate Observability Tasks by leveraging scripting languages such as Python, Bash, or Golang to increase efficiency and scale observability operations.
- Ensure Scalability & Efficiency of monitoring solutions to manage large-scale distributed systems and handle evolving business requirements.
- Lead Incident Response by providing actionable insights through observability data for effective troubleshooting and root cause analysis.
- Stay Abreast of Industry Trends in observability, Site Reliability Engineering (SRE), and monitoring practices, continuously improving processes.

Required Qualifications:
- 5+ years of hands-on experience in observability, SRE, DevOps, or a related field, with a proven track record of successfully managing complex, large-scale distributed systems.
- Expert-level proficiency in observability tools such as Prometheus, Grafana, Datadog, New Relic, AppDynamics, with the ability to lead the design and implementation of these solutions at scale.
- Advanced experience with log management platforms like Elasticsearch, Splunk, Loki, and Fluentd, and the ability to optimize log aggregation and analysis for better performance insights.
- Deep expertise in distributed tracing tools such as OpenTelemetry, Jaeger, or Zipkin, with a focus on performance optimization and root cause analysis.
- Extensive experience with cloud environments (preferably Azure, AWS, GCP) and Kubernetes for deploying and managing observability solutions across modern, cloud-native infrastructures.
- Advanced proficiency in scripting languages such as Python, Bash, or Golang, and strong experience with Infrastructure as Code (IaC) tools like Terraform and Ansible.
- Strong understanding of system architecture, performance tuning, and troubleshooting complex production environments, with an emphasis on scalability and high availability.
- Proven experience in leading and mentoring teams, providing technical direction, and driving the adoption of best practices for observability and monitoring.
- Exceptional problem-solving skills, with a focus on providing actionable insights and data-driven decision-making.
- Ability to lead high-impact projects, effectively communicate with stakeholders, and influence cross-functional teams.
- Strong communication and collaboration skills; demonstrated ability to work closely with engineering teams, leadership, and external partners to meet observability and system reliability goals.

Preferred Qualifications:
- Experience with AI-driven observability tools and anomaly detection techniques.
- Familiarity with microservices, serverless architectures, and event-driven systems.
- Proven track record of handling on-call rotations and incident management workflows in high-availability environments.
- Relevant certifications in observability tools, cloud platforms, or SRE best practices are a plus.

Interested candidates, please send your resume to balkis.begam@terragig.in



  • Hyderabad, India Awign Expert Full time

    Job Description Position: SRE Observability Engineer Exp: 5+ to 10 Years Location: Hyderabad Mandatory Skills: Observability, Grafana and Writing queries using Prometheus and Loki. Job Description: We are seeking a highly experienced and driven Senior Observability Engineer to lead the design, development, and maintenance of observability solutions across...


  • Hyderabad, India Awign Expert Full time

    Position: SRE Observability Engineer Exp: 5+ to 10 Years Location: Hyderabad Mandatory Skills: Observability, Grafana and Writing queries using Prometheus and Loki. Job Description: We are seeking a highly experienced and driven Senior Observability Engineer to lead the design, development, and maintenance of observability solutions across our...


  • Hyderabad, India TerraGiG Full time

    We are looking for SRE Observability Engineer About the Role: Duration: Permanent Location: Hyderabad Timings: Full Time (As per company timings) Notice Period: (Immediate Joiner - Only) Experience: 6-10 Years JD: Position: SRE Observability Engineer Exp: 5+ to 10 Years Location: Hyderabad Mandatory Skills: Observability, Grafana and Writing queries using Prometheus...


  • Hyderabad, India TerraGiG Full time

    We are looking for SRE Observability Engineer About the Role: Duration: Permanent Location: Hyderabad Timings: Full Time (As per company timings) Notice Period: (Immediate Joiner - Only) Experience: 6-10 Years JD: Position: SRE Observability Engineer Exp: 5+ to 10 Years Location: Hyderabad Mandatory Skills: Observability, Grafana and Writing queries using Prometheus...


  • Hyderabad, India Evnek Technologies Pvt Ltd Full time

    Job Description Job Title: SRE Observability Engineer Experience: 6 Years Location: Hyderabad Notice Period: Immediate Joiners Only About the Role We are seeking a highly skilled and motivated SRE Observability Engineer to design, build, and scale observability platforms across our distributed systems. The ideal candidate will have deep expertise in...


  • Hyderabad, India Evnek Technologies Pvt Ltd Full time

    Job Title: SRE Observability Engineer Experience: 6 Years Location: Hyderabad Notice Period: Immediate Joiners Only About the Role We are seeking a highly skilled and motivated SRE Observability Engineer to design, build, and scale observability platforms across our distributed systems. The ideal candidate will have deep expertise in monitoring, logging,...


  • Hyderabad, Telangana, India TerraGiG Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    We are looking for SRE Observability Engineer About the Role: Duration: Permanent Location: Hyderabad Timings: Full Time (As per company timings) Notice Period: (Immediate Joiner - Only) Experience: 6-10 Years JD: Position: SRE Observability Engineer Exp: 5+ to 10 Years Location: Hyderabad Mandatory Skills: Observability, Grafana and Writing queries using Prometheus and...

  • Observability SRE

    1 week ago


    Hyderabad, India Ifintalent Global Private Limited Full time

    Job Description Key Responsibilities: - Design, build, and maintain observability platforms including monitoring, logging, tracing, and alerting systems. - Implement and optimize metrics collection using tools like Prometheus, Grafana, OpenTelemetry, or similar. - Develop and maintain centralized logging infrastructure (e.g., Datadog, OpenTelemetry,...


  • Hyderabad, Telangana, India algoleap Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Role: Observability Engineer Job Description: Senior Platform Engineer We are seeking a highly experienced and driven Senior Observability Engineer to lead the design, development, and maintenance of observability solutions across our infrastructure, applications, and services. As a Senior Observability Engineer, you will be at the forefront of implementing...

  • Observability Lead

    3 days ago


    Hyderabad, Telangana, India Micron Technology Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Lead Observability Strategy: Define and execute the observability roadmap aligned with business and IT goals, integrating AIOps and SRE principles. Tool Ownership & Integration: Manage and optimize observability tools including OpsRamp, Splunk, AppDynamics, NetBrain, ThousandEyes, and explore new platforms like BigPanda and ServiceNow AIOps. Automation...