Lead - Site Reliability Engineer

7 days ago


Hyderabad, India VXI Global Solutions Full time

We are looking for a Lead - Site Reliability Engineer with 8+ years of experience to design, implement, and manage robust observability solutions across our cloud infrastructure and applications. The ideal candidate will have hands-on experience with Prometheus, Grafana, Google Cloud Monitoring, and OpenTelemetry, along with exposure to SolarWinds. You should be comfortable working with metrics, logs, and traces, and be able to correlate telemetry data to proactively detect, diagnose, and resolve performance issues.

Key Responsibilities:

  • Design and maintain observability pipelines using OpenTelemetry, Prometheus, and Grafana.
  • Build dashboards and alerts to monitor system health, application performance, and business KPIs.
  • Integrate observability solutions with Google Cloud Platform services and SolarWinds.
  • Correlate logs, metrics, and traces to troubleshoot incidents and reduce MTTR.
  • Collaborate with SREs, DevOps, and development teams to improve end-to-end system observability.
  • Implement best practices for telemetry data collection, enrichment, storage, and visualization.

Requirements:

  • Strong experience with Prometheus and Grafana for monitoring and alerting.
  • Proficiency in OpenTelemetry for instrumenting distributed systems.
  • Working knowledge of observability tools in Google Cloud (e.g., Cloud Monitoring, Logging, Trace).
  • Exposure to SolarWinds for network and infrastructure monitoring.
  • Solid understanding of telemetry data types: metrics, logs, and traces.
  • Ability to correlate and analyze multi-source observability data.
  • Scripting skills (Python, Bash) and familiarity with Infrastructure-as-Code are a plus.

Preferred Qualifications:

  • Experience in Site Reliability Engineering or Platform Engineering roles.
  • Knowledge of SLIs/SLOs and performance benchmarking.
  • Experience with APM tools (e.g., Datadog, New Relic) is a plus.


  • Hyderabad, Telangana, India JP Morgan Chase & Co. Full time

    Job Description: Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking Team, you will take the lead in conducting resiliency design reviews, break...


  • Hyderabad, India JP Morgan Chase & Co. Full time

    Job Description: Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking, you hold a leadership role in your team, demonstrate strong knowledge...


  • Hyderabad, India Chase Bank Full time

    Job Description: Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking, you hold a leadership role in your team, demonstrate strong knowledge...

