Senior Site Reliability Engineer – Grafana

3 days ago


New Delhi, India Aptimized Full time

Job Description – Senior Site Reliability Engineer (SRE) – Grafana & Observability

Position: Senior Site Reliability Engineer – Grafana & Observability
Location: [Hyderabad / Hybrid]
Experience: 10–20+ years

Operating globally, Aptimized is a premium ERP, HCM, and Technology Optimization Consulting agency. Our team at Aptimized focuses on helping our customers become intelligent enterprises through leveraging creative technology solutions. At Aptimized, we prioritize our clients’ needs and create tailor-made solutions to deliver success. We understand success is not achieved through chance. We listen to your concerns. We consult with your organization. We accelerate your business. Visit us at our website to learn more about what we can do for you.

We are looking for a highly skilled Senior Site Reliability Engineer (SRE) with deep hands-on experience in the Grafana ecosystem, observability engineering, and large-scale monitoring platforms.

The ideal candidate will be an expert in building and managing Grafana dashboards, Managed Grafana, Prometheus monitoring, and OpenTelemetry pipelines, and in integrating multiple data sources across cloud and on-prem infrastructures.

This role focuses heavily on building real-time observability, improving system reliability, and enabling data-driven operational insights.

Key Responsibilities

Grafana Engineering & Dashboard Development
  • Build advanced Grafana dashboards with alerts, custom panels, JSON models, and data visualizations.
  • Work with Managed Grafana (Azure Managed Grafana / AWS Managed Grafana) for enterprise-grade observability.
  • Integrate Grafana with multiple data sources such as:
      • Prometheus
      • ELK / Elasticsearch
      • Dynatrace
      • CloudWatch
      • Azure Monitor
      • InfluxDB / Telegraf
      • ServiceNow (incident integrations)
  • Develop role-based access control (RBAC) and multi-tenant dashboard architectures.

Prometheus, Metrics & Alerting
  • Architect and manage Prometheus metrics pipelines, exporters, and recording/alerting rules.
  • Optimize PromQL queries for high-performance dashboards.
  • Reduce alert noise through intelligent rule tuning and SLO-driven alerts.

Observability Platform Ownership
  • Build and maintain the end-to-end observability stack: Grafana + Prometheus + ELK + OpenTelemetry + cloud-native monitoring tools.
  • Integrate logs, metrics, and traces into unified dashboards.
  • Establish SLIs, SLOs, error budgets, and real-time reliability insights.

Kubernetes & Cloud Monitoring
  • Deploy and monitor Kubernetes clusters (AKS, EKS, Rancher).
  • Configure Grafana Alloy / Prometheus Operator / kube-state-metrics for cluster-level insights.
  • Implement Infrastructure-as-Code for observability stack deployments.

Automation & Infrastructure as Code
  • Automate monitoring agent deployments using:
      • Terraform
      • Azure DevOps / GitHub / GitLab
      • FluxCD, Kustomize, Helm
  • Develop monitoring-as-code for repeatable environment provisioning.

Incident Response & Performance Troubleshooting
  • Provide deep troubleshooting across infrastructure, network, applications, and microservices.
  • Build automated dashboards for war rooms and cross-team collaboration.
  • Leverage Grafana annotations, synthetic monitoring, and event correlation.

Security, Compliance & Governance
  • Implement secure access to metric/log dashboards using IAM, RBAC, and ABAC.
  • Configure audit logs, long-term retention, and secure storage pipelines.
  • (Optional: FedRAMP/NIST experience is beneficial for regulated workloads.)

Required Skills & Expertise

Grafana & Observability (Primary)
  • Expert in Grafana dashboard engineering
  • Prometheus + Alertmanager
  • Managed Grafana (Azure/AWS)
  • ELK Stack (Elasticsearch, Logstash, Kibana)
  • OpenTelemetry (OTEL) metrics & traces
  • Grafana Alloy, Loki (Bonus)

Cloud Platforms
  • Azure, AWS, IBM Cloud (Nice-to-have)
  • CloudWatch, Azure Monitor, App Insights

Containers & Infrastructure
  • Kubernetes (AKS, EKS)
  • Docker, Rancher, OpenShift
  • Linux (RHEL/CentOS)

DevOps & Automation
  • Terraform, Helm, Kustomize
  • Git, CI/CD pipelines
  • Scripting (Python, Bash, PowerShell)

Monitoring Ecosystem
Experience with additional tools is a plus:
  • Dynatrace
  • Splunk
  • Sysdig
  • AppDynamics
  • SolarWinds
  • Moogsoft AI-Ops

Preferred Qualifications
  • Strong background in SRE, Observability Engineering, DevOps, or Platform Engineering.
  • Experience with microservices, distributed systems, and cloud-native architectures.
  • ITIL v3 or industry certifications in AWS/Azure/Kubernetes are a plus.

Education
  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
  • Certifications in cloud, observability, Grafana, or Kubernetes are an advantage.



  • New Delhi, India Movius Full time

    Senior Staff Site Reliability Engineer. Location: Bengaluru, KA, 560076. Job Description: We are seeking a highly skilled Senior Staff Site Reliability Engineer with extensive experience in DevOps/SRE roles and large-scale distributed systems. The ideal candidate will have a proven background in cloud operations, automation, and CI/CD, with a preference for...


  • New Delhi, India VXI Global Solutions Full time

    We are looking for a Site Reliability Engineer with 3+ years of experience to design, implement, and manage robust observability solutions across our cloud infrastructure and applications. The ideal candidate will have hands-on experience with Prometheus and Grafana, along with exposure to SolarWinds. You should be comfortable working with metrics, logs, and...


  • New Delhi, India Datum Technologies Group Full time

    Job Title: Site Reliability Engineer (SRE) – Azure & AI. Experience: 7+ years. Work Mode: Hybrid. Work Location: Chennai/Mumbai/Gurgaon. Job Summary: We are looking for an experienced Site Reliability Engineer (SRE) with strong expertise in Microsoft Azure, AI infrastructure, and automation. The ideal candidate will have a solid background in managing cloud...


  • New Delhi, India Synechron Full time

    We have an immediate opportunity for an SRE (Senior Site Reliability Engineer) with 5+ years of experience. Synechron – Mumbai. Job Role: SRE (Senior Site Reliability Engineer). Job Location: Mumbai. About Synechron: We began life in 2001 as a small, self-funded team of technology specialists. Since then, we’ve grown our organization to 14,500+ people, across 58 offices, in 21...


  • New Delhi, India Grootan Technologies Full time

    About the Role: We are seeking a skilled Site Reliability Engineer (SRE) with 4–5 years of hands-on experience to join our engineering team. In this role, you will be responsible for building and maintaining reliable, scalable, and secure infrastructure to support our applications. You will leverage your expertise in automation, cloud platforms, and...


  • New Delhi, India Tata Consultancy Services Full time

    TCS is hiring for Site Reliability Engineer / Application Support Engineer. Location: Delhi NCR. Experience: 4-10 years. Required skills: Splunk, application support, Grafana, DevOps, Kubernetes, monitoring tools, site reliability.


  • Delhi, NCR, New Delhi, Pune, India Ithena Full time ₹ 20,00,000 - ₹ 25,00,000 per year

    Senior Site Reliability Engineer (SRE), Backend Systems. Location: Remote (India), Pune/Delhi/Delhi NCR/Mumbai. Experience: 8+ years. We're looking for a Senior SRE to join our backend team and help scale our real-time, event-driven platform. This role goes beyond traditional DevOps: we're seeking engineers who can write high-quality code, debug complex distributed...


  • Delhi, India Synechron Full time

    We have an immediate opportunity for an SRE (Senior Site Reliability Engineer) with 5+ years of experience. Synechron – Mumbai. Job Role: SRE (Senior Site Reliability Engineer). Job Location: Mumbai. About Synechron: We began life in 2001 as a small, self-funded team of technology specialists. Since then, we’ve grown our organization to 14,500+ people, across 58 offices, in 21 countries,...