
SRE Observability Architect
3 weeks ago
• Minimum 10 years of relevant work experience setting up monitoring with any product (Dynatrace, Datadog, ELK stack, Splunk, Grafana/Prometheus, etc.) in critical production environments.
• Minimum 5-6 years of work experience in end-to-end observability covering technical, user experience and business outcome metrics. Experience with AIOps is an advantage.
• Experience working with applications hosted on private cloud and cloud-native public cloud (particularly AWS).
• Multi-tenancy setup and data segregation on the observability and AIOps stack.
• Designing and building an Observability & Maintenance (O&M) module for multi-tenant solutions.
• Defining SLIs and setting up SLOs for multi-tenant solutions.
Core Capabilities:
• Experience in implementing Container, Network, APM, RUM, Log Analytics, end-to-end tracing, and custom alerts with Grafana, Prometheus, and Grafana Loki (alternatively Logstash or Fluent Bit). Implementing the same on any other third-party product such as Dynatrace is also considered.
• Proficiency with containers and multi-tenancy setup for the observability solution is critical.
• Ability to configure custom alerts, monitors and build AIOps workflows based on telemetry.
• Good understanding of integrating with other systems via APIs, consuming external APIs for IAM, and ingesting metric-based telemetry via collectors.
• Ability to build custom observability dashboards across different portfolios and personas.
• Setting up Synthetic Monitoring and Test Automation while integrating its telemetry into the observability stack.
• Tenant and data segregation as well as ability to obfuscate sensitive information on the common observability schema.
• Ability to code is preferred, ideally in Python or Java, along with Ansible scripting.
Qualification:
• Observability Foundation certification from DevOps Institute or any product-level accreditation.
• Any recognized System Architecture qualifications (TOGAF) are a bonus.
Role & Responsibilities:
• Architect, design and ensure implementation of the entire observability solution to be packaged as a module in a multi-tenant private cloud solution.
• Implement an observability solution that monitors and applies the same feature-set across all tenants (monitor and act upon telemetry from tenants – serving as a hypervisor).
• Design and implement integrations as well as externalize APIs.
• Set up authentication and authorization controls by integrating with an IAM layer.
• Work with UI/UX teams to design dashboards for the Observability & Maintenance platform for both the tenants as well as the host.
• Design and set up an AIOps module responsible for automated remediation workflows such as capacity scaling, container restarts, anomaly detection, etc.
• Work on building Proof-of-Concept solutions to view end-to-end tube-maps / service flows for the respective tenant’s services.
• Define and set up a CMDB to serve as a source for the infrastructure and application telemetry.
• Work with other teams to ensure the system is well-tested and scalable, meeting tenant demands.
• Define business-aligned SLIs and set SLOs for core services and journeys.
Primary Location: Bangalore, Karnataka, India
Job Type: Experienced
Years of Experience: 12
Travel: No
-
SRE – Cloud Security
4 weeks ago
Bengaluru, India Xebia Full time
SRE – Cloud Security & Observability
Location: Bangalore (Hybrid – 3 days office per week)
We are looking for a Cloud Site Reliability Engineer (SRE) with strong expertise in Cloud Security and Observability to design, build, and scale resilient cloud platforms.
Responsibilities
Architect and optimize Terraform modules for multi-environment...
-
Observability Platform and SRE Engineer
1 week ago
Bengaluru, Karnataka, India Kotak Mahindra Bank Full time ₹ 8,00,000 - ₹ 20,00,000 per year
Dev Ops Engineering III-SUPPORT SERVICES-Applications-CTB
Title: Observability Platforms and SRE Engg.
The Company: World of Kotak product suite encompasses a powerful suite of cross banking assets, all-in-one stop banking services, securities, and investment banking; insights across a wide spectrum of the major financial and banking markets. ...
-
Sre Architect
4 days ago
Bengaluru, India CIEL HR Services Full time
JD: Strong understanding and knowledge of SRE setup in a GCP development environment. Understanding of how to monitor performance, resource utilization, and error logs when products move into production. Experience with SRE tool implementation (incident and configuration management tools) from scratch. Good grip on the foundational concepts of SRE (observability...
-
SRE – Cloud Security and Observability
5 days ago
Bengaluru, Karnataka, India RapidCircle Advisory Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Making a difference and driving positive change is what we do every day at Rapid Circle. Our Cloud Pioneers help our clients in their digital transformation. Are you someone who goes for constant, positive change? Then this vacancy is for you.
As a Cloud Pioneer at Rapid Circle, you will work with our customers on different projects. For example, making impact...
-
Bengaluru, Karnataka, India Populace World Solutions Full time ₹ 20,00,000 - ₹ 25,00,000 per year
Position: Java/Python/Go/Terraform with SRE Observability Developer
Experience: 5+ years
Location: Bangalore
Required Skills & Qualifications:
Solid development experience of at least 5 years is a must.
• Required technical skills: 6+ yrs (Terraform is the primary skill and automation is the primary role)
• Development experience in any one of the programming languages:...
-
SRE, Observability System Administrator
3 weeks ago
Bengaluru, India Toast Full time
The Observability System Administrator role at Toast fits within the Observability Enablement & Administration team, which is part of Site Reliability Engineering, responsible for overseeing Toast production services, with a commitment to quality, reliability, and low latency. The Observability Enablement & Administration team is responsible for setting the...
-
SRE, Observability System Administrator
2 weeks ago
Bengaluru, India Toast Full time
The Observability System Administrator role at Toast fits within the Observability Enablement & Administration team, which is part of Site Reliability Engineering, responsible for overseeing Toast production services, with a commitment to quality, reliability, and low latency. The Observability Enablement & Administration team is responsible for setting the...
-
SRE, Observability System Administrator
1 week ago
Bengaluru, India Toast Full time
Job Description
The Observability System Administrator role at Toast fits within the Observability Enablement & Administration team, which is part of Site Reliability Engineering, responsible for overseeing Toast production services, with a commitment to quality, reliability, and low latency. The Observability Enablement & Administration team is responsible...
-
SRE L3
5 days ago
Bengaluru, Karnataka, India Wipro Full time ₹ 15,00,000 - ₹ 25,00,000 per year
Mandatory Skills: SRE Ops with DevOps & Observability/Automation
Location: Bangalore preferred (OK with )
Level: L3
About The Role:
Must have: SRE Ops, AWS Cloud Infra, DevOps, Linux, Observability/Automation, CI/CD, Kubernetes/Docker
Good to have: Extensive knowledge of tools (like AppDynamics, Nagios, Splunk, Dynatrace, New Relic, Prometheus, Grafana, ELK, etc.),...
-
Lead Observability Engineer
19 hours ago
Bengaluru, India InvestCloud, Inc. Full time
Job Description
Key Responsibilities
- Own the design, deployment, and lifecycle management of the Splunk Enterprise platform, including indexer and search head clustering, forwarders, and knowledge objects.
- Define and implement best practices for data onboarding, parsing, enrichment, and storage to support observability use cases.
- Collaborate with...