Staff Site Reliability Engineer
2 weeks ago
Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization's applications, data, and business processes. Customers trust Saviynt to safeguard their digital assets, drive operational efficiency, and reduce compliance costs. Built for the AI age, Saviynt is today helping organizations safely accelerate their deployment and usage of AI. Saviynt is recognized as the leader in identity security, with solutions that protect and empower the world's leading brands, Fortune 500 companies, and government institutions. For more information, please visit the company's website.

Our Monitoring and Alerting team within the SaaS Operations team combines Operations Excellence with the Development Experience to deliver services at high scale and high availability, with resilience, by using automation and Infrastructure as Code. We build reliability into our ecosystem by applying best practices in Resiliency Engineering, Automation, Observability, and Chaos Testing. The team comes from diverse technical backgrounds, and the responsibilities offer a variety of challenges. Ideal candidates will have a background in either software engineering or systems engineering with a desire to learn the other, or previous experience building and managing Monitoring and Alerting systems.

We are looking for a systems-thinking Principal Engineer who has helped teams scale through production insights, operational automation, building an observability program, developer guidance, real-time metrics, and automation, automation, automation.

WHAT YOU WILL BE DOING
Implement monitoring and alerting systems to guarantee high availability and performance, with a dedicated focus on SLA and availability metrics.
Collaborate with engineering and operations teams to identify critical components and systems requiring enhanced availability measures.
Design and implement strategies, tooling, and processes to enhance system uptime and reliability.
Continuously evaluate and recommend improvements to platform infrastructure and processes, enhancing efficiency and reliability.
Align the platform with customer needs and business goals by working closely with cross-functional teams.
Run the production environment by monitoring availability and taking a holistic view of system health.
Build software and systems to monitor platform infrastructure and applications.
Monitor and improve the reliability, quality, and time-to-market of our suite of software solutions.
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
Provide primary operational support and engineering for multiple large-scale distributed software applications.
Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.

WHAT YOU BRING
Bachelor's degree or higher in a technology-related field (e.g. Engineering, Computer Science) required; Master's degree a plus.
6+ years of professional experience in Monitoring and Alerting roles on major cloud platforms (AWS, Azure), preferably including project leadership roles.
4+ years of experience in cloud development (AWS, Azure) and observability; experience building and operating highly resilient platforms in AWS cloud environments.
3+ years of experience in software development with Python, NodeJS, or Java, with a focus on SDLC and automation.
Hands-on experience with container orchestration, preferably Kubernetes.
Hands-on experience building observability, monitoring, and alerting for large-scale distributed systems.
Leadership or design of application and/or infrastructure migration projects from on-prem to cloud.
Cloud architecture design and implementation to solve key business needs and meet team goals.
Familiarity with current AWS solutions; Azure experience also considered.
Containerized workloads (Preferred: Helm; Related: AKS and EKS, other K8s distributions, Docker, JFrog).
Logging and monitoring tools (Preferred: Prometheus, Grafana, Datadog, AWS CloudWatch; Related: Azure Monitor, Log Analytics, Fluentd).
Network security (e.g. AWS Policy, Azure Policy, VPN, Active Directory/RBAC, ACLs, NSG rules, private endpoints).
Proven experience implementing advanced observability practices and techniques at scale.
Hands-on experience with one or more observability tools (Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.).
Experience in instrumentation, with systems skills in building and operating monitoring, logging, and alerting services for distributed systems at scale.
Demonstrated ability to use modern monitoring tools (Datadog, Prometheus, etc.).
Ability to build a monitoring ecosystem with high-fidelity alerting.
Ability to automate the resolution of alerts.
Ability to automate with various scripting languages (Python, Golang, shell scripting, etc.).
Knowledge of managing systems using infrastructure as code tools (IAM, ARM, Terraform, Chef).
Solid understanding of cloud computing and DevOps concepts.
Hands-on Kubernetes skills and knowledge.
Proven experience maintaining the scalability and resiliency of complex environments.
Ability to triage, execute root cause analysis, and be decisive under pressure.
Experience managing and interpreting large datasets using query languages and visualization tools.
Strong communication skills, with the ability to reach both technical and non-technical audiences.
Ability to learn new software, methods, and practices and bring them to our developers.
Ability to work with a variety of individuals and groups, both in person and virtually, in a constructive and collaborative manner, and to build and maintain effective relationships.
-
Staff Site Reliability Engineer
1 week ago
Bengaluru, Karnataka, India Okta Full time ₹ 8,00,000 - ₹ 24,00,000 per year
Join our team. We're building a world where Identity belongs to you. Okta's Workforce Identity Cloud Security Engineering group is looking for a Staff Site Reliability Engineer with a passion for DevSecOps, Infrastructure Security, and SRE. Join a team that is not just building solutions but redefining the standards for cloud security. If you have a proven...
-
Sr. Staff Site Reliability Engineer
2 weeks ago
Bengaluru, India SolarWinds Full time
At SolarWinds, we're a people-first company. Our purpose is to enrich the lives of the people we serve, including our employees, customers, shareholders, partners, and communities. Join us in our mission to help customers accelerate business transformation with simple, powerful, and secure solutions. The ideal candidate thrives in an innovative,...
-
Staff Site Reliability Engineer
2 weeks ago
Bengaluru East, Karnataka, India Visa Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure payments network,...
-
Bengaluru, India Movius Full time
About the Role: We are looking for a highly experienced Senior Staff Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will bring deep technical expertise in DevOps, automation, and large-scale distributed systems, with a strong understanding of cloud operations and CI/CD frameworks. Experience in the telecom domain will be an...
-
Senior Site Reliability Engineer
2 weeks ago
Bengaluru, India SolarWinds Full time
Your Role: We are seeking a Sr. Site Reliability Engineer (Infrastructure & Site Reliability Engineering) with experience in AWS, Azure, Kubernetes, and GitOps to work with our Site Reliability Engineering (SRE) team. The successful candidate will understand SRE practices and have a track record of implementing high-quality site reliability engineering...
-
Site Reliability Engineering
1 week ago
Bengaluru, Karnataka, India Thakral One Full time US$ 60,000 - US$ 1,20,000 per year
Company Description: Thakral One, headquartered in Singapore, is a technology consulting and services company with a strong presence across Asia. The company specializes in technology-driven consulting, custom solution development, data analytics, and leveraging cloud capabilities to deliver enhanced decision support and practical outcomes. Collaborating...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, India Whatjobs IN C2 Full time
Site Reliability Engineer (SRE) Level 3. Overview: A Site Reliability Engineer (SRE) Level 3 is a senior technical leadership role focused on designing, implementing, and maintaining large-scale, complex, and highly reliable systems. This role emphasizes a blend of software and systems engineering to ensure the availability, latency, performance, and capacity...
-
Site Reliability Engineering
7 days ago
Bengaluru, Karnataka, India Viraaj HR Solutions Private Limited Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Site Reliability Engineer (SRE). About The Opportunity: A fast-growing organization in the Enterprise Cloud Infrastructure & SaaS sector delivering highly available, mission-critical services to enterprise customers. We are hiring an on-site Site Reliability Engineer in India to own reliability, automation, and operational excellence across cloud-native...
-
Site Reliability Engineer
3 weeks ago
Bengaluru, India ViewSonic Full time
Job Requirements: Bachelor's degree in Computer Science, Engineering, or a related field. 3+ years of experience in a relevant role, such as Site Reliability Engineer, DevOps Engineer, or similar, is preferred but not mandatory. Basic understanding of AWS solutions including EC2, S3, CloudWatch, Lambda, and RDS. Interest and understanding of Platform Engineering...