Site Reliability Engineer/Architect

2 months ago


Bangalore/Any Location, India Grizmo Labs Full time

Responsibilities:

- Own the infrastructure and APM, and work with developers and systems engineers to build, release, monitor, and run the service so that its reliability exceeds the agreed SLAs.

- Write software to automate API-driven tasks at scale and contribute to the product codebase in Java, JS, React, Node, Go, and Python.

- Write automation to reduce toil and eliminate manual, repeatable tasks.

- Work with Ansible, Puppet, Chef, Terraform, or another config management/orchestration suite; know where it is broken, work toward fixing it, and explore new alternatives.

- Define and accelerate the implementation of support processes, tools, and best practices.

- Maintain services once they are live by measuring and monitoring availability, latency, and overall system reliability.

- Handle cross-team performance issues end to end: identify the cause, determine the areas of improvement, and drive the resulting actions to closure.

- Baseline the performance and maturity of systems, tools, coverage, metrics, technology, and engineering practices.

- Define, measure, and improve reliability metrics (SLOs/SLIs), observability (monitoring, logging, and tracing solutions), and ops processes (incident and problem management); streamline and automate release management.

- Build dashboards to provide visibility into the performance of the applications.

- Purposefully create chaos in the production environment in a controlled manner to validate the reliability of systems.

- Mentor and coach other SREs in the organization.

- Provide written and verbal updates to executives and the stakeholders of the application in the organization.

- Understand the current process and system setup, and propose the improvements needed in processes and technology so that the application exceeds the desired Service Level Objective.

- Troubleshoot, debug and diagnose operational issues and drive them to closure.

- Understand software delivery life cycles, particularly Agile/Lean and DevOps.

Requirements:

- A strong belief in automation as the way to sustain continuous improvement: automating toil and runbooks, and improving the ability of applications to auto-heal, leading to improved reliability.

- 15+ years of experience in the development and operations of applications/services in production that have uptime over 99.9%.

- 8+ years of experience as an SRE handling web-scale applications.

- Strong hands-on coding experience in one or more programming languages such as Python, Golang, Java, Bash, etc.

- Good understanding of Observability (monitoring, logging, tracing, metrics) and chaos engineering concepts.

- Proficiency in using observability tools (for example, New Relic or Datadog) for monitoring, logging, and tracing.

- Expert-level hands-on knowledge of a public cloud platform: AWS and/or Google Cloud Platform.

- A professional-level certificate in one of the public clouds is highly desirable.

- Must have hands-on experience in using configuration management systems such as Ansible or SaltStack and infrastructure automation tools like Terraform or CloudFormation.

- Should have used alerting systems such as PagerDuty.

- Should have implemented solutions around Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for services, with measurement both within a single system and across systems in a distributed environment.

- Should have supported Production Incidents (PIs) on critical applications of a company.

- Proven experience in handling large-scale and growing infrastructure across Data Centers and heterogeneous Cloud platforms.

- Experience as a service owner managing a large, geographically diverse group of stakeholders.

- Ability to work with creative, fast-growing engineering teams and motivate them to deliver their best work.

- History of driving innovation.

(ref:hirist.tech)

  • Any Location, India RapidBraiins Full time

    Job Description: We are seeking a highly skilled and experienced Senior DevOps Site Reliability Engineer to join our dynamic team. The ideal candidate will have a proven track record of success in DevOps, Site Reliability Engineering (SRE), or development roles within SaaS-based or enterprise applications. As a Senior DevOps SRE Engineer, you will play a...


  • Any Location/Bangalore, India Codersbrain technology pvt ltd Full time

    Key Responsibilities: - Provide expert production support for application teams utilizing our platform, ensuring high availability, reliability, and performance. - Diagnose and resolve complex issues in production environments, collaborating closely with development teams and stakeholders. - Implement and maintain monitoring, alerting, and logging solutions to...

  • Solution Architect

    2 months ago


    Bangalore/Any Location, India Repletio Full time

    Hasura is looking for an experienced Solutions Architect to work directly with Hasura customers to facilitate the growth and adoption of the product within the organization. GraphQL is changing the way developers and teams build software today. The Hasura GraphQL Engine is an open-source tool that makes it fast and easy to compose a GraphQL API for secure...


  • Bangalore, India Ensono Full time

    About Role: Ensono is continuing its growth and building a cloud-native managed service offering for our clients. We are looking for energetic and skilled remote Site Reliability Engineers to join us on this exciting new journey. As a Site Reliability Engineer, you and your team will be responsible for between four and ten of Ensono cloud-native managed...


  • Bangalore, India Cyitechsearch Full time

    We are hiring for Site Reliability Engineer. Skills: - Develop and provide operational support for full-stack software applications. - Relevant industry certifications, such as through the Site Reliability Engineering (SRE) Foundation. - Five years' experience as a site reliability engineer or similar role. - Collaborate with development operations staff to...



  • Bangalore, India Cricbuzz.com Full time

    Site Reliability Engineer. We are looking for a highly skilled and motivated Web Server Site Reliability Engineer to join our team. As a Web Server Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our web server infrastructure and CDN services. Experience: 3 - 5 years. Responsibilities: ● Design,...


  • Bangalore, India h3 Technologies, LLC Full time

    Hi, we are looking for a Site Reliability Engineer (GCP) in Bangalore for one of our reputed clients. If you or someone you know is interested, please share your resume. JD: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SREs will keep an...


  • Bangalore, India ViewSonic Full time

    Job Requirements: Bachelor's degree in Computer Science, Engineering, or a related field. 1+ year of experience in a relevant role, such as Site Reliability Engineer, DevOps Engineer, or similar, is preferred but not mandatory. Basic understanding of AWS solutions including EC2, S3, CloudWatch, Lambda, and RDS. Interest and understanding of Platform Engineering...



  • Bangalore, India Kunato Full time

    Site Reliability Engineer (SRE) - Python/Golang. Job Description: We are seeking a highly skilled and passionate Site Reliability Engineer (SRE) to join our technology team. The ideal candidate will possess strong programming skills with expertise in Python, Golang, or both. This role is pivotal in ensuring the high availability, performance, and security of...

