Site Reliability Engineer/Architect

1 month ago


India Grizmo Labs Full time

Responsibilities:

- Own the infrastructure and APM (application performance monitoring), and work with developers and systems engineers to build, release, monitor, and reliably run the service, exceeding the agreed SLAs.

- Write software to automate API-driven tasks at scale and contribute to the product codebase in Java, JavaScript, React, Node.js, Go, and Python.

- Write automation to reduce toil and eliminate manual, repeatable tasks.

- Work with Ansible, Puppet, Chef, Terraform, or another configuration management/orchestration suite; identify where it is broken, work toward fixing it, and explore alternatives.

- Define and accelerate the implementation of support processes, tools, and best practices.

- Maintain services once they are live by measuring and monitoring availability, latency, and overall system reliability.

- Handle cross-team performance issues end to end: identify the cause, determine areas for improvement, and drive those actions to closure.

- Baseline the performance and maturity of systems, tools, coverage, metrics, technology, and engineering practices.

- Define, measure, and improve reliability metrics (SLOs/SLIs), observability (monitoring, logging, and tracing solutions), and operational processes (incident and problem management); streamline and automate release management.

- Build dashboards to provide visibility into the performance of the applications.

- Deliberately inject chaos into the production environment in a controlled manner to validate the reliability of systems.

- Mentor and coach other SREs in the organization.

- Provide written and verbal updates to executives and to the application's stakeholders across the organization.

- Understand the current processes and system setup, and propose the process and technology improvements needed for the application to exceed its Service Level Objectives (SLOs).

- Troubleshoot, debug and diagnose operational issues and drive them to closure.

- Understand software delivery life cycles, particularly Agile/Lean and DevOps.

Requirements:

- A strong believer in automation as a driver of sustained, continuous improvement: automating toil and runbooks, and improving applications' ability to auto-heal, leading to better reliability.

- 15+ years of experience developing and operating production applications/services with uptime above 99.9%.

- 8+ years of experience as an SRE handling web-scale applications.

- Strong hands-on coding experience in one or more programming languages such as Python, Golang, Java, Bash, etc.

- Good understanding of Observability (monitoring, logging, tracing, metrics) and chaos engineering concepts.

- Proficiency in using observability tools (for example, New Relic or Datadog) for monitoring, logging, and tracing.

- Expert-level, hands-on knowledge of the AWS and/or Google Cloud Platform public clouds.

- A professional-level certification in one of the public clouds is highly desirable.

- Must have hands-on experience in using configuration management systems such as Ansible or SaltStack and infrastructure automation tools like Terraform or CloudFormation.

- Should have used alerting systems such as PagerDuty.

- Should have implemented solutions around Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for services.

- This measurement should cover both individual systems and interactions across systems in a distributed environment.

- Should have supported production incidents (PIs) on a company's critical applications.

- Proven experience in handling large-scale and growing infrastructure across Data Centers and heterogeneous Cloud platforms.

- Experience as a service owner managing large, geographically diverse groups of stakeholders.

- Ability to work with creative, fast-growing engineering teams and motivate them to deliver their best work.

- History of driving innovation.

(ref:hirist.tech)

  • India Encora Inc. Full time

    Description: Sr. Software Engineer (Site Reliability Engineer). Important information: Location: Ahmedabad. Experience: 5+ years. Job mode: Full-time. Work mode: Remote. Job summary: Working with DevOps and SRE, with good experience in site reliability engineering. Responsibilities and duties: Design, implement, and maintain highly...


  • India Cricbuzz.com Full time

    Site Reliability Engineer: We are looking for a highly skilled and motivated Web Server Site Reliability Engineer to join our team. As a Web Server Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our web server infrastructure and CDN services. Experience: 3-5 years. Responsibilities: ●...


  • India Korn Ferry Full time

    Role: Site Reliability Engineer. Experience required: 5+ years. Location: Hyderabad (work from office, hybrid). Shift timings: 5 AM to 1 PM IST. We are looking for a Site Reliability Engineer with a strong development background to join our team. In this role, you will be responsible for ensuring the reliability and performance of our systems. You will work closely...


  • India ViewSonic Full time

    Job Requirements: Bachelor’s degree in computer science, Engineering, or a related field. 3+ years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role. Proficient in AWS solutions including but not limited to EC2, S3, CloudWatch, Lambda, and RDS. Strong understanding of Platform Engineering concepts and principles. Experience...


  • India SID Global Solutions Full time

    Dear Candidates, We are looking for a talented Site Reliability Engineer-Manager with 8 to 9 years of experience, available to start immediately at our Hyderabad location, to join our dynamic team and contribute to the development of our cutting-edge web applications. If you're passionate about the role and have experience in SRE, GCP, and Kubernetes, send me your updated CV: Please...


  • India Quiktrak, LLC Full time

    Job Title: Azure Site Reliability Engineer (SRE) / DevOps Engineer Job Description: Summary: As an Azure Site Reliability Engineer (SRE) / DevOps Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure on the Azure platform. This role involves managing deployments, implementing continuous...


  • India First American (India) Full time

    The Role: An SRE Manager is ultimately responsible for system reliability, developer productivity, and reducing time to market by striving to reduce the technical debt of the services your SRE team supports. We seek managers who are passionate about site reliability to influence and drive the strategic SRE mission. As a Site Reliability Engineering Manager...


  • India System Soft Technologies Full time

    Title: Site Reliability Engineer, 100% REMOTE. The Site Reliability Engineer (SRE) is a technician who utilizes an array of skills to enhance reliability in critical customer-facing digital assets. The SRE is responsible for maintaining the availability and performance of relevant systems through supporting, building, and enhancing applications, tools and...


  • India Career Stone Consultant Full time

    PRINCIPAL ACCOUNTABILITIES:
    1. AWS Infrastructure Design:
       o Lead the design and implementation of scalable, reliable, and secure AWS infrastructure.
       o Provide expertise in architecting solutions that maximize the benefits of AWS services.
       o Lead the upgrade of Apache web servers for improved performance and security.
       o Oversee the database (DB) upgrade...


  • India Thoucentric Full time

    Job Description: We are seeking a skilled and dedicated Site Reliability Engineer (SRE) to join our team. The SRE will be responsible for ensuring the reliability, performance, and scalability of our systems and applications. This role combines software development and systems engineering to build and run large-scale, distributed,...


  • India WaferWire Cloud Technologies Full time

    Role: SRE (Site Reliability Engineer) Experience: 4+ Years About WaferWire Cloud Technologies: WaferWire Cloud Technologies is a leading provider of innovative cloud solutions aimed at transforming businesses and driving digital growth. With a focus on cutting-edge technology and customer-centric approaches, we empower organizations to thrive in the...


  • India System Soft Technologies Full time

    Job Summary: The Site Reliability Engineer (SRE) is a technician who utilizes an array of skills to enhance reliability in critical customer-facing digital assets. The SRE is responsible for maintaining the availability and performance of relevant systems through supporting, building, and enhancing applications, tools and engaging with infrastructure teams....


  • India Ford Motor Company Full time

    Strong background in software development and systems administration, as well as excellent problem-solving and communication skills. Improve reliability, quality, and time-to-market of our suite of software solutions. Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and...


  • India Greenway Health Full time

    Job Description. Job Summary: The Manager is responsible for implementing the development process and site reliability engineering practices to resolve issues and identify opportunity areas. This role will lead development and site reliability engineering teams and establish and implement best practices and standards related to engineering...


  • India Next-Link Full time

    Job Description: Senior Site Reliability Engineer. Desirable Skills: Experience with additional programming languages and technologies beyond Python and Ruby. Familiarity with cloud platforms such as AWS, Azure, or GCP. Proficiency in additional logging and monitoring tools. Experience with other Infrastructure as Code (IaC) tools and practices. Knowledge of...


  • India STAFIDE Full time

    Job Description About us: Stafide is the premier destination for tech talent consulting, providing comprehensive employment services throughout Europe. Our mission is straightforward: to effortlessly connect job seekers with employers, focusing on the rapidly changing technology sector. Boasting unparalleled expertise and a steadfast commitment, we...


  • India HCLSoftware Full time

    The Role: HCL BigFix is looking for a Site Reliability Engineer to work on infrastructure for a new product that will help keep our customers’ end points secure. You will be a part of a team that leverages modern technological solutions to drive growth and efficiency. Your daily responsibilities will be centered on HCL BigFix’s cloud infrastructure,...


  • India UBS Full time

    Your role: We're looking for a Site Reliability Engineer to:
    • work as part of an agile pod (team)
    • determine the reliability of our digital products, technology services, and the infrastructure that underpins them
    • minimize the risk and impact of failures by engineering operational improvements, such as predictive monitoring, auto scaling or...