Cloud Site Reliability Engineer (Azure)

3 weeks ago


India TekJobs Full time

Responsibilities:


- Monitor alerts, metrics, and logs to detect incidents and events, and correlate them to find the root cause of outages.

- Conduct Post-Incident Reviews with developers, infrastructure engineers, product owners, system owners, and information security to identify root causes and automation-based solutions that improve the agility and performance of the system.

- Work with other SREs to drive standards and consistency around best practices

- Create and modify runbooks and knowledge base articles that other engineers can follow to resolve incidents quickly. Identify opportunities and implement the automation needed to address and prevent operational issues.

- Understand and modify existing code and scripts used to automate application and infrastructure builds. Identify and enable new alerts and monitors for critical services that affect system reliability.

- Drive increased efficiency across the teams, eliminating duplication, leveraging common DevOps processes, tools, and technology

- Collaborate with the team in defining architecture; identify potential risks to successful implementation

- Work closely with business partners and software development teams in a matrix organization structure

- Automate tasks to reduce manual work, reduce outages, and enhance customer and employee experience

- Communicate and resolve complex production issues and implement preventative measures

- Implement and tune monitoring, metric collection, and alerting



Required Skills:


- Solid hands-on experience in setting up and correlating SIEM and monitoring tools, including but not limited to Azure Sentinel, Azure Log Analytics, Azure Monitor, Application Insights, Splunk, Moogsoft, CA APM/Wily Introscope, etc.


(OR)


- Experience as a senior software developer building applications with technologies such as Java, Spring Boot, Spring Framework, .NET Core, Angular, React, Vue.js

- Hands-on experience with a variety of database technologies, including relational databases such as Azure SQL, SQL Server, MySQL, and PostgreSQL, or NoSQL databases such as Azure Cosmos DB, MongoDB, etc.

- Hands-on experience integrating systems with REST APIs, databases (RDBMS), LDAP, Active Directory, Azure Active Directory, RabbitMQ, Redis Cache, Azure Functions (Serverless)

- Hands-on experience deploying applications to production through automated CI/CD pipelines or automated scripts using tools such as Maven, Gradle, Docker, Git, JUnit, MSTest, Tomcat, SonarQube, Fortify, Selenium, Cucumber, Contrast Security, etc.

- Understanding and experience delivering Twelve-Factor cloud-native applications

- Understanding and experience with Microservices architecture

- Knowledge, understanding, and experience using ticketing systems for Catalogs and Change Management like ServiceNow, HP ITSM, and BMC Remedy.

- Excellent communication and coordination skills to interact with both technical and non-technical stakeholders.


Preferred Skills:

- Knowledge, understanding, and experience of DevOps and Agile methodologies

- Experience in Microsoft Azure Technologies

- Experience in Tanzu Application/Container Services (TAS/TKS) (previously Pivotal Cloud Foundry) or equivalent container-based platforms/products like OpenShift, Azure Kubernetes Services, Google Container Services, etc.

- Experience using ServiceNow ITOM and ITSM to create catalogs or to automate processes by integrating with other systems.

- We highly encourage SREs, DevOps engineers, application developers, system developers, and system engineers who understand how software is built and managed to apply.



  • India Quiktrak, LLC Full time

    Job Title: Azure Site Reliability Engineer (SRE) / DevOps Engineer Job Description: Summary: As an Azure Site Reliability Engineer (SRE) / DevOps Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure on the Azure platform. This role involves managing deployments, implementing continuous...


  • India Cricbuzz.com Full time

    Site Reliability Engineer We are looking for a highly skilled and motivated Web Server Site Reliability Engineer to join our team. As a Web Server Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our web server infrastructure and CDN services. Experience - 3 - 5 years Responsibilities: ●...


  • India Coforge Full time

    Qualifications : Experience in a DevOps / Site Reliability Engineer ( SRE ) position, dedicated to ensuring the high availability, reliability, and scalability of live systems. Proficient in observability tools like Prometheus, ELK stack, Grafana, and Azure Monitor, capable of fully managing the suite for optimal system oversight. Skilled in operating APM...


  • India iScale Solutions Full time

    Job Description This is a remote position. Key Responsibilities: Design, implement, and maintain highly available and scalable infrastructure on AWS cloud platform. Develop and manage Infrastructure as Code (IaC) using Terraform for provisioning and managing cloud resources. Implement containerization strategies using Docker for packaging and deploying...


  • India System Soft Technologies Full time

    Title: Site Reliability Engineer 100% REMOTE The Site Reliability Engineer (SRE) is a technician who utilizes an array of skills to enhance reliability in critical customer facing digital assets. The SRE is responsible for maintaining the availability and performance of relevant systems through supporting, building, and enhancing applications, tools and...


  • India System Soft Technologies Full time

    Title: Site Reliability Engineer 100% REMOTE The Site Reliability Engineer (SRE) is a technician who utilizes an array of skills to enhance reliability in critical customer facing digital assets. The SRE is responsible for maintaining the availability and performance of relevant systems through supporting, building, and enhancing applications, tools and...


  • India CloudBees Full time

    Job Title - Manager, Site Reliability Engineer Location - Bangalore and Chennai Year of Experience - 10+ Years About CloudBees CloudBees is the leading software delivery platform that enables enterprises to deliver scalable, compliant, and secure software, empowering developers to do their best work. Seamlessly integrating into any hybrid and...


  • India LTIMindtree Full time

    We are Hiring DevOps Site Reliability Engineer !!! Exp - 8 to 12 years Location - Pune, Bangalore & Mumbai NP - Immediate to 60 days JD 5+ years of experience in DevOps, Site Reliability Engineer, or as a developer in SaaS based/enterprise applications • Previous experience within Agile Development or Systems Engineering / automation role • Development...


  • India Encora Inc. Full time

    Description Sr. Software Engineer (Site Reliability Engineer) Important Information Location: Ahmedabad Experience: 5+ years Job Mode: Full-time Work Mode: Remote Job Summary Working with DevOps SRE with good experience in Site Reliability Engineer. Responsibilities and Duties Design, implement, and maintain highly...


  • India Agensi Pekerjaan BTC Sdn Bhd Full time

    Job Description Open Position: Site Reliability Engineer (MNC Tech Company) A well-known MNC Tech Company is hiring Site Reliability Engineer to join them in the Kuala Lumpur office. Key responsibilities include: Develop and provide operational support for full-stack software applications. Collaborate with development operations staff to create, monitor,...


  • Bangalore/Anywhere in India/Multiple Locations One of the Consulting Firms Full time

    Job Description: - Collaborate with Site Reliability Engineering teammates and Software Delivery teams to determine and implement cloud networking, monitoring, and infrastructure requirements - Ensure that networks and infrastructure are highly available - Develop methodologies to safely deploy and test network and infrastructure changes, including...


  • Anywhere in India/Multiple Locations Innoquest Consulting Full time

    MANDATORY ASK : 5-8 YEARS RELEVANT EXPERIENCE / STRONG HANDS-ON EXPERIENCE IN ANSIBLE & TERRAFORM / EXPERTISE IN AUTOMATION, DEBUGGING, SCRIPTING TOOLS, APM OR MONITORING TOOLS / EXPERIENCE IN SITE RELIABILITY & CLOUD (PREFERABLY AZURE) / EXPERIENCE IN CONTAINERIZATION USING DOCKER & KUBERNETES. JOB OVERVIEW : As a member of the Platform Engineering...


  • India RapidBraiins Full time

    Job Description : We are seeking a highly skilled and experienced Senior DevOps Site Reliability Engineer to join our dynamic team. The ideal candidate will have a proven track record of success in DevOps, Site Reliability Engineering (SRE), or development roles within SaaS-based or enterprise applications. As a Senior DevOps SRE Engineer, you will play a...


  • India Codersbrain technology pvt ltd Full time

    Key Responsibilities: - Provide expert production support for application teams utilizing our platform, ensuring high availability, reliability, and performance. - Diagnose and resolve complex issues in production environments, collaborating closely with development teams and stakeholders. - Implement and maintain monitoring, alerting, and logging solutions to...


  • India EZINFORMATICS SOLUTIONS PVT LTD Full time

    Company Description EZINFORMATICS SOLUTIONS PVT LTD is a team of professionals with vast industrial experience and accomplishments in various IT services. They focus on three different spheres: Cyber Security, Information Technology, and Consulting Services. Their goal is to provide safe and secure solutions, unify customer data, and deliver exceptional...


  • India System Soft Technologies Full time

    Job Summary The Site Reliability Engineer (SRE) is a technician who utilizes an array of skills to enhance reliability in critical customer facing digital assets. The SRE is responsible for maintaining the availability and performance of relevant systems through supporting, building, and enhancing applications, tools and engaging with infrastructure teams....


  • India Greenway Health Full time

    Job Description Job Summary The Manager is responsible for implementing the development process and site reliability engineering practices to resolve issues and identify opportunity areas. This role will lead development and site reliability engineering teams and establish and implement best practices and standards related to engineering...


  • India Oracle Full time

    Oracle Health and AI Building off our Cloud momentum, Oracle has formed a new organization - Oracle Health Applications & Infrastructure. This team will focus on product development and product strategy for Oracle Health while building out a complete platform supporting modernized, automated healthcare. This is a net new line of business, constructed...


  • India Career Stone Consultant Full time

    PRINCIPAL ACCOUNTABILITIES: 1.AWS Infrastructure Design: o Lead the design and implementation of scalable, reliable, and secure AWS infrastructure. o Provide expertise in architecting solutions that maximize the benefits of AWS services. o Lead the upgrade of Apache web servers for improved performance and security. o Oversee the database (DB) upgrade...