Site Reliability Engineer

3 weeks ago


India Forbes Advisor Full time

Job Title: SRE

Certification (mandatory) - any one of the following is accepted:

AWS Certified DevOps Engineer - Professional

AWS Certified SysOps Administrator

AWS Certified Security - Specialty

AWS Certified Solutions Architect - Professional

Experience: 8+ Years

Location: Mumbai, Chennai (candidates based in Mumbai or Chennai: hybrid only, no remote; remote may be offered to strong candidates from other locations)

Notice period: Immediate to 30 days max

Responsibilities of Senior SRE:

● The Site Reliability Engineering (SRE) team is responsible for the reliability, scalability, stability and performance of systems and services.

● They work with cross-functional teams to design, build and maintain systems and they troubleshoot issues when they arise. They bridge the gap between development and operations teams.

● They work closely with business teams to define Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for critical systems. They also monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs.

● They deploy and manage monitoring tools to gain insights on system health and performance.

● They analyze performance, identify bottlenecks and implement solutions to improve a system's scalability and latency.

● They develop scripts and implement tools and automation frameworks to reduce manual intervention in deployment, monitoring and scaling.

● They work with development teams for design and development of observability practices like logging, metrics, tracing, etc. They aim to diagnose and troubleshoot issues proactively.

● They create actionable alerts on monitoring systems to ensure rapid response for potential production incidents.

● They forecast resource needs and provision adequately for current and future demand.

● They design and execute "chaos experiments" to test systems' resilience to failure.

● They own, define and implement the Disaster Recovery (DR) processes for systems. They also conduct planned and unplanned mock DR drills to test for response preparedness during production incidents.

● They ensure that security best practices are followed and implemented during design and operations of systems.

● They also own and maintain documentation of processes, playbooks, and systems.

● They publish KPI reports and other system health updates on a regular basis to the business.

Requirements:

○ Must-have - Bachelor's degree, preferably in CS or a related field, or equivalent experience

○ Must-have - 12+ years of overall IT experience

○ Must-have - 7+ years of proven work experience as a Senior Site Reliability Engineer or a similar position.

○ Must-have - 5+ years of AWS Cloud experience with an AWS certification (AWS Certified DevOps Engineer, SysOps, Security, etc.)

○ Must-have - AWS experience - 3+ years of experience using a broad range of AWS technologies (e.g. EC2, RDS, ELB, S3, VPC, CloudWatch & Monitoring Tools) to develop and maintain an AWS-based cloud solution, with an emphasis on best-practice cloud security.

○ Must-have - 2+ years of experience in CDN and/or Cache systems like Fastly, Akamai, CloudFront, etc.

○ Proven understanding of and strong experience with cloud deployments (AWS / Docker / Kubernetes)

○ Knowledge of IaC and provisioning tools and scripting languages such as Terraform, Chef, Ansible, Shell, Groovy, Python, etc.

○ Experience with monitoring systems such as CloudWatch, New Relic, Datadog/Splunk, ELK stack.

○ Experience managing cloud network resources (AWS Preferred) such as CloudWatch, VPC, URL proxies, private link, DNS, ACLs, firewalls, and C2S access points.

○ Platform or Application Engineering and Operational Knowledge in any of the CI/CD tooling like GitHub Actions, Jenkins, etc.

○ Experience in other tooling Technologies like JIRA, Bitbucket, Jenkins, Fortify, SonarQube, Nexus, Nexus IQ

○ Experience with configuration automation tools like Puppet/Ansible/Chef/Salt

○ Scripting Skills: Strong scripting (e.g. Bash & Python) and automation skills.

○ Operating Systems: Windows and Linux system administration.

○ Problem Solving: Ability to analyze and resolve complex infrastructure resource and application deployment issues

○ Strong attention to detail. Excellent verbal and written communication skills. Strong documentation skills.

Good To Have

● Experience with Terraform/Ansible/Chef/Puppet

● Experience with GitHub Actions

● Experience with CloudFront, Fastly



  • India iVedha Inc. Full time

    Site Reliability Engineer (SRE) Remote in India and have to work in EST (US/Canada) Time Zone with 24*7 Support Model Position Overview: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) with strong expertise in Python , advanced proficiency in Azure-based infrastructure , and significant experience in Customer Reliability...


  • India Burgeon It Services Pvt Ltd Full time

    Position : Site Reliability Engineer Location : PAN INDIA Location Duration : C2H Exp : 5 - 8 Years JOB DESCRIPTION : - Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role. - Experience with cloud platforms (AWS, Google Cloud, Azure) and containerization technologies (Docker, Kubernetes). - Maintain the stability of the...


  • india CorroHealth Full time

    Hiring Alert! We are looking for a highly skilled Site Reliability Engineer (SRE) for our Product Development team based in Noida. Immediate joiners preferred. Only candidates who are available for an F2F interview round should apply. Job Description: We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal...


  • india, india BigRio Full time

    Job Title: Site Reliability Engineer Location: Remote with Quarterly visits to Chennai, Tamil Nadu, India Duration: Full-Time About BigRio: BigRio is a remote-based, technology consulting firm headquartered in Boston, MA. We deliver software solutions ranging from custom development and software implementation to data analytics and machine learning/AI...


  • India Burgeon It Services Pvt Ltd Full time

    Position : Site Reliability Engineer Location : PAN INDIA Location Duration : C2H Exp : 5 - 8 Years JOB DESCRIPTION : - Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role. - Experience with cloud platforms (AWS, Google Cloud, Azure) and containerization technologies (Docker, Kubernetes). - Maintain the stability of the software...


  • india Coforge Full time

    Job Title: Site Reliability Engineer Skills : SRE, CI/CD, AWS, Python, Terraform & Kubernetes Location: Hyderabad (Work from Office) Experience: 6-14 Years Note: Immediate joiners are preferable Job Description: We at Coforge are hiring a Site Reliability Engineer with the following skillset: Design, implement, and manage scalable and secure cloud-based...


  • india Coforge Full time

    Job Title: Site Reliability Engineer Skills: SRE, CI/CD, AWS, Python, Terraform & Kubernetes Location: Hyderabad (Work from Office) Experience: 6-14 Years Note: Immediate joiners are preferable Job Description: We at Coforge are hiring a Site Reliability Engineer with the following skillset: - Design, implement, and manage scalable and secure cloud-based...


  • india SG Analytics Full time

    Job Overview: We are looking for an experienced Site Reliability Engineer (SRE) to join the infrastructure team of our client. This role will focus on ensuring the reliability and performance of our systems while automating deployment processes and improving the overall operational efficiency. The ideal candidate will have hands-on experience with GCP services...


  • india NationsBenefits Full time

    Position Overview: The Site Reliability Engineering (SRE) team plays a critical role in maintaining the health, performance, and availability of our platforms. As an L2 SRE, you will monitor and respond to site performance metrics, manage incidents, and work closely with Development, , and Engineering teams to ensure the continuous reliability of our services....


  • India Awign Full time

    About Awign Expert : Awign Expert is an enterprise-focused platform that helps businesses Hire, Assess and Manage highly skilled resources for Gig Based Projects. We provide our Experts a gateway to work for and build a freelance/consulting career with large-scale Enterprises. We are a newly launched business division of Awign, which is one of the pioneers...


  • india iO Associates - UKEU Full time

    Job Title: Site Reliability Engineer Location: Bangalore (Hybrid) Duration: 6-month Contract My client is a CMMI Level 3 certified IT firm, specializes in business process automation and software development. With 100+ experts, they deliver high-quality solutions globally, leveraging cutting-edge technologies and Agile methodologies. They are currently looking...


  • india Tata Consultancy Services Full time

    Role: GCP Site Reliability Engineer Must Have Skills: Kubernetes, Terraform, Airflow, Jenkins. Deploying, automating, maintaining and managing cloud-based production systems to ensure their availability, performance, scalability and security. Experience working on application build and deployment with Kubernetes and Terraform. Experience...


  • india Coforge Full time

    Job Title: Site Reliability Engineer Skills: SRE, CI/CD, AWS, Python, Terraform & Kubernetes Location: Hyderabad (Work from Office) Experience: 6-14 Years Note: Immediate joiners are preferable Job Description: We at Coforge are hiring a Site Reliability Engineer with the following skillset: Design, implement, and manage scalable and secure cloud-based infrastructure...


  • india, india iVedha Inc. Full time

    Site Reliability Engineer (SRE)   Position Overview: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) with strong expertise in Python, advanced proficiency in Azure-based infrastructure, and significant experience in Customer Reliability Engineering (CRE) and Automation. The ideal candidate will have 3 to 5 years of...


  • India IVedha Inc. Full time

    Site Reliability Engineer (SRE) Level 3 with CRE and Automation Expertise Position Overview: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) Level 3 with strong expertise in Python, advanced proficiency in Azure-based infrastructure, and significant experience in Customer Reliability Engineering (CRE) and Automation.The...


  • india, india iVedha Inc. Full time

    Site Reliability Engineer (SRE) Remote in India and have to work in EST (US/Canada) Time Zone with 24*7 Support Model Position Overview: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) with strong expertise in Python, advanced proficiency in Azure-based infrastructure, and significant experience in Customer...


  • India iVedha Inc. Full time

    Site Reliability Engineer (SRE) Remote in India and have to work in EST (US/Canada) Time Zone with 24*7 Support Model Position Overview: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) with strong expertise in Python, advanced proficiency in Azure-based infrastructure, and significant experience in Customer Reliability...


  • india 10decoders Full time

    JD: Site Reliability Engineer - GCP With Terraform The Role: We are looking for a Senior SRE with 5+ years of experience to work primarily with our Application development team. An ideal candidate would have extensive experience building cloud infrastructure on Google Cloud with Terraform and have strong experience running workloads that scale on Google’s Kubernetes...