Site Reliability Engineer

4 weeks ago


Chennai, India Talent500 Full time

Short Description:
A site reliability engineer (SRE) is a role that combines software engineering and systems engineering to ensure that a software system is available, scalable, and maintainable 24/7/365 in an "Always ON" mode for Ford's e-Commerce Platform.
Description for Internal Candidates
Strong background in software development and systems administration, as well as excellent problem-solving and communication skills.
Improve reliability, quality, and time-to-market of our suite of software solutions
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
Identify and reduce or eliminate toil via automation to maximize the time spent on engineering and innovation
Perform root cause analysis of production incidents and implement preventive measures
Responsibilities for Internal Candidates
Run the production environment by monitoring availability and taking a holistic view of system health.
Develop, improve, and operate the deployment and orchestration of a complex distributed system
Improve reliability, quality, and time-to-market of our suite of software solutions
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
Provide primary operational and engineering support for multiple large, distributed software applications
Identify and reduce or eliminate toil via automation to maximize the time spent on engineering and innovation
Collaborate with development teams to design, build, and operate scalable and resilient software systems
Automate deployment, monitoring, and incident response processes
Perform root cause analysis of production incidents and implement preventive measures
Conduct performance analysis and optimization of the system
Ensure compliance with security and regulatory standards
Implement and maintain disaster recovery processes
Provide technical guidance and mentorship to other team members
Participate in an on-call rotation for incident response and support
Qualifications:
Four-year college degree in Computer Science or equivalent.
2-5 years of experience with Java, J2EE, NoSQL/SQL datastores, Spring Boot, GCP/AWS/Azure, and Docker/Kubernetes (K8s) in the maintenance and development of multi-tier applications.
Understanding of RESTful APIs and microservices platforms
2-5 years of experience with APM and other monitoring tools, such as Dynatrace, New Relic, ELK, Splunk, Prometheus, Sensu, Nagios, Kafka, Datadog, and PagerDuty.
Strong experience working with product and development teams to establish error budgets by identifying the right SLOs (service level objectives), SLIs (service level indicators), and KPIs (key performance indicators), and to drive effective use of that budget to maximize domain availability/uptime.
Regularly review key site technical metrics such as transaction errors, logging, response times, caching strategies, conversion/bounce rates, and capacity and resource utilization.
Proactively identify stability risks & work with engineering leadership to establish appropriate mitigation plans
Experience solving complex architecture, design, and business problems, working to simplify, optimize, and remove bottlenecks.
Architect, design, and develop automation to reduce toil and improve the recoverability, availability, latency, and scalability of supported applications, with an understanding of MTTD (Mean Time to Detection) and MTTR (Mean Time to Resolution)
Maintain a knowledge repository that includes standard operating procedures, release checklists, and runbooks for incident recovery
Same Posting Description for Internal and External Candidates
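The error-budget, SLO/SLI, and MTTD/MTTR terms in the qualifications above can be made concrete with a small sketch. This Python is illustrative only and is not Ford's tooling; every function and class name is hypothetical. It assumes an availability SLO expressed as a fraction (for example 0.999), a 30-day window, an event-ratio SLI (good events divided by total events), and that both MTTD and MTTR are measured from the moment the fault began. Some teams measure MTTR from detection instead, or read the "R" as repair or recovery.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO allows over the window (assumed 30 days)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget left, using an event-ratio SLI.

    1.0 means untouched, 0.0 means exhausted, negative means overspent.
    """
    if total_events == 0:
        return 1.0  # no traffic, so no budget consumed
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


@dataclass
class Incident:
    started: datetime   # when the fault began (assumed known, e.g. from a postmortem)
    detected: datetime  # when an alert or a person first noticed it
    resolved: datetime  # when service was restored


def mttd_mttr(incidents: list[Incident]) -> tuple[timedelta, timedelta]:
    """Mean time to detection and mean time to resolution, both measured from fault start."""
    if not incidents:
        raise ValueError("need at least one incident")
    n = len(incidents)
    mttd = sum((i.detected - i.started for i in incidents), timedelta()) / n
    mttr = sum((i.resolved - i.started for i in incidents), timedelta()) / n
    return mttd, mttr


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))  # 43.2
    # 500 failed requests out of 1,000,000 uses half of a 99.9% budget.
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))  # 0.5
    day = datetime(2024, 1, 1)
    incidents = [
        Incident(day.replace(hour=10), day.replace(hour=10, minute=5),
                 day.replace(hour=10, minute=45)),
        Incident(day.replace(hour=14), day.replace(hour=14, minute=15),
                 day.replace(hour=15)),
    ]
    mttd, mttr = mttd_mttr(incidents)
    print(mttd, mttr)  # 0:10:00 0:52:30
```

Expressing the budget both as minutes and as a remaining fraction is a design choice: minutes are easy to communicate to product teams, while the event-ratio form is what a request-based SLI typically produces.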



  • Chennai, India ZF Group Full time

    Req ID 65124 SDC Chennai, India. Your Tasks: 7-9 years of experience as a cloud-native production environment Site Reliability Engineer (preferably AWS) supporting high-availability, large-scale web-based applications. Experience with infrastructure and service monitoring and alerting. Experience with application observability. Experience with Kafka, Terraform, CI/CD...


  • Chennai, India iLink Digital Full time

    7 years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role. Strong expertise in Azure cloud services and solutions. Proficiency in scripting and automation using PowerShell, Azure CLI, or similar tools. Experience with infrastructure as code (IaC) tools such as ARM templates, Terraform, or Ansible. Familiarity with CI/CD pipelines...


  • Chennai, India Vistex Full time

    Vistex is currently hiring a Site Reliability Engineer. The Vistex Site Reliability Engineer will be primarily responsible for service availability, performance, monitoring, incident response, and capacity planning. This is a highly technical, hands-on role with a strong focus on automation, accurate monitoring, actionable alerting, resilient design,...


  • Chennai, India TERRAGIG LLP Full time

    Role: Site Reliability Engineer. Experience: 5+ years. Work Model: Remote / Contract, 3 years. Skills: - Develop and provide operational support for full-stack software applications. - Relevant industry certifications, such as the Site Reliability Engineering (SRE) Foundation. - Five years' experience as a site reliability engineer or similar role. -...


  • Chennai, India Corpxcel Consulting Full time

    Job Description: - For the SRE coach role we need someone with 10+ years of experience (female candidates requirement) - Extensive experience as an agile coach with good knowledge of SRE - Excellent communication skills - Document the SRE manual and training material - Lead/coach the SRE team - Establish the processes, and work with multiple teams evangelising the...

