Site Reliability Engineer

3 months ago


Chennai, India Talent500 Full time

Position Title:

Senior Engineer, Site Reliability Engineering


ROLE DESCRIPTION AND SCOPE

Role:

As a Senior Site Reliability Engineer at Ford Motor Company, you will play a pivotal role in elevating the performance and dependability of our eCommerce platforms and applications. In this essential position, your responsibilities will include closely collaborating with diverse teams across the organization to fortify our online systems, ensuring they are not only robust and scalable but also equipped to efficiently manage the complexities of a global customer base. Your expertise in site reliability will be crucial in driving ongoing enhancements to our technology landscape. This continuous improvement effort is vital to maintaining Ford’s leadership in innovation within the automotive industry, helping us set standards in digital commerce and customer satisfaction. Your contributions will directly impact the smooth operation and evolutionary growth of our eCommerce capabilities, aligning with Ford's commitment to excellence and innovation.


KEY RESPONSIBILITIES / DELIVERABLES:

As a Site Reliability Engineer, your responsibilities will include:

  • Participating in 24x7 on-call production support rotations and handling incident response to minimize disruptions.
  • Continuously monitoring the availability, reliability, and performance of systems, platforms, and applications, maintaining a holistic view of system health.
  • Regularly reviewing key site technical metrics such as transaction errors, logging, response times, caching strategies, conversion/bounce rates, and capacity and resource utilization.
  • Providing primary operational and engineering support for multiple large, distributed software applications.
  • Proactively identifying stability risks and working with engineering leadership to establish appropriate mitigation plans.
  • Using automation tools, scripts, and processes to reduce or eliminate repetitive tasks, thereby improving the support provided by Site Reliability Engineering.
  • Creating or modifying Terraform files according to Ford formats to develop new monitoring dashboards and alert policies.
  • Collaborating with engineering and architecture teams to evaluate and identify optimal cloud solutions, focusing on scalability, high performance, and security.
  • Gathering and analyzing metrics from operating systems and applications to assist in performance tuning and fault finding.
  • Measuring and optimizing system performance continuously to exceed customer needs and advance capabilities.
  • Troubleshooting and resolving issues related to full stack websites, cloud platforms, and infrastructure.
  • Working closely with developers, testers, and business stakeholders to ensure the delivery of high-quality solutions, balancing feature development speed and reliability with well-defined service-level objectives.
  • Ensuring compliance with security and regulatory standards, and implementing and maintaining disaster recovery processes.
  • Providing technical guidance and mentorship to other team members.

These responsibilities ensure the stability, efficiency, and continuous improvement of Ford Motor Company’s eCommerce solutions, aligning with the organization's high standards and innovative approach.


EXPERIENCES / COMPETENCIES:

Education Qualification:

  • Bachelor’s degree or equivalent


Number of Years of Experience:

  • 4+ years of SRE experience


Leadership Skills and Personality Traits:

  • Ability to work effectively in a remote/virtual work setting with other global team members.
  • Ability to work effectively with cross-functional teams across the organization, both inside and outside the technology and software organization.
  • Ability to dissect problems and explore them from different angles to find the most efficient solutions.
  • Staying composed under pressure and bouncing back from setbacks quickly, maintaining focus on achieving system reliability.
  • Keen attention to detail to catch and address small issues before they escalate into larger problems.
  • A strong desire to understand how things work and a willingness to explore and implement new technologies and methodologies.
  • Flexibility in handling unexpected challenges and changes in technology or project directions.
  • Taking initiative to prevent problems before they occur and continuously seeking improvements in system performance.
  • Confidence and ability to make quick decisions during critical situations to prevent or minimize disruptions.
  • Understanding and considering team members’ perspectives and challenges, fostering a supportive and inclusive environment.
  • Clear and effective communication skills, capable of conveying complex information in a straightforward manner and engaging with both technical and non-technical stakeholders.
  • Taking responsibility for the systems and the team, ensuring reliability, and being accountable for the outcomes.
  • Commitment to the development of team members, providing guidance and feedback to help them grow in their professional capacities.
  • Encouraging a collaborative team environment where ideas and solutions are shared openly and where each member’s contribution is valued.
  • Motivating the team to strive for excellence, pushing the boundaries of what is possible, and inspiring innovation through leadership.


Functional/Technical Skills:

  • 5-6 years’ experience with Java, J2EE, NoSQL/SQL datastores, Spring Boot, GCP/AWS/Azure, and Docker/Kubernetes in the maintenance and development of multi-tier applications.
  • Understanding of RESTful APIs and microservices platforms.
  • 4-5 years of experience with APM and other monitoring tools such as Dynatrace, New Relic, ELK, Splunk, Prometheus, Sensu, Nagios, Kafka, DataDog, PagerDuty.
  • Strong experience working with product and development teams to establish error budgets by identifying the right SLOs (service level objectives), SLIs (service level indicators), and KPIs (key performance indicators), and effectively driving the use of the budget to ensure maximum domain availability/uptime.
  • Experience solving complex architecture, design, and business problems, and working to simplify systems, optimize them, and remove bottlenecks.
  • Experience architecting, designing, and developing automation to reduce toil and improve the recoverability, availability, latency, and scalability of supported applications, with an understanding of MTTD (mean time to detection) and MTTR (mean time to resolution).
  • Ability to quickly diagnose and resolve issues in high-pressure situations.
  • Strong verbal and written communication skills to effectively collaborate with cross-functional teams and articulate technical concepts to non-technical stakeholders.
  • Experience in leading teams, mentoring junior staff, and promoting a culture of continuous improvement and learning.
  • Ability to analyze complex data to improve system performance and predict future challenges.
  • Experience in handling outages and the ability to lead incident response efforts, minimizing impact on services.
  • Understanding of network architecture, protocols, and security practices to ensure robust and secure systems.
  • Skills in and understanding of performance tuning and optimization of systems and applications.
  • Knowledge of database administration and management, particularly in configuring, managing, and scaling databases.
  • Experience in planning and executing disaster recovery strategies to ensure data integrity and availability.


Travel:

  • As needed and flexible


Other Preferred:

N/A


