Site Reliability Engineer

1 day ago


Chennai, Tamil Nadu, India | Ford Motor | Full time

SRE - Software Engineer

Enterprise Technology plays a critical part in shaping the future of mobility. If you're looking for the chance to leverage advanced technology to redefine the transportation landscape, enhance the customer experience and improve people's lives, this is the opportunity for you. Join us and challenge your IT expertise and analytical skills to help create vehicles that are as smart as you are.

Ford is seeking an experienced Site Reliability Engineer (SRE) to join our team and lead the development, enhancement, and extension of our global monitoring and observability platform.

Our Site Reliability Engineering (SRE) team enables modernization by providing robust SRE standards, Infrastructure as Code (IaC), AI-powered monitoring tools, and easy-to-use dashboards. The resulting transparency of end-to-end performance gives teams a better view of how they can proactively manage reliability and strategically apply automation.

As an SRE, you will combine software engineering and systems engineering disciplines to ensure that software systems are available, scalable, and maintainable. You will play a pivotal role in meeting the evolving needs of our customers, including the development of Service Level Indicators and Objectives (SLIs/SLOs), best practices with associated templates, and automation to remove toil and facilitate adoption.

Responsibilities:

  • Partner with and guide development teams, product managers, and other IT professionals in SRE best practices to improve reliability, MTTR/MTTD, quality, and time-to-market of our suite of software solutions across Ford
  • Collaborate with development teams as a full-stack software engineer to design, build, and operate scalable and resilient software systems
  • Guide partner teams in setting appropriate SLOs, leveraging distributed tracing, and developing effective dashboards and custom metrics
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve our resilience as an enterprise
  • Identify, reduce, and eliminate toil via automation to maximize our partner development teams' time spent on engineering and innovation
  • Perform root cause analysis of production incidents and implement preventive measures
  • Enable and guide partner teams to regularly review key site technical metrics such as transaction errors, logging, response times, caching strategies, and capacity and resource utilization
  • Enable partner teams to develop resilient back-end, front-end, business logic, data, and integration tiers, along with testing, CI/CD, monitoring, agile processes, and programming fundamentals
  • Maintain a knowledge repository that includes standard operating procedures, SRE best practices and guides, release checklists, etc.
  • Provide technical guidance and mentorship to other team members

Qualifications

  • Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or a combination of education and equivalent work experience
  • 5 years of experience with Golang, Python, Java, NoSQL/SQL datastores, and Spring Boot
  • 5 years of experience with APM and other monitoring tools such as Grafana Cloud, Dynatrace, New Relic, ELK, Splunk, Prometheus, Kafka, Datadog, and PagerDuty
  • 3 years of GCP, AWS, or Azure experience.
  • 3 years of experience maintaining, developing, and supporting multi-tier production applications
  • Experience with automated testing (unit, integration, load) and/or test-driven development
  • Understanding of gRPC and RESTful APIs, and of microservices platforms
  • Strong experience establishing error budgets by identifying the right SLOs (service level objectives), SLIs (service level indicators), and KPIs (key performance indicators), and effectively driving use of the budget to ensure maximum domain availability/uptime
  • Experience solving complex architecture, design, and business problems, and working to simplify, optimize, and remove bottlenecks
  • Strong background in software development and systems administration, as well as excellent problem-solving and communication skills.

Additional Preferred Qualifications

  • Experience with cloud platforms such as AWS, Google Cloud, or Azure
  • Experience with data visualization tools such as Alteryx, Tableau, Power BI, and Qlik Sense
  • Familiarity with DevSecOps practices and integrating security into CI/CD pipelines
  • Experience with SCA, SAST, DAST, Vulnerability Management, and CSPM tools to help customers deliver secure services
  • Proficiency in CI/CD and DevOps / GitOps practices
  • Experience with GCP cloud services
  • Demonstrable experience as a Site Reliability Engineer or similar role
  • SRE Certification(s) is a plus
  • Kubernetes experience is a plus

You may not check every box, or your experience may look a little different from what we've outlined, but if you think you can bring value to Ford Motor Company, we encourage you to apply.

Experience Level: Senior Level
