Site Reliability Engineer
5 days ago
Job Description
Ford is seeking an experienced Site Reliability Engineer (SRE) to join our team and lead the development, enhancement, and extension of our global monitoring and observability platform.
Enterprise Technology plays a critical part in shaping the future of mobility. If you're looking for the chance to leverage advanced technology to redefine the transportation landscape, enhance the customer experience and improve people's lives, this is the opportunity for you. Join us and challenge your IT expertise and analytical skills to help create vehicles that are as smart as you are.
As an SRE, you will combine software engineering and systems engineering disciplines to ensure that software systems are available, scalable, and maintainable. You will play a pivotal role in shaping the evolving needs of our customers, including developing Service Level Indicators and Objectives (SLIs/SLOs), establishing best practices with associated templates, and building automation to remove toil and facilitate adoption.
You will enable modernization by providing robust SRE standards, AI-powered monitoring tools, and easy-to-use dashboards.
Responsibilities
The individual will play a key role in shaping the evolving needs of our Ford customers, including developing Service Level Indicators and Objectives (SLIs/SLOs), meeting MTTR/MTTx targets, adopting SRE best practices with associated templates, and building automation to remove toil.
Specific responsibilities include:
- Partner with and guide development teams, product managers, Service Teams and other IT professionals in SRE best practices to improve reliability, MTTR/MTTD, quality, and time-to-market of our suite of software solutions across Ford
- Collaborate with development teams as a full-stack software engineer to design, build, and operate scalable and resilient software systems.
- Guide partner teams in setting appropriate SLOs, leveraging distributed tracing, and developing effective SRE dashboards and custom metrics.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve our resilience as an enterprise
- Identify, reduce, and eliminate toil via automation to maximize our partner development teams' time spent on engineering and innovation.
- Perform root cause analysis of production incidents and implement preventive measures.
- Enable and guide partner teams to regularly review key site technical metrics such as transaction errors, logging, response times, caching strategies, and capacity and resource utilization.
- Enable partner teams to develop resilient back-end, front-end, business logic, data tier, and integration tier components, along with testing, CI/CD, monitoring, agile processes, and programming fundamentals.
- Maintain a knowledge repository that includes standard operating procedures, SRE best practices and guides, release checklists, etc.
- Provide technical guidance and mentorship to other team members, exhibit leadership, and deliver excellence.
Qualifications
- Bachelor's degree in Computer Science, Computer Engineering, or a related field, or a combination of education and equivalent work experience
- 10+ years of software engineering experience, including development with Python, Java, NoSQL/SQL datastores, and Spring Boot, plus 4+ years of experience in SRE.
- 5+ years of experience with APM and other monitoring tools such as Grafana Cloud, Dynatrace, New Relic, ELK, Splunk, Prometheus, Kafka, Datadog, and PagerDuty.
- 3+ years of GCP experience.
- 3+ years of experience maintaining, developing, and supporting multi-tier production applications
- Experience with automated testing (unit, integration, load) and/or test-driven development
- Understanding of RESTful APIs, microservices platforms, and Dynatrace SaaS
- Proficiency in CI/CD, DevOps/GitOps practices, OpenTelemetry, and chaos engineering.
- Strong experience establishing error budgets by identifying the right SLOs (Service Level Objectives), SLIs (Service Level Indicators), and KPIs (Key Performance Indicators), and effectively driving the use of the budget to ensure maximum domain availability/uptime.
- Experience solving complex architecture, design, and business problems, and working to simplify, optimize, and remove bottlenecks.
- Strong background in software development and systems administration, as well as excellent problem-solving and communication skills.
- Demonstrable experience as a Site Reliability Engineer.
Additional Preferred Qualifications
- Experience with cloud platforms such as GCP/AWS/Azure
- Familiarity with DevSecOps practices and integrating security into CI/CD pipelines
- Experience with SCA, SAST, DAST, Vulnerability Management, and CSPM tools to help customers deliver secure services
- SRE certification(s); AIOps and Kubernetes experience is a plus
- Experience with data visualization tools such as Alteryx, Tableau, Power BI, and Qlik Sense is good to have.