Site Reliability Engineer
4 days ago
Job Description

Ford is seeking an experienced Site Reliability Engineer (SRE) to join our team and lead the development, enhancement, and extension of our global monitoring and observability platform. Enterprise Technology plays a critical part in shaping the future of mobility. If you're looking for the chance to leverage advanced technology to redefine the transportation landscape, enhance the customer experience and improve people's lives, this is the opportunity for you. Join us and challenge your IT expertise and analytical skills to help create vehicles that are as smart as you are.

As an SRE, your role will combine software engineering and systems engineering disciplines to ensure that software systems are available, scalable, and maintainable. You will play a pivotal role in meeting the evolving needs of our customers, including developing Service Level Indicators and Objectives (SLIs/SLOs), establishing best practices with associated templates, and building automation to remove toil and facilitate adoption. You will enable modernization by providing robust SRE standards, AI-powered monitoring tools, and easy-to-use dashboards.

Responsibilities

You will play a key role in meeting the evolving needs of our Ford customers, including developing Service Level Indicators and Objectives (SLIs/SLOs), meeting MTTR/MTTx targets, adopting SRE best practices with associated templates, and building automation to remove toil. The specific responsibilities include:
- Partner with and guide development teams, product managers, service teams, and other IT professionals in SRE best practices to improve reliability, MTTR/MTTD, quality, and time-to-market of our suite of software solutions across Ford.
- Collaborate with development teams as a full-stack software engineer to design, build, and operate scalable and resilient software systems.
- Guide partner teams in setting appropriate SLOs, leveraging distributed tracing, and developing effective SRE dashboards and custom metrics.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve our resilience as an enterprise.
- Identify, reduce, and eliminate toil via automation to maximize the time our partner development teams spend on engineering and innovation.
- Perform root cause analysis of production incidents and implement preventive measures.
- Enable and guide partner teams to regularly review key site technical metrics such as transaction errors, logging, response times, caching strategies, and capacity and resource utilization.
- Enable partner teams to develop resilient back-end, front-end, business logic, data, and integration tiers, along with testing, CI/CD, monitoring, agile processes, and programming fundamentals.
- Maintain a knowledge repository that includes standard operating procedures, SRE best practices and guides, release checklists, etc.
- Provide technical guidance and mentorship to other team members, exhibit leadership, and deliver excellence.

Qualifications

- Bachelor's degree in Computer Science, Computer Engineering, or a related field, or a combination of education and equivalent work experience.
- 10+ years of software engineering experience, including development in Python, Java, NoSQL/SQL datastores, and Spring Boot, plus 4+ years of experience in SRE.
- 5+ years of experience with any APM and other monitoring tools such as Grafana Cloud, Dynatrace, New Relic, ELK, Splunk, Prometheus, Kafka, DataDog, PagerDuty.
- 3+ years of GCP experience.
- 3+ years of experience maintaining, developing, and supporting multi-tier production applications.
- Experience with automated testing (unit, integration, load) and/or test-driven development.
- Understanding of RESTful APIs, microservices platforms, and Dynatrace SaaS.
- Proficiency in CI/CD, DevOps/GitOps practices, OpenTelemetry, and chaos engineering.
- Strong experience establishing error budgets by identifying the right SLOs (service level objectives), SLIs (service level indicators), and KPIs (key performance indicators), and effectively driving the use of the budget to ensure maximum domain availability/uptime.
- Experience solving complex architecture, design, and business problems, and working to simplify, optimize, and remove bottlenecks.
- Strong background in software development and systems administration, as well as excellent problem-solving and communication skills.
- Demonstrable experience as a Site Reliability Engineer.

Additional Preferred Qualifications

- Experience with cloud platforms such as GCP/AWS/Azure.
- Familiarity with DevSecOps practices and integrating security into CI/CD pipelines.
- Experience with SCA, SAST, DAST, vulnerability management, and CSPM tools to help customers deliver secure services.
- SRE certification(s); AIOps and Kubernetes experience is a plus.
- Experience with data visualization tools such as Alteryx, Tableau, Power BI, and Qlik Sense is good to have.
-
Cloud Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India Ford Global Career Site Full time ₹ 15,00,000 - ₹ 25,00,000 per year Be at the Forefront of Mobility's Future: Join Ford as a Site Reliability Engineer. Enterprise Technology is the engine driving the future of transportation, and we're looking for a talented Site Reliability Engineer (SRE) to help us redefine mobility. In this role, you'll leverage cutting-edge technology to enhance customer experiences, improve lives, and...
-
Site Reliability Engineer
1 week ago
Bengaluru, India Relanto Full time Job Description Job Title: Site Reliability Engineer Summary: We are looking for a Site Reliability Engineer to join our Digital & Transformation department. The ideal candidate will have 2-3 years of experience in this field and will be responsible for ensuring the reliability, availability, and performance of our systems and applications. Roles And...
-
Site Reliability Engineer
3 weeks ago
India, IN Sonata Software Full time We're Hiring: Senior Site Reliability Engineer Location: Onsite (Office: Hyderabad – Mandatory from Day 1) Employment Type: Full-time Notice Period: Immediate to 15 Days Only Experience: 8+ Years About the Role: We're looking for a Senior Site Reliability Engineer (SRE) to lead reliability initiatives across our production systems. This is a high-impact...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India NatWest Group Full time Site Reliability Engineer, AVP Join us as a Site Reliability Engineer. You'll manage the provision of stable, resilient, reliable applications with the end goal of minimising disruption to Customer & Colleague Journeys (CCJ). We'll look to you to identify and automate manual tasks and implement observability solutions, ensuring a thorough understanding of...
-
Site Reliability Engineer
20 hours ago
Chennai, India Elgebra Full time Role Overview: We are seeking a highly experienced and technically proficient Site Reliability Engineer (SRE) to join our team in support of our client, Qincline. The ideal candidate will have 7 or more years of dedicated experience in Site Reliability Engineering or a closely related discipline. This pivotal role requires a strong focus on ensuring the...
-
Site Reliability Engineer
2 weeks ago
India Akamai Full time ₹ 5,00,000 - ₹ 15,00,000 per year Do you want to grow your career in Linux and Site Reliability Engineering? Would you like to contribute to the foundation of a new public cloud platform? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform. This is a rare opportunity to help build a...
-
Site Reliability Engineer
1 week ago
Chennai, Tamil Nadu, India NatWest Group Full time ₹ 12,00,000 - ₹ 36,00,000 per year Site Reliability Engineer Join us as a Site Reliability Engineer. In this key role, you'll support the improvement of non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services. You'll enjoy significant...
-
Site Reliability Engineer
4 weeks ago
Bengaluru, Karnataka, India ViewSonic Full time Job Requirements: Bachelor's degree in Computer Science, Engineering, or a related field. 3+ years of experience in a relevant role, such as Site Reliability Engineer, DevOps Engineer, or similar, is preferred but not mandatory. Basic understanding of AWS solutions including EC2, S3, CloudWatch, Lambda, and RDS. Interest and understanding of Platform Engineering...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India Elgebra Full time ₹ 12,00,000 - ₹ 36,00,000 per year Role Overview: We are seeking a highly experienced and technically proficient Site Reliability Engineer (SRE) to join our team in support of our client, Qincline. The ideal candidate will have 7 or more years of dedicated experience in Site Reliability Engineering or a closely related discipline. This pivotal role requires a strong focus on ensuring the...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India Ford Motor Full time SRE - Software Engineer Enterprise Technology plays a critical part in shaping the future of mobility. If you're looking for the chance to leverage advanced technology to redefine the transportation landscape, enhance the customer experience and improve people's lives, this is the opportunity for you. Join us and challenge your IT expertise and analytical...