Staff Engineer - SRE
5 hours ago
Company Description
Forbes Advisor is a new initiative for consumers under the Forbes Marketplace umbrella that provides journalist- and expert-written insights, news and reviews on all things personal finance, health, business, and everyday life decisions. We do this by providing consumers with the knowledge and research they need to make informed decisions they can feel confident in, so they can get back to doing the things they care about most.
If you're looking for challenges and opportunities similar to those of a startup, with the benefits of a seasoned and successful company, then read on:
Job Description
Responsibilities:
- The Site Reliability Engineering (SRE) team is responsible for the reliability, scalability, stability and performance of systems and services.
- They work with cross-functional teams to design, build and maintain systems and they troubleshoot issues when they arise. They bridge the gap between development and operations teams.
- They work closely with business teams to define Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for critical systems. They also monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs.
- They deploy and manage monitoring tools to gain insights on system health and performance.
- They analyze performance, identify bottlenecks and implement solutions to improve a system's scalability and latency.
- They develop scripts, implement tools and automation frameworks to reduce the manual intervention efforts of deployment, monitoring and scaling.
- They work with development teams for design and development of observability practices like logging, metrics, tracing, etc. They aim to diagnose and troubleshoot issues proactively.
- They create actionable alerts on monitoring systems to ensure rapid response for potential production incidents.
- They forecast resource needs and provision adequately for current and future demand.
- They design and execute "chaos experiments" to test systems' resilience to failure.
- They own, define and implement the Disaster Recovery (DR) processes for systems.
- They also conduct planned and unplanned mock DR drills to test for response preparedness during production incidents.
- They ensure that security best practices are followed and implemented during design and operations of systems.
- They also own and maintain documentation of processes, playbooks, and systems.
- They publish KPI reports and other system health updates on a regular basis to the business.
Requirements:
- Must-have - Bachelor's degree, preferably in CS or a related field, or equivalent experience
- Must-have - 12+ years of overall IT experience
- Must-have - 7+ years of proven work experience as a Senior Site Reliability Engineer or in a similar position.
- Must-have - 5+ years of AWS Cloud experience, with an AWS certification (Certified DevOps Engineer, SysOps, Security, etc.).
- Must-have - AWS experience - 3+ years' experience using a broad range of AWS technologies (e.g. EC2, RDS, ELB, S3, VPC, CloudWatch and monitoring tools) to develop and maintain an AWS-based cloud solution, with an emphasis on best-practice cloud security.
- Must-have - 2+ years of experience with CDN and/or cache systems like Fastly, Akamai, CloudFront, etc.
- Proven understanding of, and strong experience with, cloud deployments (AWS/Docker/Kubernetes)
- Knowledge of IaC provisioning tools and scripting languages such as Terraform, Chef, Ansible, shell, Groovy, Python, etc.
- Experience with monitoring systems such as CloudWatch, New Relic, Datadog/Splunk, ELK stack.
- Experience managing cloud network resources (AWS preferred) such as CloudWatch, VPC, URL proxies, private link, DNS, ACLs, firewalls, and C2S access points.
- Platform or application engineering and operational knowledge of CI/CD tooling such as GitHub Actions, Jenkins, etc.
- Experience with other tooling such as JIRA, Bitbucket, Jenkins, Fortify, SonarQube, Nexus, Nexus IQ
- Experience with configuration automation tools like Puppet/Ansible/Chef/Salt
- Scripting Skills: Strong scripting (e.g. Bash & Python) and automation skills.
- Operating Systems: Windows and Linux system administration.
- Problem Solving: Ability to analyze and resolve complex infrastructure resource and application deployment issues
- Strong attention to detail. Excellent verbal and written communication skills. Strong documentation skills.
Good To Have:
- Experience with Terraform/Ansible/Chef/Puppet
- Experience with GitHub Actions
- Experience with CloudFront, Fastly
- Oversees team members performing these functions
- Anticipates problems and future technical needs and takes the necessary steps to address them
- Works primarily in server-side technologies and is comfortable with client-side work whenever required
- Enthusiastically follows technology trends and software engineering best practices
Perks:
- Day off on the 3rd Friday of every month (one long weekend each month)
- Monthly Wellness Reimbursement Program to promote health and well-being
- Paid paternity and maternity leave