Staff Engineer - SRE
3 days ago
Company Description

Forbes Advisor is a new initiative for consumers under the Forbes Marketplace umbrella that provides journalist- and expert-written insights, news and reviews on all things personal finance, health, business, and everyday life decisions. We do this by providing consumers with the knowledge and research they need to make informed decisions they can feel confident in, so they can get back to doing the things they care about most.

If you're looking for challenges and opportunities similar to those of a startup, with the benefits of a seasoned and successful company, then read on:

Job Description

Responsibilities:
The Site Reliability Engineering (SRE) team is responsible for the reliability, scalability, stability and performance of systems and services.
They work with cross-functional teams to design, build and maintain systems, and they troubleshoot issues when they arise. They bridge the gap between development and operations teams.
They work closely with business teams to define Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for critical systems. They also monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs.
They deploy and manage monitoring tools to gain insights into system health and performance.
They analyze performance, identify bottlenecks and implement solutions to improve a system's scalability and latency.
They develop scripts and implement tools and automation frameworks to reduce the manual effort of deployment, monitoring and scaling.
They work with development teams on the design and development of observability practices such as logging, metrics and tracing.
They aim to diagnose and troubleshoot issues proactively.
They create actionable alerts on monitoring systems to ensure rapid response to potential production incidents.
They forecast resource needs and provision adequately for current and future demand.
They design and execute "chaos experiments" to test systems' resilience to failure.
They own, define and implement the Disaster Recovery (DR) processes for systems.
They also conduct planned and unplanned mock DR drills to test response preparedness for production incidents.
They ensure that security best practices are followed and implemented during the design and operation of systems.
They also own and maintain documentation of processes, playbooks, and systems.
They publish KPI reports and other system health updates to the business on a regular basis.

Requirements:
Must-have - Bachelor's degree, preferably in CS or a related field, or equivalent experience
Must-have - 12+ years of overall IT experience
Must-have - 7+ years of proven work experience as a Senior Site Reliability Engineer or in a similar position
Must-have - 5+ years of AWS Cloud experience, with an AWS certification such as AWS Certified DevOps Engineer, SysOps or Security
Must-have - AWS experience - 3+ years' experience using a broad range of AWS technologies (e.g. EC2, RDS, ELB, S3, VPC, CloudWatch and monitoring tools) to develop and maintain an AWS-based cloud solution, with an emphasis on best-practice cloud security
Must-have - 2+ years of experience with CDN and/or cache systems such as Fastly, Akamai, CloudFront, etc.
Proven understanding of and strong experience with cloud deployments (AWS / Docker / Kubernetes)
Knowledge of provisioning and IaC tools and languages such as Terraform, Chef, Ansible, shell, Groovy, Python, etc.
Experience with monitoring systems such as CloudWatch, New Relic, Datadog/Splunk and the ELK stack
Experience managing cloud network resources (AWS preferred) such as CloudWatch, VPC, URL proxies, private link, DNS, ACLs, firewalls, and C2S access points
Platform or application engineering and operational knowledge of CI/CD tooling such as GitHub Actions, Jenkins, etc.
Experience with other tooling such as JIRA, Bitbucket, Jenkins, Fortify, SonarQube, Nexus and Nexus IQ
Experience with configuration automation tools such as Puppet/Ansible/Chef/Salt
Scripting Skills: Strong scripting (e.g. Bash and Python) and automation skills
Operating Systems: Windows and Linux system administration
Problem Solving: Ability to analyze and resolve complex infrastructure resource and application deployment issues
Strong attention to detail. Excellent verbal and written communication skills. Strong documentation skills.

Good To Have:
Experience with Terraform/Ansible/Chef/Puppet
Experience with GitHub Actions
Experience with CloudFront, Fastly
Oversees team members performing these functions
Anticipates problems and future technical needs and takes the necessary steps to address issues
Works primarily in server-side technologies and is comfortable with client-side work whenever required
Enthusiastically follows technology trends and software engineering best practices

Perks:
Day off on the 3rd Friday of every month (one long weekend each month)
Monthly Wellness Reimbursement Program to promote health and well-being
Paid paternity and maternity leave

Qualifications

Must-have - Bachelor's degree, preferably in CS or a related field, or equivalent experience
Must-have - 12+ years of overall IT experience
Must-have - 5+ years of AWS Cloud experience, with an AWS certification such as AWS Certified DevOps Engineer, SysOps or Security
-
Cloud SRE
2 weeks ago
Chennai, Tamil Nadu, India Ford Motor Company Full time
Be at the Forefront of Mobility's Future: Join Ford as a Site Reliability Engineer! Enterprise Technology is the engine driving the future of transportation, and we're looking for a talented Site Reliability Engineer (SRE) to help us redefine mobility. In this role, you'll leverage cutting-edge technology to enhance customer experiences, improve lives, and...
-
SRE - Azure
6 days ago
Chennai, Tamil Nadu, India Tata Consultancy Services Full time
TCS has been a great pioneer in feeding the fire of young techies like you. We are a global leader in the technology arena and there's nothing that can stop us from growing together.
What we are looking for
Role: SRE (Azure)
Experience Range: 6-8 years
Location: Chennai/Hyderabad/Bangalore/Kolkata
Qualifications: Bachelor of Engineering
-
SRE Application Support Lead
3 days ago
Tamil Nadu, India TransUnion Full time
TransUnion's Job Applicant Privacy Notice
What We'll Bring:
We are seeking a highly skilled and motivated SRE Application Support Lead / Sr. Lead to join our 24x7 support team. This role is critical to ensuring the stability, performance, and reliability of mission-critical applications deployed across modern platforms including Docker, Kubernetes, and cloud...
-
Mainframe SRE
3 weeks ago
Chennai, Tamil Nadu, India Kyndryl Full time
Who We Are
At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward, always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities.
The Role
Join us as a...
-
Chennai, Tamil Nadu, India Citigroup Full time
Overview of the Role
Citi, the leading global bank, has approximately 200 million customer accounts and does business in more than 160 countries and jurisdictions. Citi provides consumers, corporations, governments and institutions with a broad range of financial products and services, including consumer banking and credit, corporate and investment banking...
-
Cloud SRE Architect
12 hours ago
Teynampet West, Chennai, Tamil Nadu, India Nurtem Full time ₹ 8,00,000 - ₹ 12,00,000 per year
We are seeking a talented and experienced Cloud SRE / Architect / DevOps Engineer to join our team and play a key role in architecting, deploying, and managing our cloud infrastructure across various platforms. You will be responsible for designing and implementing robust and scalable cloud environments, leveraging your expertise in public clouds (AWS, GCP,...
-
Staff Engineer
7 days ago
Tamil Nadu, India Codewalla Full time
About us
Codewalla is a New York–based product studio with engineering teams in India. Since 2005, we’ve built innovative products that scale. We work at the intersection of design, engineering, and AI, developing systems shaped by real business needs and tested in the real world. Our team moves fast, thinks deeply, and cares about pushing what software...
-
Mainframe - SRE
1 week ago
Chennai, Tamil Nadu, India FIS Global Full time
Position Type: Full time
Type Of Hire: Experienced (relevant combo of work and education)
Mainframe SRE
Are you curious, motivated, and forward-thinking? At FIS you’ll have the opportunity to work on some of the most challenging and relevant issues in financial services and technology. Our talented people empower us, and we believe in being...
-
Site Reliability Engineer
1 day ago
Tamil Nadu, India Datum Technologies Group Full time
Job Title: Site Reliability Engineer (SRE) – Azure & AI
Experience: 7+ years
Work Mode: Hybrid
Work Location: Chennai/Mumbai/Gurgaon
Job Summary:
We are looking for an experienced Site Reliability Engineer (SRE) with strong expertise in Microsoft Azure, AI infrastructure, and automation. The ideal candidate will have a solid background in managing cloud...
-
Staff Engineer
1 week ago
Chennai, Tamil Nadu, India HappyFox Full time ₹ 10,00,000 - ₹ 12,00,000 per year
We're looking for an experienced Staff Engineer to provide technical leadership to our growing engineering team at HappyFox. You should have prior experience of being responsible for building sufficiently complex products and services for enterprise SaaS and of mentoring software engineers.
What You'll Do
As a Staff Engineer at HappyFox, you will:
Lead...