Sr Site Reliability Engineer

4 days ago

Hyderabad, India F5 Full time

At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation.

Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.

F5 leads the market in building products to make every app run faster, smarter, and safer anywhere. To support our growing business, we are expanding our organization by creating a new team in India specializing in site reliability for our NGINX services. In this role, the successful candidate will work closely with a global team that brings a software engineering and automation mindset to its work.

The Senior Site Reliability Engineer will be responsible for ensuring the reliability, availability, and scalability of critical NGINX systems and SaaS platforms. Systems under the care of a Senior Site Reliability Engineer must operate effectively and reliably through scalable builds and deployments, frequent releases, and complex architectures that encompass modern technologies. You will work closely with technical and non-technical teams throughout the organization to facilitate the design and implementation of scalable solutions, drive automation initiatives, and monitor and maintain the performance of critical NGINX systems.

We are looking for someone who has:
- Experience solving problems related to large-scale distributed systems, and the ability to take complex problems and identify potential solutions, knowns, and unknowns.
- A drive for continuous improvement and efficiency.
- The ability to write code in multiple languages, choosing the right strongly or dynamically typed language for the job.
Responsibilities:
- Lead a project team, providing direction, issue resolution, and mentorship, as well as regular progress updates and reporting.
- Solve problems in mission-critical services and implement solutions to prevent recurrence; lead retrospectives to explore and understand root causes, define next steps to avoid future incidents, and document and report findings.
- Help shape SRE strategies by evaluating and contributing to product/service design.
- Participate in system design meetings, capacity planning, launch reviews, etc. to ensure that supported services and platforms are as efficient as possible before going live.
- Scale systems sustainably through mechanisms such as automation, and evolve systems by fostering changes that improve reliability and velocity.
- Enhance a data-driven engineering culture by providing statistical trends and analysis based on real service data to improve service health and quality.

Knowledge, Skills, and Experience:
- Bachelor's (or higher) degree in Computer Science, Computer Engineering, or a related field.
- 7+ years of professional experience in software engineering.
- Experience setting up and using incident and on-call management systems.
- Experience setting up and building tools to collect and visualize data (logs, metrics, alerts), and building dashboards, alerting, and monitoring systems.
- Experience deploying secure infrastructure and services in one or more cloud environments such as AWS or Azure.
- Experience with configuration management and deployment automation tools such as Terraform, Ansible, and Packer.
- Proficiency in scripting languages such as Python and Bash.
- Experience with containers (Docker) and orchestration systems (Kubernetes).
- Solid understanding of the Linux OS, plus systems administration skills.
- Excellent analytical and troubleshooting skills.
- A dynamic collaborator who thrives in diverse, geographically distributed teams.
- A team player who demonstrates diplomacy and promotes sound ideas and concepts, paired with the desire to help others grow their skills.
- Strong verbal and written communication skills.
- Experience with NGINX technologies is a strong plus.

Fundamental competencies:

SYSTEM EXPERIENCE
- Application build and deployment processes (Git and related tooling, automation pipelines, infrastructure as code, etc.)
- Automated application delivery (load balancers, container orchestration, service mesh, high-availability architectures, frontend and backend technologies including databases, etc.)
- Service operation (define, instrument, measure, and manage service level objectives; experience with observability tooling including logging infrastructure, time-series metrics databases, tracing systems, alert definitions, etc.)
- Incident management (service restoration, root cause analysis, postmortem authorship, defining roles and responsibilities, etc.)
- Security awareness and competencies, including security as code.
- Configuration management.

OBSERVABILITY
- Explores beyond the obvious to ensure Service Level Objectives (SLOs) are met.
- Understands and measures system behaviors to quickly and efficiently diagnose, identify, and address needs.
- Proactively tests, automates, monitors outputs, and leverages signals to infer service needs.
- Data management to explore properties, patterns, and distributed traces.

SOLUTIONIST
- Constantly seeks ways to improve systems, making them more efficient and reducing toil.
- Understands the difference between short-term tactical fixes and long-term strategic fixes.
- Simplifies decisions and judgments by recognizing what to pay attention to and what to ignore; a proficient problem solver.
- Tenacious and resourceful, with an inherent predisposition toward action; unafraid to try something new in the name of innovation.

FORWARD THINKING
- Possesses an inherent bias toward innovation and stays abreast of developing ideas and technologies.
- Thoughtfully and strategically considers future needs and opportunities, and advocates positive change.
- Technological creativity and capacity.

COMMUNICATION AND COLLABORATION
- Conveys information, vision, and strategy accurately and in a timely manner, adjusting to ensure understanding based on the audience.
- Actively listens; seeks to understand rather than respond.
- Proactively solicits and values diverse perspectives, ideas, and opinions.

The job description is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.

Site Reliability Engineer

2 days ago

Hyderabad, India inTune Systems Inc Full time

Title: Sr. SRE/App Support Engineer
Location: Hyderabad
Job Summary: We are looking for a Senior Site Reliability Engineer (SRE) to join our growing Engineering team. As an SRE, you will play a key role in ensuring the reliability, scalability, and performance of our production systems across a multi-cloud environment (GCP & AWS). You’ll be responsible...
DevOps Engineer

3 days ago

Hyderabad, India Axceltran digital private limited Full time

Description:
Qualifications:
- Proven experience as a Site Reliability Engineer, Sr DevOps Engineer, or similar role.
- 5 to 7 years of relevant experience, including at least 2 years of experience with Microsoft Azure. AWS and GCP experience is good to have.
- Experience setting up and managing OpenTelemetry (OTEL), and using Loki, Tempo, Prometheus, Grafana, Alloy, etc.
- Experience in creating CI/CD...
Sr Engineer, Site Reliability

2 days ago

Hyderabad, India TMUS Global Solutions Full time

About the Role The Senior Engineer, Site Reliability (SRE) will play a critical role in ensuring the stability, scalability, and operational excellence of Accounting and Finance platforms. This role is focused on delivering highly reliable financial applications and data services, ensuring they meet the demanding requirements of accuracy, compliance, and...
Sr Engineer, Site Reliability

2 days ago

Hyderabad, India TMUS Global Solutions Full time

About the Role The Senior Systems Reliability Engineer (SRE) ensures the stability, performance, and reliability of IT services and infrastructure. This role combines software engineering and operations expertise to build and maintain highly available, scalable systems. As a leader in DevOps and cloud reliability practices, the engineer supports continuous...
Senior Site Reliability Engineer

3 weeks ago

Hyderabad, India Insight Global, LLC Full time

Job Title: Sr. SRE
About the Company: Insight Global's Client
Type: Ongoing EOR, depending on experience level
Location: ONSITE 4X/WEEK in HITEC City, Hyderabad, IN
Priority scheduling for candidates who:
- Submit their resume promptly
- Are available for immediate interviews
- Connect via LinkedIn with their resume and CTC rate
Requirements:
- Ability to be onsite...
Site Reliability Engineer

2 weeks ago

Hyderabad, India Whatjobs IN C2 Full time

Job Title: Site Reliability Engineer (SRE) | Fintech | Kubernetes | Datadog | 24/7 Support
Department: Site Reliability Engineering
Location: Hyderabad, India
Employment Type: Full-Time
Notice period: 0-15 Days
We’re hiring a Site Reliability Engineer to join our SRE team focused on maintaining the performance, reliability, and availability of our fintech...
