
Highly Experienced Site Reliability Engineer Lead
2 weeks ago
As a key member of our technology team, you will play a pivotal role in driving the design and implementation of scalable and reliable systems. Your expertise in cloud platforms, Kubernetes, and infrastructure-as-code tools will be instrumental in ensuring the high availability and performance of our critical services.
Working closely with cross-functional teams, you will establish and improve incident management processes, develop automated solutions to reduce manual operational tasks, and enhance system observability through metrics, logging, and distributed tracing tools.
With your strong technical expertise and leadership skills, you will mentor and grow a team of SREs, promote SRE best practices, and foster a culture of reliability and innovation across the organization.
- Reliability & Performance:
  - Define and monitor SLIs, SLOs, and SLAs to ensure business requirements are met.
  - Proactively identify and resolve performance bottlenecks and system inefficiencies.
- Incident Management & Response:
  - Establish and improve incident management processes and on-call rotations.
  - Lead incident response and root cause analysis for high-priority outages.
  - Drive post-incident reviews and ensure actionable insights are implemented.
- Automation & Tooling:
  - Develop and implement automated solutions to reduce manual operational tasks.
  - Enhance system observability through metrics, logging, and distributed tracing tools (e.g., Prometheus, Grafana, Elastic APM).
  - Optimize CI/CD pipelines for seamless deployments.
- Collaboration:
  - Partner with software engineering teams to improve the reliability of applications and infrastructure.
  - Work closely with product/engineering teams to design scalable and robust systems.
  - Ensure seamless integration of monitoring and alerting systems across teams.
- Leadership & Team Building:
  - Manage, mentor, and grow a team of SREs.
  - Promote SRE best practices and foster a culture of reliability and performance across the organization.
  - Drive performance reviews, skills development, and career progression for team members.
- Capacity Planning & Cost Optimization:
  - Perform capacity planning and implement autoscaling solutions to handle traffic spikes.
  - Optimize infrastructure and cloud costs while maintaining reliability and performance.
To succeed in this role, you will need the following skills and qualifications:
- Technical Expertise:
  - Experience with cloud platforms (AWS/Azure/GCP) and Kubernetes.
  - Hands-on knowledge of infrastructure-as-code tools like Terraform/Helm/Ansible.
  - Proficiency in Java.
  - Expertise in distributed systems, databases, and load balancing.
- Monitoring & Observability:
  - Proficient with tools like Prometheus, Grafana, Elastic APM, or New Relic.
  - Understanding of metrics-driven approaches for system monitoring and alerting.
- Automation & CI/CD:
  - Hands-on experience with CI/CD pipelines (e.g., Jenkins, Azure Pipelines).
  - Skilled in automation frameworks and tools for infrastructure and application deployments.
- Incident Management:
  - Proven track record in handling incidents, post-mortems, and implementing solutions to prevent recurrence.
By joining our team, you will have the opportunity to work with cutting-edge technologies, influence the evolution of our infrastructure, and lead a high-impact team. You will also have the chance to collaborate with cross-functional teams, drive innovation, and contribute to the growth and success of our organization.
Others
Our ideal candidate is a motivated and experienced professional who is passionate about building and scaling reliable systems. If you are looking for a challenging and rewarding role that will allow you to grow and develop your skills, we encourage you to apply.
-
Site Reliability Engineering Team Lead
1 week ago
Ellore, Andhra Pradesh, India beBeeSystemReliability Full time ₹ 35,00,000 - ₹ 45,00,000
About the Role
We are seeking a seasoned System Reliability Engineer Lead to join our organization. As a key member of our team, you will be responsible for establishing and implementing Site Reliability Engineering practices that ensure high availability, scalability, and performance of critical systems.
-
Site Reliability Engineer
4 days ago
Ellore, Andhra Pradesh, India ViewSonic Full time
Job Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 3+ years of experience in a relevant role, such as Site Reliability Engineer, DevOps Engineer, or similar, is preferred but not mandatory.
- Basic understanding of AWS solutions including EC2, S3, CloudWatch, Lambda, and RDS.
- Interest in and understanding of Platform Engineering...
-
Ellore, Andhra Pradesh, India beBeeTechnical Full time ₹ 20,00,000 - ₹ 25,00,000
Senior Technical Manager Job Description
We are seeking a Senior Technical Manager to lead our Site Reliability Engineering team, ensuring operational excellence and fostering a high-performing team culture.
This role is responsible for overseeing day-to-day operations, providing technical mentorship, and aligning team efforts with our company goals.
Main...
-
Ellore, Andhra Pradesh, India beBeeReliability Full time US$ 80,000 - US$ 1,25,000
About Our Team:
We are a leading organization in the airline industry, committed to delivering exceptional customer experiences and operational excellence.
Our team is responsible for ensuring the reliability and performance of our systems, and we are looking for a skilled professional to join us in this mission.
Job Description:
We are seeking a highly skilled...
-
Cloud Reliability Specialist
4 days ago
Ellore, Andhra Pradesh, India beBeeEngineer Full time ₹ 2,00,00,000 - ₹ 2,50,00,000
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our team.
As a Site Reliability Engineer, you will be responsible for ensuring the availability, latency, performance, and efficiency of our cloud-based platform. You will work closely with cross-functional teams to define and enforce reliability standards, lead high-impact...
-
Reliability Engineer Leader
4 days ago
Ellore, Andhra Pradesh, India beBeeSre Full time ₹ 2,50,00,000 - ₹ 3,00,00,000
About the Role
We are looking for a seasoned Vice President of Site Reliability Engineering to lead our team's SRE strategy and culture.
-
Site Reliability Expert
4 days ago
Ellore, Andhra Pradesh, India beBeeInfrastructure Full time ₹ 20,00,000 - ₹ 25,00,000
Site Reliability Expert
We are seeking a skilled Site Reliability Engineer to fill this key role.
Main Responsibilities:
- To design and implement scalable infrastructure solutions using DevOps practices and CI/CD pipelines.
- To develop and maintain monitoring tools ensuring the reliability of our systems.
- To collaborate with cross-functional teams identifying and...
-
Road Works Site Engineer
2 weeks ago
Ellore, Andhra Pradesh, India beBeeSite Full time ₹ 9,00,000 - ₹ 15,00,000
Job Title: Road Works Site Engineer
About the Role
We are seeking a skilled and experienced Site Engineer to oversee road works projects in various locations. The successful candidate will be responsible for ensuring quality control, coordinating with civil engineers, and performing structural engineering tasks.
Key Responsibilities
Manage on-site activities and...
-
Highly Available System Reliability Expert
4 days ago
Ellore, Andhra Pradesh, India beBeeSite Full time ₹ 18,00,000 - ₹ 20,00,000
Job Description
We are seeking a Senior Site Reliability Engineer to enhance system performance and reliability, automate manual processes, and collaborate with globally dispersed teams.
The ideal candidate will provide technical leadership, design and implement solutions to improve platform reliability, build and maintain monitoring systems, and conduct Root...
-
Site Reliability Specialist
2 days ago
Ellore, Andhra Pradesh, India beBeeReliability Full time ₹ 1,20,00,000 - ₹ 1,50,00,000
Job Title
We are seeking a highly skilled Reliability Expert to join our IT team.
As a Reliability Expert, you will be responsible for ensuring the reliability and efficiency of our systems. Your primary focus will be on identifying potential system issues early, implementing preventive measures, and boosting system resilience.
Some key responsibilities...