
Site Reliability Engineer
6 days ago
Job Description: SRE, Ansible, and Linux Administrator
Position Title:
Site Reliability Engineer (SRE), Ansible, and Linux Administrator
Role Overview:
We are seeking a skilled Site Reliability Engineer (SRE) with expertise in Ansible and Linux Administration to join our team. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our infrastructure while automating operational tasks and managing Linux-based systems. This role requires a proactive individual who can collaborate across teams to enhance system reliability and operational efficiency.
Key Responsibilities:
Site Reliability Engineering (SRE):
- Design, implement, and maintain highly available and scalable systems to ensure 99.9% uptime.
- Monitor and improve system reliability, performance, and capacity planning.
- Develop and maintain observability tools (e.g., Prometheus, Grafana, ELK Stack) to monitor system health and performance.
- Respond to incidents, troubleshoot issues, and perform root cause analysis to prevent recurrence.
- Automate repetitive operational tasks to reduce manual intervention and improve efficiency.
- Collaborate with development teams to implement DevOps best practices and CI/CD pipelines.
Ansible Automation:
- Develop and maintain Ansible playbooks for configuration management, application deployment, and infrastructure provisioning.
- Automate repetitive tasks such as patch management, system updates, and application deployments.
- Ensure Ansible configurations are version-controlled and follow best practices.
- Troubleshoot and resolve issues with Ansible scripts and deployments.
- Optimize existing Ansible workflows to improve efficiency and reduce execution time.
Linux Administration:
- Manage and maintain Linux-based systems (e.g., RHEL, Ubuntu, CentOS) in production, staging, and development environments.
- Perform system updates, patching, and security hardening to ensure compliance with organizational policies.
- Configure and manage services such as Apache, Nginx, MySQL, PostgreSQL, and Docker.
- Manage user accounts, permissions, and access control on Linux systems.
- Troubleshoot and resolve system-level issues, including performance bottlenecks and hardware failures.
- Implement and maintain backup and disaster recovery solutions for Linux systems.
Required Skills and Qualifications:
Technical Skills:
SRE Expertise:
- Strong understanding of SRE principles, including SLAs, SLOs, and error budgets.
- Experience with monitoring tools like Prometheus, Grafana, ELK Stack, or Datadog.
- Proficiency in incident management and root cause analysis.
Ansible:
- Hands-on experience with Ansible for configuration management and automation.
- Ability to write and debug complex Ansible playbooks and roles.
- Knowledge of Ansible Tower/AWX is a plus.
Linux Administration:
- Strong experience with Linux systems (RHEL, Ubuntu, CentOS).
- Proficiency in shell scripting (Bash) and familiarity with Python for automation.
- Experience with system performance tuning, security hardening, and troubleshooting.
DevOps Tools:
- Familiarity with CI/CD tools like Jenkins, GitLab CI, or GitHub Actions.
- Experience with containerization tools like Docker and orchestration platforms like Kubernetes.
Networking:
- Basic understanding of networking concepts (DNS, load balancing, firewalls, etc.).
Soft Skills:
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration abilities.
- Ability to work in a fast-paced, dynamic environment.
- Proactive and self-motivated with a focus on continuous improvement.
Preferred Qualifications:
- Experience with cloud platforms (AWS, Azure, GCP) and infrastructure-as-code tools (Terraform, CloudFormation).
- Knowledge of security best practices for Linux systems and automation tools.
- Certification in Linux (e.g., RHCSA, RHCE) or Ansible (e.g., Red Hat Certified Specialist in Ansible Automation).
Education and Experience:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 6+ years of experience in SRE, Linux Administration, and Ansible automation.
-
Site Reliability Engineer
3 weeks ago
Pune, India Talent Worx Full time
Site Reliability Engineer (SRE)
At Talent Worx, we are looking for a dedicated Site Reliability Engineer (SRE) to join our team. This role involves maintaining high availability and reliability of our services through the application of software engineering practices and systems administration skills. The ideal candidate will bridge the gap between...
-
Site Reliability Engineer
5 days ago
Hyderabad, Pune, India Akshaya Business It Solutions Full time ₹ 8,00,000 - ₹ 25,00,000 per year
- 3+ years of experience in a Site Reliability Engineering, DevOps, or related role
- Experience in Python, Bash
- Understanding of REST APIs, data serialization (JSON, YAML) and HTTP protocols
- Proficiency with monitoring & observability tools, Splunk & Dynatrace
Required Candidate profile
Technical Skill Requirements: SRE, Python, Bash/Linux, JSON/YAML, Splunk, Dynatrace,...
-
Site Reliability Engineer
5 days ago
Pune, Maharashtra, India Talent Worx Full time ₹ 15,00,000 - ₹ 25,00,000 per year
Site Reliability Engineer (SRE)
At Talent Worx, we are looking for a dedicated Site Reliability Engineer (SRE) to join our team. This role involves maintaining high availability and reliability of our services through the application of software engineering practices and systems administration skills. The ideal candidate will bridge the gap between...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, India Talent Worx Full time
Site Reliability Engineer (SRE)
At Talent Worx, we are looking for a dedicated Site Reliability Engineer (SRE) to join our team. This role involves maintaining high availability and reliability of our services through the application of software engineering practices and systems administration skills. The ideal candidate will bridge the gap between...
-
Specialist - Site Reliability Engineer
24 hours ago
Pune, Maharashtra, India Accelya Group Full time ₹ 20,00,000 - ₹ 25,00,000 per year
For more than 40 years, Accelya has been the industry's partner for change, simplifying airline financial and commercial processes and empowering the air transport community to take better control of the future. Whether partnering with IATA on industry-wide initiatives or enabling digital transformation to simplify airline processes, Accelya drives the...
-
Specialist - Site Reliability Engineer
1 day ago
Pune, Maharashtra, India Accelya Group Full time ₹ 15,00,000 - ₹ 25,00,000 per year
For more than 40 years, Accelya has been the industry's partner for change, simplifying airline financial and commercial processes and empowering the air transport community to take better control of the future. Whether partnering with IATA on industry-wide initiatives or enabling digital transformation to simplify airline processes, Accelya drives the...
-
Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India ENGEL Full time ₹ 6,00,000 - ₹ 18,00,000 per year
Company Description
ENGEL is a global leader in the production of injection moulding machines and their automation. The company produces systems that manufacture plastic parts used in various industries such as automotive, packaging, and consumer goods. With nine production plants worldwide and subsidiaries and representatives in over 85 countries, ENGEL...
-
Site Reliability Engineer
6 days ago
Pune, Maharashtra, India Growel Softech Pvt. Ltd. Full time ₹ 12,96,000 - ₹ 1,51,20,000 per year
Job Title: Site Reliability Engineer
Location: Pune (Hybrid - 3 days a week at office, 2 days WFH; candidate needs to report to the Pune office only) (Relocation can be considered)
Shift Timings: 12:30 PM - 9:30 PM IST
Budget: 10+ to 12+ yrs 31 LPA; 13 to 15+ yrs 36 LPA
Interview: 2 rounds (HM's availability is between 3 PM and 5 PM IST)
Positions: 4
Considerable Notice Period - 30...
-
Site reliability engineer
2 weeks ago
Hyderabad, India Talentiser Full time
Hiring hybrid Site Reliability Engineers for a fast-growing product company building scalable tech solutions and transforming how businesses run mission-critical operations. Our SaaS platform is designed for high performance, reliability, and automation at scale.
Your Impact
As a Site Reliability Engineer, you’ll play a key role in ensuring ...