SRE (Site Reliability Engineer)

4 weeks ago


Pune, India Apex One Full time

Job Overview

We are looking for a detail-oriented and experienced Site Reliability Engineer to join our team. The Site Reliability Engineer will be responsible for creating and implementing scalable software solutions in order to meet system and application performance goals. You will also be responsible for troubleshooting system errors and resolving any relevant issues.

Roles And Responsibilities

System Monitoring and Incident Response: SREs are responsible for implementing monitoring solutions to track system health, performance, and availability. They proactively monitor systems, identify issues, and respond to incidents promptly, working to minimize downtime and mitigate impacts.

Post-Incident Analysis: SREs lead incident response efforts, coordinate with cross-functional teams, and conduct post-incident analysis to identify root causes and implement preventive measures.

Continuous Improvement and Reliability Engineering: SREs drive continuous improvement efforts by identifying areas for enhancement, implementing best practices, and fostering a culture of reliability engineering. They participate in post-mortems, conduct blameless retrospectives, and drive initiatives to improve system reliability, stability, and maintainability.

Collaboration and Knowledge Sharing: SREs collaborate closely with software engineers, operations teams, and other stakeholders to ensure smooth coordination and effective communication. They share knowledge, provide technical guidance, and contribute to the development of a strong engineering culture.

Support and maintain configuration management for various applications and systems

Implement comprehensive service monitoring, including dashboards, metrics, and alerts

Define, measure, and meet key service level objectives, such as uptime, performance, incidents, and chronic problems

Partner with application and business stakeholders to ensure high-quality product development and release

Collaborate with the development team to enhance system reliability and performance.

Qualifications

Bachelor's degree in Information Technology, Computer Science, or a related field.

Strong knowledge of software development processes and procedures.

Strong problem-solving abilities.

Excellent understanding of computer systems, servers, and network systems.

Ability to work under pressure and manage multiple tasks simultaneously.

Strong communication and interpersonal skills.

Strong knowledge of programming languages such as Python, Java, and Go.

Ability to program (structured and OOP) using one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript

Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN)

Job Description

Experience with cloud computing platforms such as AWS, Azure, or Google Cloud

Experience with DevOps tools such as Git, Jenkins, Ansible, Terraform, Docker, etc.

Experience with monitoring tools such as Splunk and Prometheus

Skills: problem solving, post-incident analysis, aws, monitoring tools, cloud computing, key service level objectives, reliability engineering, configuration management, devops practices, coding languages, monitoring tools (splunk, prometheus), continuous improvement, site reliability engineering, service monitoring, incident response, reliability, software development processes, system monitoring, splunk, devops tools (git, jenkins, ansible, terraform, docker), kubernetes, cloud computing (aws, azure, google cloud), devops, ansible, programming (python, java, go, c/c++, ruby, javascript), prometheus, cloud infrastructure, monitoring services

Keywords: cloud computing, splunk, prometheus, software development processes, system monitoring, devops tools, git, jenkins, ansible, terraform, docker, python, java, go, c/c++, ruby, javascript, Site Reliability Engineering*

Mandatory Key Skills: cloud computing, splunk, prometheus, software development processes, system monitoring, devops tools, git, jenkins, ansible, terraform, docker, python, java, go, c/c++, ruby, javascript, Site Reliability Engineering*



  • Pune, Maharashtra, India Apex One Full time ₹ 6,00,000 - ₹ 18,00,000 per year

    Job Overview We are looking for a detail-oriented and experienced Site Reliability Engineer to join our team. The Site Reliability Engineer will be responsible for creating and implementing scalable software solutions in order to meet system and application performance goals. You will also be responsible for troubleshooting system errors and resolving any...


  • Pune, India GfK Full time

    Description About You You are a DevOps or Site Reliability Engineer with a passion for cloud infrastructure and automation. You’re a self-starter and you love keeping up to date with the latest developments in cloud, configuration management and container technologies. You understand the benefits of an immutable infrastructure and you enjoy enabling...


  • Pune, India ENGEL Full time

    Company Description ENGEL is a global leader in the production of injection moulding machines and their automation. The company produces systems that manufacture plastic parts used in various industries such as automotive, packaging, and consumer goods. With nine production plants worldwide and subsidiaries and representatives in over 85 countries, ENGEL...


  • Pune, Maharashtra, India ENGEL Full time ₹ 6,00,000 - ₹ 18,00,000 per year

    Company Description ENGEL is a global leader in the production of injection moulding machines and their automation. The company produces systems that manufacture plastic parts used in various industries such as automotive, packaging, and consumer goods. With nine production plants worldwide and subsidiaries and representatives in over 85 countries, ENGEL...


  • Pune, India Talent Worx Full time

    Site Reliability Engineer (SRE) At Talent Worx, we are looking for a dedicated Site Reliability Engineer (SRE) to join our team. This role involves maintaining high availability and reliability of our services through the application of software engineering practices and systems administration skills. The ideal candidate will bridge the gap between...


  • Pune, India GSPANN Full time

    Description GSPANN is hiring a Site Reliability Engineer (SRE) for its Pune or Hyderabad location. This full-time role focuses on enhancing the reliability of global eCommerce platforms through automation, observability, and cloud-native tools like Azure, Kubernetes, and Terraform. Role and Responsibilities Use monitoring tools such as Dynatrace, Splunk,...


  • Pune, India Deutsche Bank Full time

    Job Description Site Reliability Engineering (SRE) Lead, VP Position Overview Job Title: Site Reliability Engineering (SRE) Lead Corporate Title: Vice President Location: Pune, India Role Description We are seeking an experienced and highly capable Site Reliability Engineering (SRE) Lead to support the Rates & Credit and broader Fixed Income...


  • Gurugram, Pune, India Prerna Malhotra (Proprietor Of Praxis Hr Solutions) Full time

    Job Description Description We are seeking a skilled Site Reliability Engineer (SRE) to join our dynamic team in India. The SRE will be responsible for ensuring the reliability, availability, and performance of our applications and services. This role requires a combination of software engineering and systems engineering to build and maintain scalable and...


  • Pune, Maharashtra, India Spark Tech Wave Innovation Full time ₹ 15,00,000 - ₹ 25,00,000 per year

    An experienced Site Reliability Engineer (SRE) / DevOps Engineer with strong experience in cloud infrastructure, automation, and CI/CD. The candidate will be responsible for improving the reliability, scalability, and performance of production systems while driving automation and monitoring initiatives across the environment. Key Responsibilities Design,...