Site reliability engineer
1 month ago
As an SRE, you will be responsible for the Activate and Production Infrastructure.
Your essential duties encompass ensuring the seamless operation and optimal performance of large-scale distributed software applications.
Your role revolves around maintaining a robust and high-performing environment, contributing to the reliability of our services, and innovating solutions to guarantee 24/7 availability.
By leveraging your technical expertise and dedication, you contribute to maintaining a seamless experience for our users while upholding the highest standards of operational excellence.
Your specific responsibilities include:
Role and Responsibilities:
1. Monitoring and Alerting
a. Review existing monitoring tools and systems, and set up new ones as needed, to track system performance and key metrics.
2. Incident Management
a. Monitor alerts and logs to promptly identify incidents or anomalies.
b. Prioritize incidents based on severity and potential impact on stability and reliability.
c. Engage in effective incident resolution, applying necessary fixes and mitigations to restore normal operations.
3. On-Call Responsibilities
a. Organize on-call schedules to ensure 24/7 coverage for incident response.
b. Respond to alerts, troubleshoot issues, and coordinate with NOC and Engineering teams for incident resolution.
c. Conduct post-incident reviews to identify root causes, learn from incidents, and implement preventive measures.
4. Automation and Tooling
a. Review existing automation scripts and tools, and build new ones as needed, to streamline repetitive tasks, enhance efficiency, and reduce manual errors.
b. Regularly update and maintain tools used for monitoring, deployment, and incident management to align with evolving needs.
5. Performance Optimization
a. Analyze application performance using profiling and monitoring tools to identify bottlenecks and areas for improvement.
b. Work on optimizations, infrastructure upgrades, and architectural improvements to enhance system performance and efficiency.
6. Capacity Planning and Scaling
a. Monitor resource utilization and trends to predict capacity needs and plan for scaling.
b. Scale resources, such as servers and databases, based on usage patterns and anticipated growth to maintain performance and reliability. Also, automate the entire sizing process.
7. Disaster Recovery and Redundancy
a. Develop and maintain disaster recovery plans and procedures to ensure business continuity in case of failures or disasters.
b. Implement redundancy and failover strategies to minimize downtime and maintain service availability during failures.
8. Knowledge Sharing and Documentation
a. Create and maintain comprehensive documentation for configurations, procedures, incidents, and best practices.
b. Foster a culture of knowledge sharing within the team, conducting regular knowledge-sharing sessions and training programs.
9. Feedback Loop and Continuous Improvement
a. Collect feedback from incidents, post-mortems, and NOC/Dev team interactions to identify areas for improvement.
b. Iterate on processes, tools, and systems based on feedback and lessons learned to drive continuous improvement.
10. Collaboration and Communication
a. Collaborate closely with Engineering and DC/NOC teams to align goals and priorities.
b. Ensure open and transparent communication within the team and with stakeholders, providing regular updates on incidents, progress, and initiatives.
Required Skills and Qualifications
Bachelor's degree in computer science or a related discipline.
3+ years of total experience in software application/product support.
Ability to program in languages like Go and scripting languages like Shell or Python.
Prior experience in technical engineering is good to have.
A proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
Must know networking, database (MySQL), and Linux system concepts, as well as debugging and analyzing core dumps.
Hands-on experience with monitoring and observability tools like Grafana, Nagios, Influx, ELK, etc.
Familiarity with orchestration tools like Docker and Grafana and incident management systems like Zenduty.
Excellent communication and collaboration skills, with the ability to work effectively across teams.
Self-motivated, with a positive mindset for examining any incident.
-
Site reliability engineer
2 days ago
Pune, India Gateway Search Full time
Hiring for an MNC client that provides software-as-a-service products related to customer support, sales, and other customer communications. The company was founded in Denmark in 2007. It has over 100,000 customers and 5000+ global employees. Currently hiring for a new Product Development Center of Excellence in Pune. As an early hire, you will have a unique...
-
Site Reliability Engineer
3 weeks ago
Pune, India Gateway Search Full time
Hiring for an MNC client that provides software-as-a-service products related to customer support, sales, and other customer communications. The company was founded in Denmark in 2007. It has over 100,000 customers and 5000+ global employees. Currently hiring for a new Product Development Center of Excellence in Pune. As an early hire, you will have a unique...
-
Site Reliability Engineer
1 month ago
Pune, Maharashtra, India PubMatic Full time
Job Title: Site Reliability Engineer
PubMatic, a leading technology company, is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the seamless operation and optimal performance of our large-scale distributed software applications.
Key Responsibilities:
Monitor and analyze...
-
Site reliability engineer
20 hours ago
Pune, India Collabera Digital Full time
Job Title: Site Reliability Engineer
Location: Pune, India (Hybrid, 2-3 days a week onsite)
Job Description: We are looking for a Senior Site Reliability Engineer (SRE) to join our growing team. In this role, you will focus on building and maintaining highly reliable, scalable, and efficient systems. You will implement best practices around monitoring,...
-
Site Reliability Engineer
3 days ago
Pune, India Collabera Digital Full time
Job Title: Site Reliability Engineer
Location: Pune, India (Hybrid, 2-3 days a week onsite)
Job Description: We are looking for a Senior Site Reliability Engineer (SRE) to join our growing team. In this role, you will focus on building and maintaining highly reliable, scalable, and efficient systems. You will implement best practices around monitoring,...
-
Site Reliability Engineer
3 weeks ago
Pune, India Tata Consultancy Services Full time
Dear Candidate, Greetings from TCS! TCS is hiring for SRE; please find the JD below.
Experience range: 5+ years
Location: Bangalore, Pune, Hyderabad, Chennai
Skills Required: Site Reliability Engineer
Role & Responsibilities: Collaborates with cloud platform engineers and teams to design, develop, test, and implement availability, reliability,...
-
Site Reliability Engineer
2 days ago
Pune, India Collabera Digital Full time
Job Title: Site Reliability Engineer
Location: Pune, India (Hybrid, 2-3 days a week onsite)
Job Description: We are looking for a Senior Site Reliability Engineer (SRE) to join our growing team. In this role, you will focus on building and maintaining highly reliable, scalable, and efficient systems. You will implement best practices around monitoring,...
-
Site Reliability Engineer
4 weeks ago
Pune, Maharashtra, India Coupa Software Full time
About the Role
Coupa Software is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in building and maintaining the technologies on our Coupa Cloud platform.
Responsibilities
Design and develop scalable, reliable, and secure cloud-based systems
Work closely with cross-functional teams to...
-
Site Reliability Engineer
4 weeks ago
Pune, Maharashtra, India Roche Full time
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our team at Roche. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining site reliability engineering practices that ensure the reliability and performance of our production systems.
Key Responsibilities
Design and implement SRE...
-
Site Reliability Engineer
1 month ago
Pune, Maharashtra, India F337 Deutsche India Private Limited, Pune Branch Full time
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our team at Deutsche Bank's Corporate Bank division. As a key member of our agile delivery team, you will play a pivotal role in ensuring the reliability, scalability, and performance of our systems.
Your Key Responsibilities
Design, build, and maintain robust and efficient...
-
Site Reliability Engineer
2 weeks ago
Pune, India Tata Consultancy Services Full time
TCS has been a great pioneer in feeding the fire of young techies like you. We are a global leader in the technology arena and there’s nothing that can stop us from growing together.
What we are looking for
Role: Site Reliability Engineer
Experience Range: 8 – 12 Years
Location: Bangalore/Chennai/Pune/Delhi
Must Have:
Essential:
- Exceptional...