Consultant - Site Reliability Engineer (SRE)
1 week ago
Observability – Dynatrace, Elastic Stack or any other logging tools
Mandatory skills
- Hands-on experience with APM (application performance monitoring) tools such as those mentioned above.
- At least 5 years of relevant experience in setting up monitors, integrating them with alerting tools, and integrating with third-party tools via API/REST services.
- Experience troubleshooting issues, performing RCA, and creating/maintaining dashboards of system/application issues in APM tools.
- Deep understanding of the set-up of monitoring tools (installation, tuning, vendor management, patching/upgrades).
Development
- Develop tools/solutions to automate and standardize deployments and operations with regard to system/application/infrastructure monitoring.
- Write scripts for setting up monitors in Application Performance Management tools such as Dynatrace, AppMon, etc.
- Develop dashboards within the APM tool for researching application issues.
- Integrate with systems over APIs/REST services (nice to have).
System administration
- Apply software development & engineering mindset to sys admin activities.
- Focus on improving monitoring/observability of systems in a measurable way.
Define Service Level Indicators and Objectives
- Provide monitoring services for systems so that teams can begin to track their SLOs and SLIs. Also assist in setting realistic objectives for the future and advise on proper SLAs for customers.
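
As an illustration of what tracking an SLI against an SLO can involve, here is a minimal sketch in Python. It assumes an availability SLI defined as good requests divided by total requests; the names `SloReport` and `availability_slo_report` are hypothetical and do not come from any particular APM tool.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # observed fraction of good requests
    slo_target: float        # target fraction, e.g. 0.999
    budget_remaining: float  # fraction of the error budget still unspent


def availability_slo_report(good: int, total: int, slo_target: float) -> SloReport:
    """Compare an availability SLI with its SLO (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good / total
    # The error budget is the number of bad requests the SLO tolerates.
    allowed_bad = (1 - slo_target) * total
    actual_bad = total - good
    # 1.0 means the budget is untouched; a negative value means the SLO is breached.
    budget_remaining = 1 - actual_bad / allowed_bad
    return SloReport(sli=sli, slo_target=slo_target, budget_remaining=budget_remaining)


if __name__ == "__main__":
    report = availability_slo_report(good=999_500, total=1_000_000, slo_target=0.999)
    print(f"SLI={report.sli:.4%} budget remaining={report.budget_remaining:.1%}")
```

A negative `budget_remaining` is one possible signal that an objective is unrealistic or that reliability work should take priority.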
Monitoring
- Set-up/tune monitors for application health/status/performance.
- Improve monitoring so that it is based on symptoms rather than outages.
- Recommend, support and train teams in the use of effective monitoring tools and processes that allow real-time system monitoring as well as analysis of long-term reliability trends.
- Create new alerts to find anomalies and understand the root cause of system failures.
- Integrate monitoring tools with third-party tools such as xMatters, alerting tools, and SMS tools.
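
One simple way to picture an alert that finds anomalies is a z-score check against a recent baseline. The sketch below is only an assumption about how such a rule might look; `is_anomalous` and its default threshold of 3 standard deviations are hypothetical, and real APM tools ship their own baselining.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(history: Sequence[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it deviates from the baseline by more than `threshold` sigmas."""
    if len(history) < 2:
        # Not enough data to estimate a spread, so do not alert.
        return False
    baseline = mean(history)
    spread = stdev(history)  # sample standard deviation
    if spread == 0:
        # A perfectly flat baseline: any change at all is unusual.
        return latest != baseline
    return abs(latest - baseline) / spread > threshold


if __name__ == "__main__":
    response_times_ms = [100, 102, 98, 101, 99]
    print(is_anomalous(response_times_ms, 130))  # far outside the baseline
    print(is_anomalous(response_times_ms, 101))  # within normal variation
```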
Automation
- Document every action to convert findings into repeatable actions and then into automation.
- Find efficiencies by automating away repetitive toil such as watching dashboards, executing scripts, and other manual tasks.
- Help optimize the on-call rotation by adding automation and context to alerts, leading to a better real-time collaborative response from on-call responders.
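
Adding context to alerts could, for example, mean attaching an owner and a runbook link before the alert reaches the responder. The following is a minimal sketch under that assumption; the field names (`service`, `owner`, `runbook`), the catalog structure, and the example URL are all hypothetical.

```python
from typing import Any, Dict


def enrich_alert(alert: Dict[str, Any], service_catalog: Dict[str, Dict[str, str]]) -> Dict[str, Any]:
    """Return a copy of `alert` with owner and runbook context attached."""
    context = service_catalog.get(alert.get("service", ""), {})
    enriched = dict(alert)  # leave the original alert untouched
    enriched["owner"] = context.get("owner", "unassigned")
    enriched["runbook"] = context.get("runbook")  # None when no runbook is known
    return enriched


if __name__ == "__main__":
    # Hypothetical catalog entry; the URL is a placeholder.
    catalog = {
        "checkout": {
            "owner": "payments-team",
            "runbook": "https://example.com/runbooks/checkout",
        }
    }
    alert = {"service": "checkout", "summary": "Error rate above threshold"}
    print(enrich_alert(alert, catalog))
```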
Manage Incidents
- Understand and apply incident management practices.
- Assist stakeholders in examining incidents and establishing processes to help prevent or minimize similar problems from arising.
- Develop procedures and policies by which technical support teams will operate.
- Apply these processes in areas such as service failures and security threats, and train IT support staff in them.
Facilitate Retrospectives
- Provide RCA & solutions.
- Derive trends from recurring system/application issues using dashboards.
- Good written and verbal communication skills.
- Highly motivated, with a positive and proactive attitude to work and a willingness to improve operational efficiency through innovation, better processes and procedures, and adopting and adapting ideas and practices from elsewhere.
- Ready to work in shifts, on weekends, and on a flexible schedule.
- Excellent team skills with ability to listen and contribute to discussions and meetings.
- Ability to motivate staff
- Graduate – Bachelor's in Engineering/Technology degree (preferred)
- Any SRE/Cloud Technology related certifications (Good to have)