Site Reliability Engineer
15 hours ago
Job Description

Siemens Digital Industries Software is a leading provider of solutions for the design, simulation, and manufacture of products across many different industries. Formula 1 cars, skyscrapers, ships, space exploration vehicles, and many of the objects we see in our daily lives are being conceived and manufactured using our Product Lifecycle Management (PLM) software.

The DISW SRE organization is dedicated to enhancing service and application availability, optimizing processes by automating manual and repetitive tasks, and addressing complex technical challenges in a dynamic, collaborative, inclusive, and iterative environment. This position plays a crucial role in developing automated solutions and processes that support and sustain best-in-class cloud-based applications. The candidate will support the Siemens Xcelerator platform and will be responsible for coordinating major incident response, maintaining partner communication during service-impacting events, and facilitating resolution in compliance with service level agreements (SLAs). Strong communication and coordination skills are necessary to support core objectives. This role's success will be defined by product teams within DISW business units meeting their SLAs.

Key Responsibilities
- Incident Management: Act as the primary point of contact and leader during major incidents, coordinating the response, communication, and resolution efforts across all involved teams.
- Incident Response: Quickly assess the severity of incidents, determine the impact, and drive the appropriate response to restore services as quickly as possible.
- Communication: Ensure clear, concise, and timely communication with stakeholders, including technical teams, management, and customers, throughout the incident lifecycle.
- Post-Incident Analysis: Lead post-incident reviews to identify root causes, drive improvements, and implement preventive measures to reduce the likelihood of recurrence.
- Collaboration: Work closely with SRE, DevOps, Development, and other relevant teams to ensure that incident management processes are well-defined and continuously improved.
- Training & Preparedness: Conduct regular incident response drills, train teams on incident management processes, and ensure readiness for handling high-severity incidents.
- Documentation: Maintain and update incident management documentation, ensuring that all procedures are up-to-date and accessible to all relevant teams.
- Monitoring & Alerts: Collaborate with SRE and monitoring teams to define and refine alerting criteria, ensuring that incidents are detected and escalated promptly.
- Continuous Improvement: Find opportunities to improve system reliability, scalability, and performance based on lessons learned from incidents.
- 24x7 On-call rotation: Participate in 24x7 on-call rotation.

Qualifications:
- Technical Skills: Familiarity with cloud infrastructure (AWS, GCP, Azure) and containerization (Docker, Kubernetes).
- Certifications: Relevant certifications (e.g., AWS Certified Solutions Architect, Certified Kubernetes Administrator) are a plus.
- Automation: Experience with automation tools and scripting languages (e.g., Python, Bash) to streamline incident response and remediation.
- Stakeholder Management: Experience aligning with cross-functional teams, including business and product stakeholders, during and after incidents.
- Metrics Ownership: Ability to define and track incident-related critical metrics (e.g., MTTR, MTTD) to drive accountability and improvement.
- Experience: Experience in an enterprise IT setting with distributed environments.
- Communication: Outstanding English communication skills, both verbal and written, as well as listening and synthesis skills.
- Incident Response: Ability to quickly assess the severity of incidents, determine the impact, and drive the appropriate response to restore services as quickly as possible.
- Problem-Solving: Excellent troubleshooting and problem-solving skills, with the ability to quickly analyze complex systems.
- Calm Under Pressure: Ability to remain calm, focused, and effective in high-pressure situations, and to make quick, confident decisions.
- Leadership: Demonstrated experience in leading incident response efforts and managing cross-functional teams during critical situations.
- Technical Skills: Familiarity with Jira Service Management (or equivalent, e.g., ServiceNow), Datadog (or equivalent, e.g., Grafana), PagerDuty (or equivalent), and Atlassian Statuspage (or equivalent).
- Driven Learner: Highly motivated and driven to learn new technologies, skills, and methodologies, continuously seeking to expand your knowledge and adapt to evolving industry trends.
- Must be willing and available to work the core hours required.

A collection of over 377,000 minds building the future, one day at a time in over 200 countries. We're dedicated to equality, and we welcome applications that reflect the diversity of the communities we work in. All employment decisions at Siemens are based on qualifications, merit, and business need. Bring your curiosity and creativity and help us shape tomorrow.

We offer a comprehensive reward package which includes a competitive basic salary, bonus scheme, generous holiday allowance, pension, and private healthcare.

Transform the everyday
Accelerate transformation
#SWSaaS
-
Site Reliability Engineer
2 weeks ago
Bengaluru, India Relanto Full time
Job Description Job Title: Site Reliability Engineer Summary We are looking for a Site Reliability Engineer to join our Digital & Transformation department. The ideal candidate will have 2-3 years of experience in this field and will be responsible for ensuring the reliability, availability, and performance of our systems and applications. Roles And...
-
Site Reliability Engineer
1 week ago
India - Pune Northern Trust Full time ₹ 20,00,000 - ₹ 60,00,000 per year
Principal Infrastructure Services (SRE)
About Northern Trust: Northern Trust, a Fortune 500 company, is a globally recognized, award-winning financial institution that has been in continuous operation since 1889. Northern Trust is proud to provide innovative financial services and guidance to the world's most successful individuals, families, and...
-
Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India Fiserv Full time ₹ 8,00,000 - ₹ 24,00,000 per year
Site Reliability Engineer
Exp. Range: 8 to 14 Years
What does a successful Site Reliability Engineer (SRE) Expert do at Fiserv?
The Site Reliability Engineer blends the principles of software engineering with the discipline of operations to create high-performing and reliable software systems. They are tasked with designing and implementing tools, processes, and...
-
Site Reliability Engineer
3 weeks ago
, India, IN Sonata Software Full time
We're Hiring: Senior Site Reliability Engineer
Location: Onsite (Office: Hyderabad – Mandatory from Day 1)
Employment Type: Full-time
Notice Period: Immediate to 15 Days Only
Experience: 8+ Years
About the Role
We’re looking for a Senior Site Reliability Engineer (SRE) to lead reliability initiatives across our production systems. This is a high-impact...
-
Site Reliability Engineer
4 weeks ago
Pune, India TechVerito Full time
Job Description About the Role: 3-5 years of proven and progressive experience as an SRE or DevOps Engineer. As an SRE, you will have a strong background in cloud infrastructure management, migration and deployment, with expertise in Google Cloud Platform (GCP), DevOps tools, and the Kubernetes ecosystem. The primary focus of this role will be to migrate...
-
Site Reliability Engineer
4 days ago
India Akamai Technologies Full time
Job Description Do you like collaborating across teams to solve complex problems? Do you enjoy solving large scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We...
-
Site Reliability Engineer
7 days ago
Pune, Maharashtra, India ENGEL Full time ₹ 6,00,000 - ₹ 18,00,000 per year
Company Description
ENGEL is a global leader in the production of injection moulding machines and their automation. The company produces systems that manufacture plastic parts used in various industries such as automotive, packaging, and consumer goods. With nine production plants worldwide and subsidiaries and representatives in over 85 countries, ENGEL...
-
Site Reliability Engineer
1 day ago
Pune, Maharashtra, India Growel Softech Pvt. Ltd. Full time ₹ 12,96,000 - ₹ 1,51,20,000 per year
Job Title: Site Reliability Engineer
Location: Pune (Hybrid: 3 days a week at office, 2 days WFH; candidate needs to report to the Pune office only) (Relocation is considerable)
Shift Timings: 12:30 PM - 9:30 PM IST
Budget: 10+ to 12+ yrs: 31 LPA; 13 to 15+ yrs: 36 LPA
Interview: 2 rounds (HM's availability is between 3 PM and 5 PM IST)
Positions: 4
Considerable Notice Period - 30...
-
Site Reliability Engineer
2 days ago
India Akamai Full time ₹ 8,00,000 - ₹ 24,00,000 per year
Do you like collaborating across teams to solve complex problems? Do you enjoy solving large scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We specialize in creating solutions that...
-
Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India Ather Energy Full time ₹ 6,00,000 - ₹ 18,00,000 per year
You'll be our: Site Reliability Engineer
You'll be based at: Pune Zonal Office
You'll be aligned with: Cloud and Data Platform Lead / Cloud Architect
You'll be a member of: Cloud and Data Platform Team
Ather's fleet of smart scooters is growing rapidly, and so is the volume of data they generate. Our Vehicle Data Platform (VDP) is the core of this ecosystem, and...