Site Reliability Engineer
3 days ago
Job Description
Siemens Digital Industries Software is a leading provider of solutions for the design, simulation, and manufacture of products across many different industries. Formula 1 cars, skyscrapers, ships, space exploration vehicles, and many of the objects we see in our daily lives are being conceived and manufactured using our Product Lifecycle Management (PLM) software.
About The Role
We are looking for a Site Reliability Engineer (SRE) to strengthen the reliability, observability, and operational excellence of our Digital Manufacturing SaaS portfolio. As an SRE, you'll work in a central reliability team supporting multiple product startups, ensuring their services are operationally ready, observable, and resilient at scale. You'll collaborate closely with R&D and DevOps teams to enhance system reliability, automate operations, and build visibility across complex multi-tenant and single-tenant cloud environments.
Key Responsibilities
Incident Response & Reliability
- Serve as the first line of response for production incidents across multiple SaaS products.
- Drive root cause analysis, corrective actions, and post-incident reviews to prevent recurrence.
- Continuously improve mean time to detect (MTTD), mean time to acknowledge (MTTA), and mean time to resolve (MTTR).
- Demonstrate a strong understanding of SRE principles, including Toil reduction, Error Budgets, and incident response maturity.
- Apply effective techniques to manage outages and service degradations, ensuring transparent communication and timely recovery.
- Participate in on-call rotations and proactively identify opportunities to reduce recurring incident patterns and automate reliability work.
Observability & Monitoring
- Implement and enhance observability through Datadog, AWS CloudWatch, and custom telemetry pipelines.
- Define and maintain SLIs, SLOs, and Error Budgets for proactive health tracking and service quality governance.
- Continuously improve monitoring coverage and the alert signal-to-noise ratio, tuning alerts to reduce false positives and alert fatigue.
- Develop actionable dashboards and reports to enhance situational awareness and promote a metrics-driven reliability culture.
Operational Readiness
- Evaluate new product releases for operational readiness, ensuring runbooks, alerts, and dashboards are in place.
- Contribute to standardized operational models and playbooks for startup teams across the Digital Manufacturing segment.
- Partner with engineering and product teams to embed operational excellence practices early in the development lifecycle.
Automation & Infrastructure
- Develop and maintain Infrastructure as Code (IaC) and automation scripts using Terraform, Python, or Bash to improve deployment consistency and reduce manual toil.
- Contribute to the continuous improvement of CI/CD pipelines and environment configurations.
- Build automation that supports self-healing, consistency, and scalability across distributed services.
Collaboration & Enablement
- Collaborate with product R&D teams to guide them on operational best practices and reliability design.
- Work with global teams across time zones (India, US, and Europe), ensuring smooth handoffs and incident transparency.
- Actively participate in sprint planning, retrospectives, and incident reviews under Agile delivery models.
- Promote a reliability-first mindset across teams through technical enablement, documentation, and continuous improvement.
What You'll Bring
- 3-4 years of experience in Site Reliability Engineering, DevOps, or Production Support for large-scale SaaS systems.
- Solid understanding of Linux fundamentals, networking, and cloud infrastructure (AWS preferred).
- Hands-on experience with Datadog, CloudWatch, PagerDuty, or similar monitoring/alerting platforms.
- Proficiency in scripting (Python, Bash, or PowerShell) for automation and tooling.
- Experience with IaC tools such as Terraform or CloudFormation.
- Working knowledge of CI/CD tools (Jenkins, GitLab CI, GitHub Actions).
- Familiarity with containers and orchestration (Docker, Kubernetes).
- Understanding of SRE concepts such as SLI/SLO/Error Budget, Toil management, alert noise reduction, and incident lifecycle management.
- Strong problem-solving skills, an ownership mindset, and the ability to thrive in cross-functional collaborations.
Preferred Certifications
Having the following certifications will be an advantage:
- AWS Certified Solutions Architect - Associate
- Certified Kubernetes Administrator (CKA)
- HashiCorp Certified: Terraform Associate
Nice to Have
- Exposure to multi-tenant and single-tenant SaaS architectures.
- Experience performing postmortems and driving reliability improvements through metrics.
- Familiarity with Agile/Scrum methodologies and global collaboration practices.
We are Siemens
A collection of over 377,000 minds building the future, one day at a time, in over 200 countries. We're dedicated to equality, and we welcome applications that reflect the diversity of the communities we work in. All employment decisions at Siemens are based on qualifications, merit, and business need. Bring your curiosity and creativity and help us shape tomorrow.
We offer a comprehensive reward package which includes a competitive basic salary, bonus scheme, generous holiday allowance, pension, and private healthcare.
Siemens Software. Transform the Everyday with Us.
Please note that, due to the current integration framework, this opportunity is currently available exclusively to employees of Altair and DISW. While the position may be made available to all Siemens employees through a future external posting, this is not guaranteed. We appreciate your understanding and cooperation during this transitional period. This communication does not constitute a promise or guarantee of future employment opportunities beyond the current scope.
-
Site Reliability Engineer
3 hours ago
Pune, India UBS Full time
Job Description
Job Reference # 326131BR. Job Type: Full Time.
Your role: We are seeking a highly experienced Site Reliability Engineer (SRE) to join our technology team in a mission-critical financial environment. This role is ideal for someone who has a proven track record of building and operating reliable, scalable systems in regulated industries such as...
-
Site Reliability Engineer
1 week ago
Pune, India emagine Full time
Job Description
Job Overview: As a Site Reliability Engineer (SRE) working in a 24/7 shift rotation, you will be responsible for ensuring the reliability, availability, and performance of critical systems and services. You will combine strong technical skills with operational excellence to proactively monitor, troubleshoot, and resolve issues. Your expertise...
-
Site Reliability Engineer
3 days ago
Pune, India NR Consulting Full time
Job Description
About the Company
We are seeking a highly skilled Site Reliability Engineer (SRE) with strong expertise in Google Cloud Platform (GCP) and CI/CD automation to lead cloud infrastructure initiatives. The ideal candidate will design and implement robust CI/CD pipelines, automate deployments, ensure platform reliability, and drive...
-
Site Reliability Engineer
2 weeks ago
Pune, India Talent Worx Full time
Site Reliability Engineer (SRE)
At Talent Worx, we are looking for a dedicated Site Reliability Engineer (SRE) to join our team. This role involves maintaining high availability and reliability of our services through the application of software engineering practices and systems administration skills. The ideal candidate will bridge the gap between...
-
Site Reliability Engineer
1 week ago
Pune, India Talent Worx Full time
Site Reliability Engineer (SRE)
At Talent Worx, we are looking for a dedicated Site Reliability Engineer (SRE) to join our team. This role involves maintaining high availability and reliability of our services through the application of software engineering practices and systems administration skills. The ideal candidate will bridge the gap between...
-
Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India Relanto Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Job Title: Site Reliability Engineer
Summary
We are looking for a Site Reliability Engineer to join our Digital & Transformation department. The ideal candidate will have 4 years of experience in this field and will be responsible for ensuring the reliability, availability, and performance of our systems and applications.
Roles And Responsibilities
4 years of...
-
Site Reliability Engineer
2 weeks ago
India Grootan Technologies Full time
About the Role
We are seeking a skilled Site Reliability Engineer (SRE) with 4–5 years of hands-on experience to join our engineering team. In this role, you will be responsible for building and maintaining reliable, scalable, and secure infrastructure to support our applications. You will leverage your expertise in automation, cloud platforms, and...
-
Site Reliability Engineer
5 days ago
India Datum Technologies Group Full time
Job Title: Site Reliability Engineer (SRE) – AWS
Experience: 8+ years
Location: Chennai / Mumbai
Work Mode: Hybrid
Key Skills: AWS, Terraform, Kubernetes, Docker, Grafana, Prometheus, Datadog
Job Summary: We are looking for a skilled Site Reliability Engineer (SRE) with strong AWS experience and a solid background in DevOps, automation, observability, and...
-
Site Reliability Engineer
1 week ago
Pune, Maharashtra, India Fiserv Full time ₹ 8,00,000 - ₹ 24,00,000 per year
Site Reliability Engineer
Experience range: 8 to 14 years
What does a successful Site Reliability Engineer (SRE) Expert do at Fiserv?
The Site Reliability Engineer blends the principles of software engineering with the discipline of operations to create high-performing and reliable software systems. They are tasked with designing and implementing tools, processes, and...
-
Site Reliability Engineer
3 weeks ago
India Akamai Technologies Full time
Job Description
Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We...