Site Reliability Engineer

21 hours ago


Hyderabad, India Talentiser Full time

YOUR IMPACT: Reliability, Automation, and Observability

As a hybrid Site Reliability Engineer/DevOps Engineer, you'll be a key driver in ensuring the stability, performance, and scalability of our mission-critical SaaS platform. You'll apply engineering principles to operational challenges, constantly striving to eliminate toil through automation.

Operational Excellence & Reliability
● Manage system alerts day to day, check system health, and escalate issues as necessary to maintain high availability.
● Actively participate in a 24x7 on-call rotation for critical SaaS platform incidents, and be available in case of emergencies.
● Lead the incident response process, ensuring fast and effective mitigation and resolution of production issues.
● Perform thorough Root Cause Analysis (RCA) and lead blameless post-mortems to identify systemic weaknesses and create a corrective action plan to prevent recurrence.
● Collaborate with engineering teams to set and enforce error budgets (derived from SLOs, or Service Level Objectives), ensuring a healthy balance between development speed and system stability.

Platform Automation & Infrastructure Development
● Automate routine operational tasks to reduce manual effort and "toil" and increase overall team efficiency.
● Design, deploy, and maintain cloud infrastructure using Infrastructure as Code (IaC), specifically leveraging Terraform and Helm for deployment to EKS/K8s clusters.
● Improve existing infrastructure health by developing and implementing checks and scripts to proactively correct known issues and self-heal the platform.
● Maintain, develop, and evolve our Continuous Integration/Continuous Delivery (CI/CD) deployment code and pipelines.
● Learn and maintain existing infrastructure running under Docker and Docker Swarm while driving migration strategies toward EKS/K8s.
● Implement and integrate new technologies and services into our Cloud Infrastructure to enhance platform capabilities and resilience.

Monitoring & Observability
● Design and implement comprehensive Observability strategies across all three pillars: Metrics, Logs, and Traces.
● Proactively create and refine robust monitoring and alerting configurations within the EKS/K8s ecosystem.
● Utilize and maintain our Observability platform, Datadog, to gather performance data, create complex synthetic tests, and visualize system health via dashboards.
● Leverage existing monitoring solutions such as Grafana and Prometheus while planning and executing the migration or integration of data into a unified platform.
● Document all issues, remediation steps, system architecture, and runbooks to facilitate knowledge transfer and rapid incident response.
● Collaborate closely with Support, Customer Success, Migration, and Professional Services teams to provide the highest level of SaaS service and minimize customer impact during changes.
● Apply a real customer focus when planning deployments/updates, always considering the impact on the end-user before making changes.

YOUR EXPERIENCE: Essential Skills and Qualifications
● Hands-on AWS Cloud Engineer experience, with expert working knowledge of the AWS Cloud ecosystem, including a good understanding of AWS IAM roles and policies.
● Proficiency with container orchestration technologies: EKS/Kubernetes (K8s).
● Demonstrable experience with Infrastructure as Code (IaC) tools, specifically Terraform and Helm.
● Working experience with Docker and maintaining systems using Docker Swarm.
● Expertise in setting up and managing logging and monitoring solutions. Direct experience with Datadog is highly preferred, including setting up APM, infrastructure monitoring, and custom dashboards.
● Experience with existing monitoring solutions such as Grafana and Prometheus is required.
● Proficiency in a Linux environment, with strong skills in Bash and/or Python scripting for automation and troubleshooting.
● A strong understanding of web technologies, including REST APIs, Systems Architecture, Design, and Databases.
● Experience in Product/Application Support for high-availability SaaS-based products.
● Experience in designing, implementing, and operating in a DevSecOps environment.
● Excellent oral and written communication skills, with the ability to clearly explain complex technical issues and RCAs to both technical and customer-facing audiences.



  • Hyderabad, India SID Global Solutions Full time

    Job Role: Site Reliability Engineer (SRE) – GCP Experience: 3+ years Location: Hyderabad About SIDGS: SIDGS is a premium global systems integrator and global implementation partner of Google corporation, providing Digital Solutions & Services to Fortune 500 companies. Our Digital solutions go across following domains: User Experience, CMS, API Management,...


  • Hyderabad, India ValueMomentum Full time

    About the Role We are seeking an experienced Site Reliability / Azure DevOps Engineer with Dynatrace Experience to join our engineering team and contribute to scalable CI/CD practices, infrastructure automation, and cloud operations. The ideal candidate will have deep expertise in Azure DevOps, Infrastructure as Code (IaC), Azure services, and modern DevOps...


  • Hyderabad, India ANSR Full time

    ANSR is hiring for one of its clients. About T-Mobile: T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and...


  • Hyderabad, India ANSR Full time

    About T-Mobile: T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience. About TMUS Global...


  • Hyderabad, India S&P Global Full time

    Job Description About The Role Grade Level (for internal use): 11 Job Title: Senior Site Reliability Engineer Role Overview As a Site Reliability Engineer at ChartIQ, you'll play a critical role not only in building, maintaining, and scaling the infrastructure that supports our Development and QA needs, but also in driving new, exciting...


  • Hyderabad, India SID Global Solutions Full time

    Job Role: Site Reliability Engineer (SRE) – GCP Experience: 3+ years Location: Hyderabad About SIDGS: SIDGS is a premium global systems integrator and global implementation partner of Google corporation, providing Digital Solutions & Services to Fortune 500 companies. Our Digital solutions go across following domains: User Experience, CMS, API Management,...


  • Hyderabad, India JP Morgan Chase & Co. Full time

    Job Description There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the Consumer & Community Banking, you will solve complex and broad...


  • Hyderabad, India SID Global Solutions Full time

    Job Role: Site Reliability Engineer (SRE) – GCP Experience: 3+ years Location: Hyderabad About SIDGS: SIDGS is a premium global systems integrator and global implementation partner of Google corporation, providing Digital Solutions & Services to Fortune 500 companies. Our Digital solutions go across following domains: User Experience, CMS, API Management,...


  • Hyderabad, India SID Global Solutions Full time

    Job Role: Site Reliability Engineer (SRE) – GCP Experience: 3+ years Location: Hyderabad About SIDGS: SIDGS is a premium global systems integrator and global implementation partner of Google corporation, providing Digital Solutions & Services to Fortune 500 companies. Our Digital solutions go across following domains: User Experience, CMS, API Management,...


  • Hyderabad, India SID Global Solutions Full time

    Job Role: Site Reliability Engineer (SRE) – GCP Experience: 3+ years Location: Hyderabad About SIDGS: SIDGS is a premium global systems integrator and global implementation partner of Google corporation, providing Digital Solutions & Services to Fortune 500 companies. Our Digital solutions go across following domains: User Experience, CMS, API...