Senior Site Reliability Engineer

2 days ago


India iVoyant Full time

One of our clients is looking for an experienced Senior Site Reliability Engineer (SRE) - Mission-Critical SaaS Cloud Products to join their team.

Key Responsibilities:

Reliability and Performance Management: Design, implement, and maintain highly available, scalable, and resilient cloud-native architectures for mission-critical SaaS products. Develop and implement SLOs, SLIs, and SLAs to measure and improve service reliability. Continuously optimize system performance and resource utilization across multiple cloud platforms. Fine-tune and optimize application performance by analyzing code, traces, and database queries.

Incident Management and Troubleshooting: Lead incident response efforts, effectively troubleshooting complex issues to minimize downtime and impact. Reduce Mean Time to Recover (MTTR) through proactive monitoring, automated alerting, and efficient problem-solving techniques. Conduct thorough Root Cause Analysis (RCA) for all major incidents and implement preventive measures.

Observability and Monitoring: Design and implement end-to-end observability solutions across our distributed systems. Develop and maintain comprehensive monitoring strategies using tools like ELK Stack, Prometheus, and Grafana. Create and optimize product status dashboards to provide real-time visibility into system health and performance.

Automation and Infrastructure as Code (IaC): Implement Infrastructure as Code practices using tools like Terraform. Develop and maintain automated deployment pipelines and CI/CD workflows. Create self-healing systems and automate routine operational tasks to reduce manual intervention.

Cloud-Agnostic Architecture: Design and implement cloud-agnostic solutions that can operate efficiently across multiple cloud providers. Develop expertise in event-driven architecture and related technologies (e.g., Apache Kafka/EventHub, Redis, Mongo Atlas, IoTHub). Implement and manage containerized applications using Kubernetes across different cloud environments.

Continuous Improvement: Regularly review and refine operational practices to enhance efficiency and reliability. Stay updated with the latest industry trends and technologies in SRE, cloud computing, and DevOps. Contribute to the development of internal tools and frameworks to support SRE practices.

Requirements:

• Strong knowledge of cloud platforms (Azure) and their associated services.
• Expert in observability tools (ELK Stack, Dynatrace, Prometheus).
• Expertise in containerization technologies such as Docker and Kubernetes.
• Understanding of event-driven architecture and database technologies (Mongo Atlas, Azure SQL, Postgres DB).
• Proficient in IaC tools such as Terraform and GitHub Actions.
• Proficiency in one or more programming languages: Python, .NET, or Java.
• Strong understanding of networking concepts, load balancing, and security practices.



  • India, IN Sonata Software Full time

    We're Hiring: Senior Site Reliability Engineer. Location: Onsite (Office: Hyderabad – Mandatory from Day 1). Employment Type: Full-time. Notice Period: Immediate to 15 Days Only. Experience: 8+ Years. About the Role: We’re looking for a Senior Site Reliability Engineer (SRE) to lead reliability initiatives across our production systems. This is a high-impact...

  • Senior/expert site

    2 weeks ago


    India IVedha Inc. Full time

    Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice. Location: India (Remote). Must be available to work in the EST (US/Canada) time zone. Role Summary: Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure? We're looking for an SRE with 7+...


  • Pune, India Barclays Full time

    Job Description: Step into the role of Senior Site Reliability Engineer. At Barclays, we are more than a bank; we are a force for progress. You will be part of the central SRE (Site Reliability Engineer) core team within our wider Infrastructure team. You will act as a centre of excellence providing hands-on consultancy to our different infrastructure...


  • India Akamai Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Would you enjoy improving the stability and safety of one of the largest global networks? Would you enjoy hands-on network operations work on a global scale to improve our operational efficiency? Join the Platform Cloud Services Engineering Team. The Platform Cloud Services SRE team supports globally distributed hosting and database systems for Akamai. These systems...


  • India Sapaad Full time

    WHO WE ARE: Sapaad is a global leader in unified commerce platforms, delivering world-class software solutions for the food and beverage industry. Our flagship product, also named Sapaad, has achieved remarkable success over the past decade, empowering thousands of F&B businesses across 40+ countries, with many more coming onboard each day. Driven by a...


  • India Sapaad Full time

    WHO WE ARE: Sapaad is a global leader in unified commerce platforms, delivering world-class software solutions for the food and beverage industry. Our flagship product, also named Sapaad, has achieved remarkable success over the past decade, empowering thousands of F&B businesses across 40+ countries, with many more coming onboard each day. Driven by a...


  • India Akamai Full time ₹ 8,00,000 - ₹ 25,00,000 per year

    Do you have the passion to architect and lead the next generation of public cloud infrastructure? Would you like to lead modernization initiatives while building a public cloud platform from scratch? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform....


  • Chennai, Tamil Nadu, India Tata Consultancy Services Full time

    Dear Candidates, Greetings from TCS! TCS is looking for a Senior Site Reliability Engineer – AWS. Experience: 8-12 years. Location: Chennai. Must-have skills: Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS. Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness. Own and implement...


  • India Akamai Technologies Full time

    Job Description: Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed systems problems? Join the Mapping SRE team. The Mapping SRE team manages availability, reliability, performance, and change processes for Akamai's mapping system. This system routes trillions of daily client...


  • India Akamai Full time ₹ 8,00,000 - ₹ 24,00,000 per year

    Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We specialize in creating solutions that...