Site Reliability Engineer

3 weeks ago


India CareerUS Solutions Full time

Job Description

Position Overview: The Site Reliability Engineer (SRE) is responsible for ensuring the stability, scalability, performance, and reliability of production systems and services. This role bridges software development and operations, using automation, monitoring, and performance optimization to build resilient systems that scale efficiently and recover quickly from failures.

Key Responsibilities:
- Design, build, and maintain highly reliable and scalable systems and infrastructure.
- Automate deployment, monitoring, and maintenance processes using DevOps tools and scripts.
- Implement and manage CI/CD pipelines to support continuous delivery.
- Monitor application performance, identify bottlenecks, and improve uptime and reliability.
- Develop and maintain incident response procedures, including root cause analysis and postmortems.
- Collaborate with development teams to design systems for fault tolerance, load balancing, and failover.
- Manage and optimize cloud infrastructure (AWS, Azure, GCP).
- Implement observability solutions: logging, metrics, tracing, and alerting.
- Maintain strong security and compliance standards across infrastructure.
- Participate in on-call rotations to ensure 24/7 system availability.
- Document processes, configurations, and runbooks for operational consistency.

Required Skills & Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Strong knowledge of Linux/Unix systems administration and shell scripting.
- Proficiency with automation and configuration tools (Ansible, Terraform, Chef, Puppet).
- Experience with cloud platforms (AWS, Azure, or Google Cloud).
- Familiarity with containerization and orchestration tools (Docker, Kubernetes).
- Solid understanding of CI/CD tools (Jenkins, GitLab CI, CircleCI).
- Strong experience with monitoring and observability tools (Prometheus, Grafana, ELK Stack, Datadog).
- Knowledge of networking fundamentals, load balancing, and DNS management.
- Proficiency in at least one programming language (Python, Go, or Bash).
- Excellent analytical, problem-solving, and communication skills.

Preferred Qualifications:
- Experience with infrastructure as code (IaC) and serverless architectures.
- Knowledge of reliability metrics such as SLOs, SLIs, and error budgets.
- Exposure to database administration (MySQL, PostgreSQL, MongoDB, Redis).
- Familiarity with security practices for cloud-native systems.
- Certifications such as AWS Certified DevOps Engineer, Google SRE Certification, or CKA (Certified Kubernetes Administrator).



  • Bengaluru, India Relanto Full time

    Job Title: Site Reliability Engineer. Summary: We are looking for a Site Reliability Engineer to join our Digital & Transformation department. The ideal candidate will have 2-3 years of experience in this field and will be responsible for ensuring the reliability, availability, and performance of our systems and applications. Roles And...


  • India Grootan Technologies Full time

    About the Role: We are seeking a skilled Site Reliability Engineer (SRE) with 4–5 years of hands-on experience to join our engineering team. In this role, you will be responsible for building and maintaining reliable, scalable, and secure infrastructure to support our applications. You will leverage your expertise in automation, cloud platforms, and...


  • India InOrg Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    About VivaOps: VivaOps is a leading DevSecOps platform company specializing in GitLab, the comprehensive DevOps platform, to transform and secure software development processes. We help organizations streamline their DevSecOps journey by offering a complete range of GitLab services, from advisory to implementation and managed services, to accelerate...


  • India Akamai Technologies Full time

    Job Description: Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We...


  • India Akamai Full time ₹ 8,00,000 - ₹ 24,00,000 per year

    Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We specialize in creating solutions that...


  • India Akamai Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Description: Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We specialize in creating...


  • India LivePerson Full time ₹ 9,00,000 - ₹ 12,00,000 per year

    LivePerson (NASDAQ: LPSN) is a leading customer engagement company, creating digital experiences powered by Curiously Human AI. Every person is unique, and our technology makes it possible for companies, including leading brands like HSBC, Orange, and GM Financial, to treat their audiences that way at scale. Nearly a billion conversational interactions are...


  • India LivePerson Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    LivePerson (NASDAQ: LPSN) is a leading customer engagement company, creating digital experiences powered by Curiously Human AI. Every person is unique, and our technology makes it possible for companies, including leading brands like HSBC, Orange, and GM Financial, to treat their audiences that way at scale. Nearly a billion conversational interactions are...


  • Hyderabad, India UBS Full time

    Job Reference #: 322870BR. Job Type: Full Time. Your role: Are you an analytic thinker? Do you enjoy Site Reliability Engineering initiatives and proactive problem management across on-premises and cloud databases, ensuring high availability and stability of database infrastructure services? Do you want to play a key role in transforming our firm into an...


  • India CitNOW Group Full time

    About us: Founded in 2008, CitNOW is an innovative, enterprise-level software product suite that allows automotive dealerships globally to sell more vehicles and parts more profitably. CitNOW’s app-based platform provides a secure, brand-compliant solution for dealers to build trust, transparency, and long-lasting relationships. CitNOW Group was formed...