Site Reliability Engineer

1 week ago


India Intuitive.Cloud Full time

About us:

Intuitive.Cloud is one of the fastest-growing (INC 5000, CRN) Cloud & SDx solution and services companies supporting enterprise customers on a global scale. Intuitive is an "Engineering Company" delivering measurable value and key business outcomes.

Intuitive Superpowers:

- DataOps & AI/ML

- Cloud Native, AppSecOps, DevSecOps

- Cloud Migration & Transformation

- Cloud FinOps

- Cybersecurity (App/Data/Infra) & GRC

- SDx & Digital Workspace


We are proud to partner with some of the world's leading enterprises and serve 200+ customers across different industry verticals. We have achieved many milestones along the way, including being recognized by CRN in 2022 as a top-10 Fast Growth 150 IT company in the Americas and being named one of America's fastest-growing private companies by INC 5000 in 2022. That's not all: CIO Review also named us the Most Promising Cloud Migration Company and Artificial Intelligence Solutions Provider in 2022.


About the job:

Title: Site Reliability Engineer

Start date: Immediate

Position Type: Full Time

Work Timing: US Eastern Time Zone

Location: Remote across India


Job Description:

We are seeking an experienced Site Reliability Engineer (SRE) to enhance operational efficiency, reliability, and observability across infrastructure and application landscapes. This role focuses on integrating advanced monitoring platforms, defining key performance metrics, and establishing comprehensive monitoring solutions to ensure system health and performance. The SRE will work closely with cross-functional teams to implement alerting mechanisms, improve scalability, and drive the adoption of best practices in observability and reliability engineering.


Roles and Responsibilities:


Observability Platform Integration

  • Lead the transition to modern monitoring platforms, ensuring seamless integration with existing systems.
  • Define and implement observability strategies to enhance visibility into infrastructure and applications.
  • Collaborate with stakeholders to identify critical workloads and performance metrics.


Monitoring and Alerting

  • Develop and implement monitoring solutions for applications, databases, and infrastructure, capturing metrics such as availability, performance, and resource utilization.
  • Establish alerting frameworks to detect anomalies, performance bottlenecks, and security incidents.
  • Integrate monitoring and alerting with ITSM tools for streamlined incident management.


Performance Metrics and SLAs

  • Define and track Service Level Agreements (SLAs), Service Level Objectives (SLOs), and Service Level Indicators (SLIs) for key business systems.
  • Work with stakeholders to align metrics with business expectations and operational goals.


Automation and Scalability

  • Leverage scripting and automation tools to streamline deployment of monitoring agents and configuration updates.
  • Optimize monitoring platforms for scalability and efficiency, ensuring they can accommodate evolving business needs.


Dashboard Development

  • Design and maintain dashboards to provide real-time insights into system performance and health.
  • Ensure dashboards are intuitive and actionable, enabling teams to monitor critical metrics effectively.


Cloud Infrastructure Performance

  • Apply a deep understanding of cloud infrastructure and services.
  • Diagnose, troubleshoot, and optimize performance issues in cloud services, including compute, storage, and networking components.
  • Implement monitoring and tuning practices specific to cloud-native environments to ensure reliability and scalability.


Documentation and Training

  • Develop comprehensive documentation for monitoring tools, configurations, and processes.
  • Conduct training sessions to ensure teams are proficient in utilizing observability platforms and interpreting metrics.


Continuous Improvement

  • Continuously evaluate and enhance monitoring and observability solutions to meet changing organizational needs.
  • Incorporate feedback from stakeholders to refine alerting thresholds, dashboards, and metrics.


Mandatory Skills:


Performance Monitoring:

  • Expertise with modern observability platforms (Sumo Logic).
  • Experience with Azure native monitoring solutions and practices.
  • Deep understanding of Azure infrastructure and services, including diagnosing and tuning performance issues with such services.
  • Strong knowledge of monitoring methodologies for infrastructure, applications, and databases.
  • Experience monitoring Active Directory Domain Controllers, PeopleSoft applications, and order entry systems (KPS), and integrating observability platforms with them.
  • Experience with log management, metric collection, and alerting configuration.
  • Ability to define and track SLAs, SLOs, and SLIs for business-critical systems.
  • Experience in monitoring network, application, and database performance metrics.
  • Strong understanding of network and security device monitoring, including SNMP, syslog, and NetFlow.
  • Hands-on experience in application performance monitoring for enterprise platforms like ERP or custom applications.
  • Familiarity with containerized environments and Kubernetes monitoring.


Automation Skills:

  • Experience with scripting languages (e.g., Python, Bash, PowerShell) to automate monitoring setup and management.
  • Familiarity with infrastructure automation tools like Ansible and Terraform.


Communication and Collaboration:

  • Strong collaboration skills to work with cross-functional teams and stakeholders.
  • Ability to communicate technical concepts to both technical and non-technical audiences.


Incident Management:

  • Familiarity with ITSM tools (e.g., ServiceNow) for incident and problem management.
  • Proven experience in integrating alerting mechanisms with incident management workflows.

