Reliability Architect

1 day ago


Bengaluru, Karnataka, India | beBeeReliability | Full time | US$ 1,25,000 - US$ 1,75,000

We are looking for a highly skilled Reliability Architect to join our team.

About the Role:

The Reliability Architect, a senior site reliability engineering position, plays a critical role in ensuring the reliability, scalability, and performance of our organization's systems and infrastructure.

Key Responsibilities:
  • Design and Implementation: Design, implement, and maintain highly available and scalable infrastructure systems, ensuring maximum uptime and performance.
  • Collaboration: Collaborate with software engineering teams to build and deploy applications using best practices in reliability, scalability, and security.
  • Automation: Develop and implement automation tools and frameworks to streamline operational processes, reduce manual intervention, and improve efficiency.
  • Monitoring and Analysis: Monitor and analyze system performance, identify bottlenecks, and implement solutions to optimize performance and scalability.
  • Alerting and Logging: Implement and maintain effective monitoring, alerting, and logging systems to proactively identify and resolve issues before they impact users.
  • CI/CD Pipelines: Build automated CI/CD pipelines using GitHub Actions, Jenkins, GitLab, or an equivalent platform.
  • Scripting: Automate workflows and solutions using Python, Go, or shell scripts.
  • Incident Response: Lead incident response and root cause analysis efforts, driving continuous improvement and preventing future incidents.
  • Cross-functional Collaboration: Collaborate with cross-functional teams to define and enforce best practices, standards, and guidelines for system reliability and performance.
  • On-call Rotation: Participate in on-call rotations and respond to incidents, ensuring timely resolution and minimal user impact so that SLAs are met.
  • Disaster Recovery: Devise disaster recovery (DR) strategies and implement DR plans.
  • Mentorship: Mentor and provide guidance to junior team members, fostering a culture of learning and growth.
  • System Health: Run the production environment by monitoring availability and taking a holistic view of system health.
  • Infrastructure Management: Build software and systems to manage platform infrastructure and applications.
  • Software Delivery: Improve reliability, quality, and time-to-market of our suite of software solutions.
  • System Performance: Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.

Required Skills and Qualifications:

  • Experience: Proven experience as a Site Reliability Engineer or similar role, with a focus on designing and maintaining highly available and scalable systems.
  • Programming Skills: Strong programming and scripting skills (Python, Bash, etc.) to automate operational tasks and develop tooling.
  • Cloud Platforms: Experience with cloud platforms (AWS) and containerization technologies (Docker, EKS).
  • Configuration Management: Proficient in configuration management tools like Ansible and infrastructure-as-code frameworks such as Terraform and CloudFormation.
  • Monitoring and Logging: Experience with monitoring and logging tools (Prometheus, Grafana, Loki, Sentry.io, CloudWatch, etc.) for proactive system monitoring and troubleshooting.
  • High-level Languages: Ability to program (structured and object-oriented) in one or more high-level languages, such as Java and JavaScript.
  • Networking Knowledge: Solid understanding of networking principles, protocols, and security best practices.
  • Communication Skills: Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams.
  • Distributed Storage: Experience with distributed storage technologies such as NFS and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN).
  • Problem-solving Skills: Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
  • Agile Methodologies: Experience working with Agile methodologies.
  • Software Design: Strong skills in software design and design patterns.
  • Architecture Patterns: Experience with architecture patterns such as client-server and serverless computing.
  • Communication: Effective written, verbal and presentation skills with the ability to clearly articulate ideas and concepts.
  • Leadership: Self-directed and able to direct others.

Desired Skills & Abilities:

  • Load Testing: Experience with setting up performance/load test environments.
  • SOC2 Audit: Familiarity with SOC2 audit processes.

