Reliability Engineer with Scalable Infrastructure Expertise

1 week ago


Bengaluru, Karnataka, India beBeeReliability Full time ₹ 1,04,000 - ₹ 1,30,878

About Us

We are a team of passionate experts focused on building cutting-edge financial services platforms.

Our mission is to empower every individual with the knowledge, tools, and confidence to make informed financial decisions.

At our core, we value customer-centricity, ownership, simplicity, long-term thinking, and transparency.

Job Description:

We are seeking an experienced Reliability Engineer to join our team. As a key member, you will be responsible for designing, implementing, and maintaining scalable infrastructure solutions that ensure high availability, performance, and security.

Your expertise will drive incident management, proactive system monitoring, and continuous improvement of our platform's reliability.

You will collaborate closely with software developers, platform engineers, and other team members to achieve these goals.

Requirements:

We are looking for a highly motivated and experienced professional with:

  • 6–9 years of experience in SRE, DevOps, or system architecture roles with large-scale production systems.
  • Extensive experience managing and scaling high-traffic, low-latency fintech systems, ensuring reliability, compliance, and secure transaction processing.
  • Proven expertise in the networking stack, including BGP, OSPF, DNS, HTTP(S), TCP/IP, MPLS, and VPN protocols.
  • Advanced knowledge of GCP networking (VPC design, Shared VPC, Private Service Connect, Global Load Balancers, Cloud DNS, Cloud NAT, Network Intelligence Center, and Service Mesh).
  • Strong background in managing complex multi-cloud environments (AWS, GCP, Azure) with a focus on secure and compliant architectures in regulated industries.
  • Hands-on expertise in Terraform and Infrastructure-as-Code (IaC) for repeatable, automated deployments.
  • Expertise in Kubernetes, container orchestration, and microservices, with production experience in regulated fintech environments.
  • Advanced programming and scripting skills in Python, Go, or Java, applied to automation, risk reduction, and financial system resilience.
  • Proficiency with monitoring and logging tools (Prometheus, Mimir, Grafana, Loki) to ensure real-time visibility into trading, payments, and transaction flows.
  • Strong understanding of networking, load balancing, and DNS management across multi-cloud and hybrid infrastructures.
  • Experience implementing end-to-end observability solutions (metrics, logs, and traces) to monitor and optimize transaction throughput while meeting latency SLAs.
  • Leadership skills with experience mentoring teams, fostering a culture of reliability, and partnering with cross-functional stakeholders in product teams.
  • Strong communication, critical thinking, and incident management abilities, especially in high-stakes production incidents involving customer transactions.
  • Bachelor's or Master's degree in Computer Science, Engineering, or equivalent experience.

What You'll Do:

As a Reliability Engineer, you will:

  • Architect and lead the design of scalable, reliable infrastructure solutions.
  • Implement strategies for high availability, scalability, and low-latency performance.
  • Define service-level objectives (SLOs) and service-level indicators (SLIs) to track performance and reliability.
  • Drive incident management by identifying root causes and providing long-term solutions.
  • Mentor junior engineers and foster a collaborative, learning-focused environment.
  • Design advanced monitoring and alerting systems for proactive system management.
  • Architect and optimize network topologies (hybrid cloud, multi-cloud, and on-prem) to support ultra-low-latency trading and compliance-driven workloads.
  • Configure and manage cloud and on-prem networking components (VPCs, Shared VPCs, Private Service Connect, Cloud NAT, and Global Load Balancers) for secure and compliant transaction flows.
  • Implement secure connectivity solutions (VPNs, Interconnect, Direct Connect, and service meshes) to meet fintech regulatory requirements and standards.
  • Develop and maintain DNS, load-balancing, and traffic-routing strategies to ensure millisecond-level latency for real-time transactions.
  • Evolve Infrastructure as Code (IaC) practices and principles to automate infrastructure provisioning.
  • Collaborate on reliability roadmaps, performance benchmarks, and disaster recovery plans tailored for low-latency and high-throughput workloads.
  • Manage Kubernetes clusters at scale, integrating service meshes like Istio or Linkerd.
  • Implement chaos engineering principles to strengthen system resilience.
  • Influence technical direction, reliability culture, and organizational strategies.

  • Bengaluru, Karnataka, India beBeeEngineer Full time US$ 1,20,000 - US$ 1,70,000

    As a skilled Site Reliability Engineer, you will play a pivotal role in crafting and securing the demonstration infrastructure for our sales teams. Your responsibilities will encompass developing, operating, and maintaining critical infrastructure on AWS and Azure platforms. Key Responsibilities: Developing and operating robust infrastructure on cloud...


  • Bengaluru, Karnataka, India beBeeSystem Full time ₹ 1,00,00,000 - ₹ 2,00,00,000

    Job Title: System Reliability Engineer. Our organization is seeking a highly skilled and experienced system reliability engineer to join our team. As a system reliability engineer, you will be responsible for ensuring the stability, scalability, and performance of our systems. Manage infrastructure supporting ad exchange applications, including load balancers,...


  • Bengaluru, Karnataka, India Yupp AI Full time US$ 1,20,000 - US$ 2,00,000 per year

    About the Company: We are a well-funded, rapidly growing, early-stage AI startup headquartered in Silicon Valley that is building a two-sided product - one side meant for global consumers and the other side for AI builders and researchers. We work on the cutting edge of AI across the stack. Check out our product that was launched recently, and how it solves...


  • Bengaluru, Karnataka, India beBeeCloud Full time US$ 2,00,000 - US$ 2,50,000

    Senior Site Reliability Engineer. The Role: This critical position involves adopting infrastructure changes and maintaining and enhancing the backbone of cloud infrastructure. You will leverage your deep understanding of core infrastructure components and technologies such as base images, container orchestration, and cloud platforms like AWS, Azure, and GCP to...


  • Bengaluru, Karnataka, India Zealant Consulting Group Full time

    Job Summary: We are seeking a seasoned Site Reliability Engineer (SRE) to join our growing team. This is a critical role in ensuring the reliability, scalability, and performance of our cloud infrastructure on AWS. You will leverage your expertise in automation, infrastructure management, and cost optimization to build and maintain resilient systems...


  • Bengaluru, Karnataka, India beBeeCloud Full time ₹ 1,00,00,000 - ₹ 1,60,00,000

    About the Job: Our organization empowers employees to craft their own success stories. We challenge, listen to, value, and support them in their journey of growth. This is an ideal opportunity for experienced engineers who want to develop their expertise in cloud infrastructure and site reliability engineering. Key Responsibilities: Maintain the reliability,...


  • Bengaluru, Karnataka, India beBeeReliability Full time ₹ 1,04,000 - ₹ 1,30,878

    Job Description: Design, implement and maintain scalable infrastructure on cloud platforms. Automate deployments, monitoring and incident response using Terraform, Kubernetes and CI/CD pipelines. Optimize system performance, troubleshoot incidents and implement postmortems. Enhance observability with Prometheus, Grafana and distributed tracing tools. Collaborate...


  • Bengaluru, Karnataka, India Xebia Full time

    We are seeking an experienced AWS DevOps Engineer with strong expertise in Observability and Site Reliability Engineering (SRE) to design, build, and manage scalable, reliable, and secure cloud environments. The role requires hands-on experience with AWS services, Infrastructure as Code (IaC), CI/CD, monitoring & observability frameworks, and incident...


  • Bengaluru, Karnataka, India beBeeReliability Full time ₹ 1,00,00,000 - ₹ 2,50,00,000

    As a pioneer in healthcare technology, we strive to create an ecosystem that delivers accessible, high-quality, and sustainable healthcare for all. Our cloud-based solutions improve clinical and financial performance across the care continuum. A forward-thinking organization is seeking a skilled Site Reliability Engineer (Linux Expert) to join our Cloud...


  • Bengaluru, Karnataka, India RevSure AI Full time ₹ 15,00,000 - ₹ 20,00,000 per year

    Company Description: RevSure AI is the leading provider of Full Funnel AI for B2B Go-to-Market strategies, offering an enterprise-grade solution for complex GTM motions. The platform is designed to help GTM teams achieve bold pipeline, revenue, and ROI goals by providing sharp insights, precise predictions, and actionable recommendations. RevSure helps...