Site Reliability Engineer

3 weeks ago


Bengaluru, Karnataka, India | Success Pact Consulting Pvt Ltd | Full time

Position : Site Reliability Engineer

Experience : 5 - 9 Years

Location : Bangalore, India

Job Summary :

We are seeking a Site Reliability Engineer (SRE) with 5-9 years of experience to join our Platform Engineering team. This role is crucial for ensuring the high availability, performance, and scalability of our AI-powered code review platform. You will operate at the intersection of software engineering and systems operations, building the foundational platforms and automation that enable our engineering teams to deploy, monitor, and scale their services reliably.

You will be instrumental in enhancing the reliability of critical services that process millions of code reviews, building sophisticated automation platforms, and owning the infrastructure that powers our AI-driven analysis engine. This role involves working with cutting-edge technologies, including large language models, real-time processing systems, and distributed architectures.

Key Responsibilities :

Infrastructure and Platform Ownership :

- Design, implement, and maintain scalable infrastructure on Google Cloud Platform (GCP).

- Own and operate critical platform services, and build and maintain Infrastructure as Code (IaC) using Terraform to ensure consistent and reproducible deployments.

Reliability and Performance Engineering :

- Implement and maintain SLI/SLO frameworks to meet reliability commitments.

- Deploy comprehensive monitoring, alerting, and observability solutions using Datadog and custom instrumentation.

- Conduct thorough incident response, root cause analysis, and post-mortems to continuously improve system reliability.

- Optimize application and infrastructure performance, and design and implement chaos engineering practices to proactively identify system weaknesses.

Automation and Developer Experience :

- Develop self-service platforms and tooling that empower engineering teams to deploy, monitor, and troubleshoot their services independently.

- Automate operational tasks such as scaling, backup/recovery, and security patching.

- Create and maintain infrastructure APIs and abstractions that simplify complex operations for development teams.

Security and Compliance :

- Integrate security best practices into all infrastructure and platform services, including security monitoring, vulnerability scanning, and compliance reporting.

- Design secure network architectures and establish disaster recovery and business continuity plans.

Required Skills & Qualifications :

Experience :

- 5+ years of hands-on experience in Site Reliability Engineering, Platform Engineering, or DevOps roles.

- A proven track record of managing production systems at scale in high-growth technology companies.

Technical Proficiency :

- Programming Languages : Proficiency in Node.js and TypeScript for building automation tools.

- Infrastructure as Code : Advanced experience with Terraform.

- Monitoring & Observability : Hands-on experience with Datadog or similar platforms like Prometheus, Grafana, or the ELK stack.

- Cloud Platforms : Comprehensive experience with GCP services, including Compute Engine, GKE, Cloud Run, Cloud SQL, and Cloud Storage.

- Strong Linux/Unix systems skills.

- Experience with Kubernetes and Docker.

- Understanding of microservices architecture and distributed systems principles.

Preferred Skills :

- Experience with AI/ML infrastructure and tools.

- Background in managing high-traffic web applications and API services.

- Experience with disaster recovery planning and execution.

- Knowledge of FinOps practices and cost optimization.

- Experience with performance testing and capacity planning methodologies.

- Contributions to open-source SRE or infrastructure tooling projects.

(ref:hirist.tech)
