
Site Reliability Engineer
4 hours ago
Position : Site Reliability Engineer
Experience : 5 - 9 Years
Location : Bangalore, India
Job Summary :
We are seeking a Site Reliability Engineer (SRE) with 5-9 years of experience to join our Platform Engineering team. This role is crucial for ensuring the high availability, performance, and scalability of our AI-powered code review platform. As a key member of the team, you will operate at the intersection of software engineering and systems operations, building the foundational platforms and automation that enable our engineering teams to deploy, monitor, and scale our services reliably.
You will be instrumental in enhancing the reliability of critical services that process millions of code reviews, building sophisticated automation platforms, and owning the infrastructure that powers our AI-driven analysis engine. This role involves working with cutting-edge technologies, including large language models, real-time processing systems, and distributed architectures.
Key Responsibilities :
Infrastructure and Platform Ownership :
- Design, implement, and maintain scalable infrastructure on Google Cloud Platform (GCP).
- Own and operate critical platform services.
- Build and maintain Infrastructure as Code (IaC) using Terraform to ensure consistent and reproducible deployments.
Reliability and Performance Engineering :
- Implement and maintain SLI/SLO frameworks to meet reliability commitments.
- Deploy comprehensive monitoring, alerting, and observability solutions using Datadog and custom instrumentation.
- Conduct thorough incident response, root cause analysis, and post-mortems to continuously improve system reliability.
- Optimize application and infrastructure performance.
- Design and implement chaos engineering practices to proactively identify system weaknesses.
Automation and Developer Experience :
- Develop self-service platforms and tooling that empower engineering teams to deploy, monitor, and troubleshoot their services independently.
- Automate operational tasks such as scaling, backup/recovery, and security patching.
- Create and maintain infrastructure APIs and abstractions that simplify complex operations for development teams.
Security and Compliance :
- Integrate security best practices into all infrastructure and platform services, including security monitoring, vulnerability scanning, and compliance reporting.
- Design secure network architectures and establish disaster recovery and business continuity plans.
Required Skills & Qualifications
Experience :
- A proven track record of managing production systems at scale in high-growth technology companies.
Technical Proficiency :
- Programming Languages : Proficiency in Node.js and TypeScript for building automation tools.
- Infrastructure as Code : Advanced experience with Terraform.
- Monitoring & Observability : Hands-on experience with Datadog or similar platforms such as Prometheus, Grafana, or the ELK stack.
- Cloud Platforms : Comprehensive experience with GCP services, including Compute Engine, GKE, Cloud Run, Cloud SQL, and Cloud Storage.
- Strong Linux/Unix systems skills.
- Experience with Kubernetes and Docker.
- Understanding of microservices architecture and distributed systems principles.
Preferred Skills :
- Experience with AI/ML infrastructure and tools.
- Background in managing high-traffic web applications and API services.
- Experience with disaster recovery planning and execution.
- Knowledge of FinOps practices and cost optimization.
- Experience with performance testing and capacity planning methodologies.
- Contributions to open-source SRE or infrastructure tooling projects.
-
Site Reliability Engineer
6 days ago
Bengaluru, Karnataka, India Enterprise Minds, Inc Full time
We're Hiring | Site Reliability Engineer | 8-10 years
-
Site Reliability Engineer
1 week ago
Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time
We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high...
-
Site Reliability Engineer
4 days ago
Bengaluru, Karnataka, India Randstad Full time
Role: Site Reliability Engineer
Summary: The Network Engineer 2 provides technical design, planning, operation, maintenance, and advanced troubleshooting of the Bread Financials' network infrastructure. This position ensures continuity and alignment of the network administration/engineering direction. This position supports Bread Financials' strategies and...
-
Site Reliability Engineer
4 days ago
Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time
We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...
-
Site Reliability Engineer
1 day ago
Bengaluru, Karnataka, India TRUGlobal Full time ₹ 9,00,000 - ₹ 12,00,000 per year
Job Title: Site Reliability Engineer (SRE) with Python Development Expertise
Position Overview: We are seeking a skilled Site Reliability Engineer (SRE) with strong Python development experience to join our team. The ideal candidate will be responsible for ensuring the reliability, availability, and performance of our services across both on-premises and...
-
Site Reliability Engineer
1 week ago
Bengaluru, Karnataka, India beBeeReliability Full time ₹ 2,00,00,000 - ₹ 2,50,00,000
Role Overview
As a Site Reliability Engineer, you will play a pivotal role in driving innovation and modernizing complex systems by leveraging cutting-edge technologies and collaboration with cross-functional teams.
-
Site Reliability Engineer
1 day ago
Bengaluru, Karnataka, India IDESLABS PRIVATE LIMITED Full time US$ 90,000 - US$ 1,20,000 per year
Experience: 5+ Years
Skill: Site reliability engineer
Location: Bangalore
Notice Period: Immediate
Employment Type: Contract
Working Mode: Hybrid
Job Description
Site Reliability Engineer Tech Stack
Primary: AWS, Terraform, Ansible, Docker
Secondary: Python, Bash, GitHub, Jenkins
-
Site Reliability Engineer
6 days ago
Bengaluru, Karnataka, India Coforge Full time
Job Description
- Design, implement, and maintain scalable infrastructure to ensure high availability and performance of software applications.
- Collaborate with development teams to identify and resolve issues affecting application performance, stability, and reliability.
- Develop automated monitoring scripts using tools like Prometheus, Grafana, etc. to...
-
Site Reliability Engineering
6 days ago
Bengaluru, Karnataka, India Infrasoft Technologies Limited Full time
Job Description
Job Title: Developer
Work Location: Bangalore, Karnataka
Experience Range: 6-8 Years
Job Description: We are looking for a skilled Developer with strong hands-on experience in Site Reliability Engineering (SRE), Java, JavaScript, and Production Support. The ideal candidate should have a solid background in application monitoring and troubleshooting...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India Collabera Full time
Job Description: As a Principal/Chief Site Reliability Engineer, you will play a critical role in designing, developing, and maintaining scalable and highly reliable systems. You'll work closely with development teams to improve system reliability, monitor critical applications, and design fail-proof infrastructure.
Responsibilities: Design and implement...