
Site Reliability Engineer
4 weeks ago
We are seeking a skilled Site Reliability Engineer (SRE) with a strong background in data infrastructure and machine learning operations (MLOps). The ideal candidate will be responsible for designing and deploying scalable solutions for high-volume data, building robust ETL/ELT pipelines, and ensuring the reliability of our systems. You will play a critical role in automating workflows, managing releases, and collaborating with cross-functional teams to drive efficiency and innovation.
Key Responsibilities
- Data Infrastructure & Engineering: Design and develop scalable solutions for time-series data storage and retrieval. Build robust ETL/ELT pipelines using big data tools like Spark, Kafka, and Azure Data Factory.
- CI/CD & MLOps: Develop and manage CI/CD pipelines for ML models and data infrastructure using tools like MLflow, Jenkins, and Terraform.
- Monitoring & Automation: Implement end-to-end monitoring for infrastructure and data pipelines using Prometheus and Grafana. Automate workflows to ensure high availability and zero-downtime deployments.
- Release & Incident Management: Manage releases across environments, respond to Severity 1 incidents, and lead Root Cause Analysis (RCA).
- Collaboration & Analytics: Work with product teams to deploy releases, resolve customer escalations, and build tools that provide insights into business KPIs.
Qualifications
- A Bachelor's or Master's degree in Engineering with experience in SRE or related roles.
- Certifications such as AWS Certified Solutions Architect - Associate or an equivalent Azure certification are preferred.
- Strong technical skills in cloud platforms (AWS, Azure), big data technologies (Spark, Kafka, Databricks), and data storage.
- Proficiency in SQL, Python, Java/Scala, and container orchestration (Kubernetes).
- Familiarity with DevOps tools like Jenkins, Terraform, and Ansible.
- Excellent problem-solving, communication, and time management skills.
Competencies
- Problem-Solving & RCA: The ability to effectively troubleshoot and conduct root cause analysis.
- Technical Expertise: Strong skills in cloud and data engineering.
- Collaboration: Experience working with cross-functional teams.
- Automation: A mindset focused on innovation and automation to improve processes.
- Customer Focus: The ability to manage customer escalations and drive improvements.
-
Site Reliability Engineer
2 days ago
Bengaluru, Karnataka, India Enterprise Minds, Inc Full time
We're Hiring | Site Reliability Engineer | 8-10 years
-
site reliability engineer
8 hours ago
Bengaluru, Karnataka, India Randstad Full time
Role: Site Reliability Engineer
Summary
The Network Engineer 2 provides technical design, planning, operation, maintenance, and advanced troubleshooting of the Bread Financials' network infrastructure. This position ensures continuity and alignment of the network administration/engineering direction. This position supports Bread Financials' strategies and...
-
Site Reliability Engineer
5 days ago
Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time
We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high...
-
Site Reliability Engineer
11 hours ago
Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time
We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...
-
Site Reliability Engineer
6 days ago
Bengaluru, Karnataka, India beBeeReliability Full time ₹ 2,00,00,000 - ₹ 2,50,00,000
Role Overview
As a Site Reliability Engineer, you will play a pivotal role in driving innovation and modernizing complex systems by leveraging cutting-edge technologies and collaborating with cross-functional teams.
-
Site Reliability Engineer
2 days ago
Bengaluru, Karnataka, India Coforge Full time
Job Description
- Design, implement, and maintain scalable infrastructure to ensure high availability and performance of software applications.
- Collaborate with development teams to identify and resolve issues affecting application performance, stability, and reliability.
- Develop automated monitoring scripts using tools like Prometheus, Grafana, etc. to...
-
Site Reliability Engineering
2 days ago
Bengaluru, Karnataka, India Infrasoft Technologies Limited Full time
Job Description
Job Title: Developer
Work Location: Bangalore, Karnataka
Experience Range: 6-8 Years
Job Description: We are looking for a skilled Developer with strong hands-on experience in Site Reliability Engineering (SRE), Java, JavaScript, and Production Support. The ideal candidate should have a solid background in application monitoring and troubleshooting...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India Collabera Full time
Job Description
As a Principal/Chief Site Reliability Engineer, you will play a critical role in designing, developing, and maintaining scalable and highly reliable systems. You'll work closely with development teams to improve system reliability, monitor critical applications, and design fail-proof infrastructure.
Responsibilities
Design and implement...
-
Site Reliability Engineer
3 days ago
Bengaluru, Karnataka, India Xebia Full time
We are seeking an experienced AWS DevOps Engineer with strong expertise in Observability and Site Reliability Engineering (SRE) to design, build, and manage scalable, reliable, and secure cloud environments. The role requires hands-on experience with AWS services, Infrastructure as Code (IaC), CI/CD, monitoring & observability frameworks, and incident...
-
Site Reliability Engineer
3 weeks ago
Bengaluru, Karnataka, India NatWest Group Full time
Join us as a Site Reliability Engineer. In this key role, you'll support the improvement of non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response and capacity planning of our products and services. You'll enjoy significant stakeholder interaction, working in...