Site Reliability Engineer
16 hours ago
About the job

At Hydrolix, we are revolutionizing the world of data management and analytics with our innovative cloud data platform, purpose-built for petabyte-scale datasets. Our mission is to help organizations drastically reduce data costs while increasing their data retention.

We are looking for a Site Reliability Engineer (SRE) with 8 to 10+ years of experience to join our dynamic Services team. In this role, you will contribute to the reliability and scalability of our cutting-edge platform, ensuring exceptional solutions tailored to our customers’ unique needs. This is a highly technical, hands-on role that requires deep expertise in system reliability and automation.

Key Responsibilities
Infrastructure Reliability: Deploy, maintain, and ensure a highly reliable fleet of Kubernetes clusters and Hydrolix deployments across multiple cloud platforms.
Service Optimization: Design, implement, and maintain systems and processes to enhance the reliability, availability, and performance of our services.
CI/CD Management: Build and optimize CI/CD tools and processes to ensure efficient and reliable deployments.
Monitoring and Incident Response: Develop and manage monitoring, alerting, and incident response strategies to minimize downtime and enable rapid recovery.
Root Cause Analysis: Conduct comprehensive root cause analyses for system failures, implementing long-term preventive measures.
Automation and Efficiency: Automate repetitive tasks and optimize system performance to improve operational efficiency.
On-Call Support: Participate in on-call coverage during weekday business hours and once-monthly weekend shifts.

Collaboration and Customer Engagement
Cross-Functional Teamwork: Work closely with software engineering, infrastructure, and product teams to integrate reliability practices into every stage of the development lifecycle.
Reliability Advocacy: Champion SRE best practices and foster a culture of operational excellence across the organization.
Global Team Collaboration: Collaborate with a distributed team of engineers worldwide to provide round-the-clock support.
Customer Support: Interface with customers to address and resolve reported incidents, ensuring a seamless user experience.

Qualifications and Skills
SRE Expertise: Proven experience as a Site Reliability Engineer or in a similar role, with a history of supporting complex distributed systems.
Observability Tools: Experience with monitoring and debugging tools like Prometheus, Vector, Grafana, Superset, or Kibana.
Cloud Platforms: Proficiency in at least one major cloud platform (AWS, GCP, Azure, or Linode).
UI Development Experience: Hands-on experience building internal tooling using modern frontend frameworks (e.g., React, Vue, or Angular), enabling improved visibility and operational workflows for engineering teams.
Database Knowledge: Experience with SQL databases; familiarity with PostgreSQL is a plus but not required.
Programming/Scripting Skills: Proficiency in Unix scripting and programming languages such as Python or Go.
Linux Expertise: Strong experience with Linux systems, including performance tuning and system-level troubleshooting.
Communication Skills: Excellent written and verbal communication skills, with the ability to convey technical concepts clearly to diverse audiences, including customers and cross-functional teams.

Hydrolix provides equal employment opportunities without regard to an applicant’s race, sex, pregnancy, sexual orientation, gender identity or expression, genetic information, national origin, age, physical or mental disability, medical condition, religion, marital status or veteran status.

Applicants with disabilities may be entitled to reasonable accommodation under the terms of the Americans with Disabilities Act and certain state or local laws. A reasonable accommodation is a change in the way things are normally done which will ensure an equal employment opportunity without imposing undue hardship on Hydrolix. Please inform us if you need assistance completing any forms or to otherwise participate in the application process.
-
Site Reliability Engineer
2 weeks ago
Bangalore, India super Full time
Site Reliability Engineer (SRE) Level 3
Overview: A Site Reliability Engineer (SRE) Level 3 is a senior technical leadership role focused on designing, implementing, and maintaining large-scale, complex, and highly reliable systems. This role emphasizes a blend of software and systems engineering to ensure the availability, latency, performance, and capacity...
-
Site Reliability Engineer
4 days ago
Bangalore, India Pagos Consultants Full time
We are looking for experienced site reliability engineers to join a founding team of startup-minded individuals that will lay the groundwork for our new fintech offering. This team will play a pivotal role in spearheading innovation. As such, you will have the opportunity to shape the early architecture and design of the system and set the trajectory for its...
-
Site Reliability Engineer
10 hours ago
Bangalore, India Tata Consultancy Services Full time
TCS has been a great pioneer in feeding the fire of young techies like you. We are a global leader in the technology arena and there’s nothing that can stop us from growing together.
What we are looking for
Role: Site Reliability Engineering (SRE)
Experience Range: 5 – 15 Years
Location: Chennai/Pune
Candidates should come to office for Walk in...
-
Site Reliability Engineer
20 hours ago
Bangalore, India Tata Consultancy Services Full time
TCS has been a great pioneer in feeding the fire of young techies like you. We are a global leader in the technology arena and there’s nothing that can stop us from growing together.
What we are looking for
Role: Site Reliability Engineering (SRE)
Experience Range: 5 – 15 Years
Location: Chennai/Pune
Candidates should come to office for Walk in Drive (Face to...
-
Site Reliability Engineer
3 days ago
Bangalore, India Enterprise Minds, Inc Full time
Senior Site Reliability Engineer (GCP | Terraform | Ansible | SRE | On-Call)
We are looking for a high-impact Site Reliability Engineer (SRE) who will play a key role in ensuring the reliability, availability, and scalability of our production systems on Google Cloud Platform (GCP). If you thrive in fast-paced environments, excel in incident management, and...
-
Site Reliability Engineer
2 weeks ago
Bangalore, India Datum Technologies Group Full time
Job Title: Site Reliability Engineer (SRE) – AWS
Experience: 8+ years
Location: Chennai / Mumbai
Work Mode: Hybrid
Key Skills: AWS, Terraform, Kubernetes, Docker, Grafana, Prometheus, Datadog
Job Summary: We are looking for a skilled Site Reliability Engineer (SRE) with strong AWS experience and a solid background in DevOps, automation, observability, and...
-
Site Reliability Engineer
1 day ago
Bangalore, India Enterprise Minds, Inc Full time
Senior Site Reliability Engineer (GCP | Terraform | Ansible | SRE | On-Call)
We are looking for a high-impact Site Reliability Engineer (SRE) who will play a key role in ensuring the reliability, availability, and scalability of our production systems on Google Cloud Platform (GCP). If you thrive in fast-paced environments, excel in incident management, and...
-
Site Reliability Engineer
23 hours ago
Bangalore, India Enterprise Minds, Inc Full time
Senior Site Reliability Engineer (GCP | Terraform | Ansible | SRE | On-Call)
We are looking for a high-impact Site Reliability Engineer (SRE) who will play a key role in ensuring the reliability, availability, and scalability of our production systems on Google Cloud Platform (GCP). If you thrive in fast-paced environments, excel in incident management, and...
-
Site Reliability Engineer
24 hours ago
Bangalore, India Insight Global Full time
Company: Insight Global
Duration: Approved for 1 year
📍 Location: Remote (India)
💼 Type: Contract with Insight Global Client
💰 Compensation: 14 LPA – 20 LPA
🕒 Working Hours: Normal IST hours
🚀 Start Date: Immediate (No notice period)
About the Role
Join our Site Reliability Engineering (SRE) team as a Fullstack Developer, focused on building and...
-
Site Reliability Engineer
2 weeks ago
Bangalore, India Andor Tech Full time
Hiring!!
🏢 About AndorTech
AndorTech is a global IT services and consulting firm founded in 2009, headquartered in Bangalore. The company specializes in software engineering, AI-enabled IT services, application support, analytics, and test automation. With a presence across India, the USA, Europe, and the UAE, AndorTech partners with Global Capability...