Site Reliability Engineer
3 days ago
A major player in the tech industry, which specializes in retail technology, AI, ML, and big data, is seeking new talent. Established by alumni from a top engineering institute, this organization manages a vast network of brands and stores. Headquartered in Mumbai, it is recognized for its innovation and expertise across multiple tech domains.
What will you do?
Run the production environment by monitoring availability and taking a holistic view of system health.
Improve the reliability, quality, and time-to-market of our suite of software solutions.
Be the first person to report an incident.
Debug production issues across services and levels of the stack.
Envision the overall solution for defined functional and non-functional requirements, and define the technologies, patterns, and frameworks to realise it.
Build automated tools in Python, Java, Go, Ruby, etc.
Help Platform and Engineering teams gain visibility into our infrastructure.
Lead the design of software components and systems to ensure the availability, scalability, latency, and efficiency of our services.
Participate actively in detecting, remediating, and reporting on production incidents, ensuring SLAs are met and driving problem management for permanent remediation.
Participate in the on-call rotation to ensure coverage for planned and unplanned events.
Perform other tasks such as load testing and generating system health reports.
Periodically check the readiness of all dashboards.
Engage with other Engineering organizations to implement processes, identify improvements, and drive consistent results.
Work with your SRE and Engineering counterparts to drive game days, training, and other response-readiness efforts.
Participate in 24x7 support coverage as needed.
Troubleshoot and solve complex issues with thorough root cause analysis on customer and SRE production environments.
Collaborate with Service Engineering organizations to build and automate tooling, implement best practices to observe and manage services in production, and consistently achieve our market-leading SLA.
Improve the scalability and reliability of our systems in production.
Evaluate, design, and implement new system architectures.
Some specific Requirements:
B.E. in Engineering or Computer Science, another technical degree, or equivalent work experience.
At least 3 years of experience managing production infrastructure.
Leading or managing a team is a huge plus.
Experience with cloud platforms such as AWS and GCP.
Experience developing and operating large-scale distributed systems with Kubernetes, Docker, and Serverless (Lambdas).
Experience running real-time, low-latency, highly available applications (Kafka, gRPC, RTP).
Comfortable with Python, Go, or any relevant programming language.
Experience with monitoring and alerting using technologies such as New Relic, Zabbix, Prometheus, Grafana, CloudWatch, Kafka, and PagerDuty.
Experience with one or more orchestration and deployment tools, e.g. CloudFormation, Terraform, Ansible, Packer, or Chef.
Experience with configuration management systems such as Ansible, Chef, or Puppet.
Knowledge of load-testing methodologies and tools such as Gatling and Apache JMeter.
Ability to work your way around a Unix shell.
Experience running hybrid clouds and on-prem infrastructure on Red Hat Enterprise Linux or CentOS.
A focus on delivering high-quality code through strong testing practices.
What do we offer?
Growth
Growth knows no bounds, as we foster an environment that encourages creativity, embraces challenges, and cultivates a culture of continuous expansion. We are looking at new product lines, international markets and brilliant people to grow even further. We teach, groom and nurture our people to become leaders. You get to grow with a company that is growing exponentially.
Flex University
We help you upskill by organising in-house courses on important subjects
Learning Wallet: You can also take an external course to upskill and grow, and we reimburse the cost.
Culture
Community and Team building activities
Weekly, quarterly, and annual events and parties
Wellness
Mediclaim policy for you + parents + spouse + kids
Access to an experienced therapist for better mental health, improved productivity, and work-life balance
We work 5 days from the office and we make sure people have everything they need:
Free meals
Snacks, goodies & a lot of fun culture
-
Site Reliability Engineer
4 months ago
Mumbai, India dentsu Full time The purpose of this role is to ensure the availability and stability of production and test platforms. Job Title: Site Reliability Engineer Job Description: Key responsibilities: Troubleshoots and owns issues in our development, test and production environments, including performance optimisation and continuous tuning. Works alongside the DevOps team in...
-
Site Reliability Engineering Manager
2 months ago
Mumbai, India Talent Socio Full time Job Description: - Lead and mentor a team of Site Reliability Engineers (SREs) responsible for ensuring the reliability, availability, and performance of critical systems. - Establish and enforce engineering practices focused on automation, monitoring, and process improvement to enhance system reliability and operational efficiency. - Conduct thorough and...
-
Site Reliability Engineer
4 months ago
Mumbai, India IMC Full time As a Site Reliability Engineer at IMC, you'll be an integral member of a highly experienced team, responsible for maintaining a robust, best-in-class, low-latency trading environment. The skills necessary to excel could range from system administration, network troubleshooting, database optimization, software development, release management and...
-
Senior Site Reliability Engineer
2 months ago
Mumbai, India CimpressVista Full time Senior Site Reliability Engineer You have successfully completed a degree in computer science or comparable training (e.g. as an IT specialist) or have gained several years of relevant professional experience in the DevOps environment. Experience working with: Agile methods and cloud technologies/architecture in AWS. Database administration to a small extent...
-
Senior Site Reliability Engineer I
3 days ago
Mumbai, India RELX India (Pvt) Ltd Risk div Company Full time About the role We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to manage and optimize our AWS cloud resources. The ideal candidate will have a strong background in AWS, Terraform, Kubernetes, and scripting, with proficiency in monitoring and CI/CD tools. Experience with HashiCorp Vault is a plus. Responsibilities: ...
-
Site Reliability Engineer
1 day ago
Mumbai, India Jio Full time Site Reliability Engineer (SRE) with Automation Job Overview As a Site Reliability (SRE)/DevOps Automation Engineer, you will be responsible for the availability, automation, performance, efficiency, scaling, monitoring and emergency response for any incidents/issues in applications. You will use your deep understanding of platforms, architecture, people,...
-
Site Reliability Engineer
2 days ago
Mumbai, India Jio Full time Site Reliability Engineer (SRE) with Automation Job Overview As a Site Reliability (SRE)/DevOps Automation Engineer, you will be responsible for the availability, automation, performance, efficiency, scaling, monitoring and emergency response for any incidents/issues in applications. You will use your deep understanding of platforms, architecture,...
-
Senior Site Reliability Engineer SRE
2 months ago
Mumbai, India Ztek Consulting INC Full time Job Title: Senior Site Reliability Engineer (SRE) Duration: 612 months Location: Hybrid Fort Worth TX Work Type: Rate: Pay range offered to a successful candidate will be based on several factors including the candidate's education, work experience, work location, specific job duties, certifications, etc. Job Summary: A Site Reliability Engineer is responsible...
-
Site Reliability Engineer
1 week ago
Mumbai, India Cyber Sphere LLC Full time Site Reliability Engineer (SRE) to join our team. Qualifications: - 4+ years of Software Engineering experience - BS Engineering/Computer Science or equivalent experience required Responsibilities: - Design, deploy, and maintain a highly available and scalable data infrastructure on Azure OpenAI, databases and event-driven services - Monitor and optimize the...
-
Senior Site Reliability Engineering Manager
2 months ago
Mumbai, India IDFC FIRST Bank Full time Role/Job Title: Senior Site Reliability Engineering Manager Function/Department: Information Technology Job Purpose: The Site Reliability Engineering (SRE) department plays a pivotal role in providing a seamless experience for our customers. With state-of-the-art technology and tools, we are transforming the overall application development and...
-
Senior Site Reliability Engineering Manager
3 days ago
Mumbai, India IDFC FIRST Bank Full time Role/Job Title: Senior Site Reliability Engineering Manager Function/Department: Information Technology Job Purpose: The Site Reliability Engineering (SRE) department plays a pivotal role in providing a seamless experience for our customers. With state-of-the-art technology and tools, we are transforming the overall application development and...
-
Site Reliability Engineer
1 month ago
Mumbai, India Cyber Sphere LLC Full time Salary: 40 LPA - 60 LPA. We are seeking a talented and experienced Site Reliability Engineer (SRE) to join our team. As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our Azure AI Services platform. You will work closely with cross-functional teams to design, implement, and maintain robust infrastructure and...
-
Site Reliability Engineer II
4 months ago
Mumbai, India Session AI Full time Are you ready to make your mark with a true industry disruptor? ZineOne, a subsidiary of Session AI, the pioneer of in-session marketing, is looking to add talented team members to help us grow into the premier revenue tool for e-commerce. We work with some of the leading brands nationwide and we innovate how brands connect with and convert customers. Job...
-
Site Reliability Engineer II
4 months ago
Mumbai, India Session AI Full time Are you ready to make your mark with a true industry disruptor? ZineOne, a subsidiary of Session AI, the pioneer of in-session marketing, is looking to add talented team members to help us grow into the premier revenue tool for e-commerce. We work with some of the leading brands nationwide and we innovate how brands connect with and convert customers. Job...
-
Senior Site Reliability Engineer I
3 days ago
Mumbai, India RELX India (Pvt) Ltd Risk div Company Full time Job Description for Senior Site Reliability Engineer (SRE) Position Overview: We are seeking a dynamic Site Reliability Engineer (SRE) with 7-9 years of experience in system administration who has a deep proficiency in automation. The ideal candidate will be instrumental in monitoring and incident response and will possess comprehensive knowledge...
-
Senior DevOps Engineer
3 days ago
Navi Mumbai, India Capabiliq IT Services (OPC) Private Limited Full time Responsibilities: - Define processes for the DevOps program and align to best practice standards - Support Product delivery teams integrating into existing pipelines and platforms. - Plan for and manage operational resilience for network and application while minimizing the effect on the business - Develop and extend DevOps tooling and automation efforts...
-
Site Reliability Engineer
4 weeks ago
Mumbai, India Awign Expert Full time About Awign Expert: Awign Expert, a division of Awign - India's largest work-as-a-service platform. We connect skilled professionals with exciting project-based opportunities from top companies, handling onboarding, feedback, conflict resolution, and payroll. Our mission is to empower professionals to focus on their work by managing administrative tasks,...
-
Site Reliability Engineer
1 month ago
Mumbai, India antal international network Full time Title: Site Reliability Engineer My client is India's largest omnichannel platform and multi-platform tech company with expertise in retail tech and products in AI, ML, big data ops, gaming crypto, image editing and learning space. Roles & Responsibility: What will you do? - Run the production environment by monitoring availability and taking a holistic...
-
Site Engineer
18 hours ago
Mumbai, India Zodiac HR Full time Dear Candidate, Greetings! Position - Senior / Site Engineer Location - Borivali Qualification - BE / BTech Experience - 12+ Years Job Summary: We are seeking an experienced Site Engineer to manage and oversee construction projects, ensuring that all operations are conducted efficiently and to the highest quality standards. The...
-
Site Reliability Engineer
4 weeks ago
Mumbai, India Awign Expert Full time Job Description About Awign Expert: Awign Expert, a division of Awign - India's largest work-as-a-service platform. We connect skilled professionals with exciting project-based opportunities from top companies, handling onboarding, feedback, conflict resolution, and payroll. Our mission is to empower professionals to focus on their work by managing...