Site Reliability Engineer

2 weeks ago


Mumbai, Maharashtra, India Antal International Full time
Job Description

My client is India's largest omnichannel platform and a multi-platform tech company with expertise in retail tech and products in AI, ML, big data ops, gaming, crypto, image editing and the learning space.

Title: Site Reliability Engineer

Roles & Responsibilities:

What will you do?

- Run the production environment by monitoring availability and taking a holistic view of system health.

- Improve reliability, quality, and time-to-market of our suite of software solutions

- Be the first person to report an incident.

- Debug production issues across services and levels of the stack.

- Envision the overall solution for defined functional and non-functional requirements, and define the technologies, patterns and frameworks to realise it.

- Build automation tools in Python, Java, Go, Ruby, etc.

- Help Platform and Engineering teams gain visibility into our infrastructure.

- Lead design of software components and systems, to ensure availability, scalability, latency, and efficiency of our services.

- Participate actively in detecting, remediating and reporting on Production incidents, ensuring the SLAs are met and driving Problem Management for permanent remediation.

- Participate in on-call rotation to ensure coverage for planned/unplanned events.

- Perform other tasks such as load testing and generating system health reports.

- Periodically check that all dashboards are ready.

- Engage with other Engineering organizations to implement processes, identify improvements, and drive consistent results.

- Work with your SRE and Engineering counterparts to drive game days, training and other response-readiness efforts.

- Participate in 24x7 support coverage as needed.

- Troubleshoot and solve complex issues with thorough root cause analysis on customer and SRE production environments.

- Collaborate with Service Engineering organizations to build and automate tooling, implement best practices to observe and manage the services in production, and consistently achieve our market-leading SLA.

- Improve the scalability and reliability of our systems in production.

- Evaluate, design and implement new system architectures.

Some specific requirements:

- B.E./B.Tech. in Engineering, Computer Science, technical degree, or equivalent work experience

- At least 3 years of experience managing production infrastructure. Leading or managing a team is a huge plus.

- Experience with cloud platforms such as AWS and GCP.

- Experience developing and operating large-scale distributed systems with Kubernetes, Docker and serverless (Lambdas).

- Experience running real-time, low-latency, highly available applications (Kafka, gRPC, RTP).

- Comfortable with Python, Go, or any relevant programming language.

- Experience with monitoring and alerting using technologies like New Relic / Zabbix / Prometheus / Grafana / CloudWatch / Kafka / PagerDuty etc.

- Experience with one or more orchestration and deployment tools, e.g. CloudFormation / Terraform / Ansible / Packer / Chef.

- Experience with configuration management systems such as Ansible / Chef / Puppet.

- Knowledge of load testing methodologies and tools like Gatling and Apache JMeter.

- Ability to work your way around a Unix shell.

- Experience running hybrid clouds and on-prem infrastructures on Red Hat Enterprise Linux / CentOS

- A focus on delivering high-quality code through strong testing practices.
