Site Reliability Engineer

1 month ago


Mumbai, India antal international network Full time

Title : Site Reliability Engineer

My client is India's largest omnichannel platform and a multi-platform tech company with expertise in retail tech and products in the AI, ML, big data ops, gaming, crypto, image editing, and learning spaces.

Roles & Responsibilities :

What will you do?

- Run the production environment by monitoring availability and taking a holistic view of system health.

- Improve reliability, quality, and time-to-market of our suite of software solutions.

- Be the first person to report an incident.

- Debug production issues across services and levels of the stack.

- Envision the overall solution for defined functional and non-functional requirements, and define the technologies, patterns, and frameworks to realise it.

- Build automated tools in Python, Java, Go, Ruby, etc.

- Help Platform and Engineering teams gain visibility into our infrastructure.

- Lead design of software components and systems, to ensure availability, scalability, latency, and efficiency of our services.

- Participate actively in detecting, remediating and reporting on Production incidents, ensuring the SLAs are met and driving Problem Management for permanent remediation.

- Participate in on-call rotation to ensure coverage for planned/unplanned events.

- Perform other tasks such as load testing and generating system health reports.

- Periodically check the readiness of all dashboards.

- Engage with other Engineering organizations to implement processes, identify improvements, and drive consistent results.

- Work with your SRE and Engineering counterparts to drive game days, training, and other response-readiness efforts.

- Participate in the 24x7 support coverage as needed.

- Troubleshoot and solve complex issues with thorough root cause analysis on customer and SRE production environments.

- Collaborate with Service Engineering organizations to build and automate tooling, implement best practices to observe and manage the services in production, and consistently achieve our market-leading SLA.

- Improve the scalability and reliability of our systems in production.

- Evaluate, design, and implement new system architectures.

Some specific Requirements :

- B.E./B.Tech. in Engineering, Computer Science, technical degree, or equivalent work experience.

- At least 3 years of experience managing production infrastructure. Leading or managing a team is a huge plus.

- Experience with cloud platforms such as AWS and GCP.

- Experience developing and operating large-scale distributed systems with Kubernetes, Docker, and Serverless (Lambdas).

- Experience running real-time, low-latency, highly available applications (Kafka, gRPC, RTP).

- Comfortable with Python, Go, or any relevant programming language.

- Experience with monitoring and alerting using technologies like New Relic / Zabbix / Prometheus / Grafana / CloudWatch / Kafka / PagerDuty, etc.

- Experience with one or more orchestration and deployment tools, e.g. CloudFormation / Terraform / Ansible / Packer / Chef.


- Experience with configuration management systems such as Ansible / Chef / Puppet.

- Knowledge of load testing methodologies and tools like Gatling and Apache JMeter.

- Ability to work your way around the Unix shell.

- Experience running hybrid clouds and on-prem infrastructures on Red Hat Enterprise Linux / CentOS.

- A focus on delivering high-quality code through strong testing practices.

(ref:hirist.tech)

  • Mumbai, India dentsu Full time

    The purpose of this role is to ensure the availability and stability of production and test platforms. Job Title: Site Reliability Engineer Job Description: Key responsibilities: Troubleshoots and owns issues in our development, test and production environments. Including performance optimisation and continuous tuning. Works alongside the DevOps team in...


  • Mumbai, India Talent Socio Full time

    Job Description: - Lead and mentor a team of Site Reliability Engineers (SREs) responsible for ensuring the reliability, availability, and performance of critical systems. - Establish and enforce engineering practices focused on automation, monitoring, and process improvement to enhance system reliability and operational efficiency. - Conduct thorough and...


  • Mumbai, India IMC Full time

      As a Site Reliability Engineer at IMC, you'll be an integral member of a highly experienced team, responsible for maintaining a robust, best in class, low latency trading environment. The skills necessary to excel could range from system administration, network troubleshooting, database optimization, software development, release management and...


  • Mumbai, India CimpressVista Full time

    Senior Site Reliability Engineer You have successfully completed a degree in computer science or comparable training (e.g. as an IT specialist) or have gained several years of relevant professional experience in the DevOps environment. Experience working with: Agile methods and cloud technologies/architecture in AWS. Database administration to a small extent...


  • mumbai, India RELX India (Pvt) Ltd Risk div Company Full time

    About the role We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to manage and optimize our AWS cloud resources. The ideal candidate will have a strong background in AWS, Terraform, Kubernetes, and scripting, with proficiency in monitoring and CI/CD tools. Experience with Hashicorp Vault is a plus. Responsibilities: ...


  • Mumbai, India Jio Full time

    Site Reliability Engineer (SRE) with Automation Job Overview As a Site Reliability (SRE)/DevOps Automation Engineer, you will be responsible for the availability, automation, performance, efficiency, Scaling, monitoring and emergency response for any incidents/issues in Applications. You will use your deep understanding of platforms, architecture, people,...


  • mumbai, India Jio Full time

    Site Reliability Engineer (SRE) with Automation Job Overview As a Site Reliability (SRE)/DevOps Automation Engineer, you will be responsible for the availability, automation, performance, efficiency, Scaling, monitoring and emergency response for any incidents/issues in Applications. You will use your deep understanding of platforms, architecture,...


  • Mumbai, India Ztek Consulting INC Full time

    Job Title: Senior Site Reliability Engineer (SRE) Duration: 612 months Location: Hybrid Fort Worth TX Work Type: Rate: Pay range offered to a successful candidate will be based on several factors including the candidate's education, work experience, work location, specific job duties, certifications, etc. Job Summary: A Site Reliability Engineer is responsible...


  • mumbai, India Antal International Full time

    Job Description A major player in the tech industry, which specializes in retail technology, AI, ML, and big data, is seeking new talent. Established by alumni from a top engineering institute, this organization manages a vast network of brands and stores. Headquartered in Mumbai, it is recognized for its innovation and expertise across multiple tech...


  • Mumbai, India Cyber Sphere LLC Full time

    Site Reliability Engineer (SRE) to join our team. Qualifications: - 4+ years of Software Engineering experience - BS Engineering/Computer Science or equivalent experience required Responsibilities: - Design, deploy, and maintain a highly available and scalable data infrastructure on Azure OpenAI, databases and event-driven services - Monitor and optimize the...


  • Mumbai, India IDFC FIRST Bank Full time

    Role/ Job Title:  Senior Site Reliability Engineering Manager Function/ Department:  Information Technology Job Purpose: Site Reliability Engineering (SRE) department plays a pivotal role in providing seamless experience for our customers. With state-of-the-art technology and tools, we are transforming the overall application development and...


  • Mumbai, India Cyber Sphere LLC Full time

    SALARY : 40LPA - 60LPA We are seeking a talented and experienced Site Reliability Engineer (SRE) to join our team. As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our Azure AI Services platform. You will work closely with cross-functional teams to design, implement, and maintain robust infrastructure and...


  • Mumbai, India Session AI Full time

    Are you ready to make your mark with a true industry disruptor? ZineOne, a subsidiary of Session AI, the pioneer of in-session marketing, is looking to add talented team members to help us grow into the premier revenue tool for e-commerce. We work with some of the leading brands nationwide and we innovate how brands connect with and convert customers. Job...


  • mumbai, India Session AI Full time

    Are you ready to make your mark with a true industry disruptor? ZineOne, a subsidiary of Session AI, the pioneer of in-session marketing, is looking to add talented team members to help us grow into the premier revenue tool for e-commerce. We work with some of the leading brands nationwide and we innovate how brands connect with and convert customers. Job...


  • mumbai, India RELX India (Pvt) Ltd Risk div Company Full time

    Job Description for Senior Site Reliability Engineer (SRE) Position Overview: We are seeking a dynamic Site Reliability Engineer (SRE) with 7-9 years of experience in system administration who has a deep proficiency in automation. The ideal candidate will be instrumental in monitoring and incident response and will possess comprehensive knowledge...


  • Navi Mumbai, India Capabiliq IT Services (OPC) Private Limited Full time

    Responsibilities: - Define processes for the DevOps program and align to best practice standards - Support of Product delivery teams integrating into existing pipelines and platforms. - Plan for and manage operational resilience for network and application while minimizing the effect on the business - Develop and extend DevOps tooling and automation efforts...

  • Site Engineer

    21 hours ago


    mumbai, India Zodiac HR Full time

    Dear Candidate, Greetings !!! Position - Senior / Site Engineer Location - Borivali Qualification - BE / BTech Experience - 12+ Years Job Summary: We are seeking an experienced Site Engineer to manage and oversee construction projects, ensuring that all operations are conducted efficiently and to the highest quality standards. The...


  • Mumbai, India Awign Expert Full time

    About Awign Expert: Awign Expert, a division of Awign - India's largest work-as-a-service platform. We connect skilled professionals with exciting project-based opportunities from top companies, handling onboarding, feedback, conflict resolution, and payroll. Our mission is to empower professionals to focus on their work by managing administrative tasks,...


  • Mumbai, India Awign Expert Full time

    Job Description About Awign Expert: Awign Expert, a division of Awign - India's largest work-as-a-service platform. We connect skilled professionals with exciting project-based opportunities from top companies, handling onboarding, feedback, conflict resolution, and payroll. Our mission is to empower professionals to focus on their work by managing...