Site Reliability Engineer
1 month ago
Title: Site Reliability Engineer
My client is India's largest omnichannel platform and a multi-platform tech company with expertise in retail tech and products in AI, ML, big data ops, gaming, crypto, image editing, and the learning space.
Roles & Responsibilities:
What will you do?
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Improve the reliability, quality, and time-to-market of our suite of software solutions.
- Be the first person to report an incident.
- Debug production issues across services and levels of the stack.
- Envision the overall solution for defined functional and non-functional requirements, and define the technologies, patterns, and frameworks to realise it.
- Build automated tools in Python, Java, Go, Ruby, etc.
- Help Platform and Engineering teams gain visibility into our infrastructure.
- Lead design of software components and systems, to ensure availability, scalability, latency, and efficiency of our services.
- Participate actively in detecting, remediating and reporting on Production incidents, ensuring the SLAs are met and driving Problem Management for permanent remediation.
- Participate in on-call rotation to ensure coverage for planned/unplanned events.
- Perform other tasks such as load testing and generating system health reports.
- Periodically check the readiness of all dashboards.
- Engage with other Engineering organizations to implement processes, identify improvements, and drive consistent results.
- Work with your SRE and Engineering counterparts to drive game days, training, and other response-readiness efforts.
- Participate in 24x7 support coverage as needed.
- Troubleshoot and solve complex issues with thorough root cause analysis on customer and SRE production environments.
- Collaborate with Service Engineering organizations to build and automate tooling, implement best practices to observe and manage the services in production, and consistently achieve our market-leading SLA.
- Improve the scalability and reliability of our systems in production.
- Evaluate, design, and implement new system architectures.
Some specific Requirements :
- B.E./B.Tech. in Engineering, Computer Science, technical degree, or equivalent work experience
- At least 3 years of experience managing production infrastructure. Leading or managing a team is a huge plus.
- Experience with cloud platforms such as AWS and GCP.
- Experience developing and operating large-scale distributed systems with Kubernetes, Docker, and Serverless (Lambdas).
- Experience running real-time, low-latency, highly available applications (Kafka, gRPC, RTP).
- Comfortable with Python, Go, or any relevant programming language.
- Experience with monitoring and alerting using technologies like New Relic / Zabbix / Prometheus / Grafana / CloudWatch / Kafka / PagerDuty etc.
- Experience with one or more orchestration, deployment tools, e.g. CloudFormation / Terraform / Ansible / Packer / Chef.
- Experience with configuration management systems such as Ansible / Chef / Puppet.
- Knowledge of load testing methodologies and tools like Gatling and Apache JMeter.
- Ability to work your way around a Unix shell.
- Experience running hybrid clouds and on-prem infrastructure on Red Hat Enterprise Linux / CentOS.
- A focus on delivering high-quality code through strong testing practices.
-
Site Reliability Engineer
4 months ago
Mumbai, India dentsu Full time
The purpose of this role is to ensure the availability and stability of production and test platforms. Job Title: Site Reliability Engineer Job Description: Key responsibilities: Troubleshoots and owns issues in our development, test and production environments, including performance optimisation and continuous tuning. Works alongside the DevOps team in...
-
Site Reliability Engineering Manager
2 months ago
Mumbai, India Talent Socio Full time
Job Description: - Lead and mentor a team of Site Reliability Engineers (SREs) responsible for ensuring the reliability, availability, and performance of critical systems. - Establish and enforce engineering practices focused on automation, monitoring, and process improvement to enhance system reliability and operational efficiency. - Conduct thorough and...
-
Site Reliability Engineer
4 months ago
Mumbai, India IMC Full time
As a Site Reliability Engineer at IMC, you'll be an integral member of a highly experienced team, responsible for maintaining a robust, best-in-class, low-latency trading environment. The skills necessary to excel could range from system administration, network troubleshooting, database optimization, software development, release management and...
-
Senior Site Reliability Engineer
2 months ago
Mumbai, India CimpressVista Full time
Senior Site Reliability Engineer. You have successfully completed a degree in computer science or comparable training (e.g. as an IT specialist) or have gained several years of relevant professional experience in the DevOps environment. Experience working with: Agile methods and cloud technologies/architecture in AWS. Database administration to a small extent...
-
Senior Site Reliability Engineer I
3 days ago
Mumbai, India RELX India (Pvt) Ltd Risk div Company Full time
About the role: We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to manage and optimize our AWS cloud resources. The ideal candidate will have a strong background in AWS, Terraform, Kubernetes, and scripting, with proficiency in monitoring and CI/CD tools. Experience with HashiCorp Vault is a plus. Responsibilities: ...
-
Site Reliability Engineer
2 days ago
Mumbai, India Jio Full time
Site Reliability Engineer (SRE) with Automation. Job Overview: As a Site Reliability (SRE)/DevOps Automation Engineer, you will be responsible for the availability, automation, performance, efficiency, scaling, monitoring and emergency response for any incidents/issues in applications. You will use your deep understanding of platforms, architecture, people,...
-
Site Reliability Engineer
2 days ago
Mumbai, India Jio Full time
Site Reliability Engineer (SRE) with Automation. Job Overview: As a Site Reliability (SRE)/DevOps Automation Engineer, you will be responsible for the availability, automation, performance, efficiency, scaling, monitoring and emergency response for any incidents/issues in applications. You will use your deep understanding of platforms, architecture,...
-
Senior Site Reliability Engineer SRE
2 months ago
Mumbai, India Ztek Consulting INC Full time
Job Title: Senior Site Reliability Engineer (SRE) Duration: 6-12 months Location: Hybrid, Fort Worth TX Work Type: Rate: Pay range offered to a successful candidate will be based on several factors including the candidate's education, work experience, work location, specific job duties, certifications, etc. Job Summary: A Site Reliability Engineer is responsible...
-
Site Reliability Engineer
3 days ago
Mumbai, India Antal International Full time
Job Description: A major player in the tech industry, which specializes in retail technology, AI, ML, and big data, is seeking new talent. Established by alumni from a top engineering institute, this organization manages a vast network of brands and stores. Headquartered in Mumbai, it is recognized for its innovation and expertise across multiple tech...
-
Site Reliability Engineer
1 week ago
Mumbai, India Cyber Sphere LLC Full time
Site Reliability Engineer (SRE) to join our team. Qualifications: - 4+ years of Software Engineering experience - BS Engineering/Computer Science or equivalent experience required. Responsibilities: - Design, deploy, and maintain a highly available and scalable data infrastructure on Azure OpenAI, databases and event-driven services - Monitor and optimize the...
-
Senior Site Reliability Engineering Manager
2 months ago
Mumbai, India IDFC FIRST Bank Full time
Role/Job Title: Senior Site Reliability Engineering Manager Function/Department: Information Technology Job Purpose: The Site Reliability Engineering (SRE) department plays a pivotal role in providing a seamless experience for our customers. With state-of-the-art technology and tools, we are transforming the overall application development and...
-
Senior Site Reliability Engineering Manager
3 days ago
Mumbai, India IDFC FIRST Bank Full time
Role/Job Title: Senior Site Reliability Engineering Manager Function/Department: Information Technology Job Purpose: The Site Reliability Engineering (SRE) department plays a pivotal role in providing a seamless experience for our customers. With state-of-the-art technology and tools, we are transforming the overall application development and...
-
Site Reliability Engineer
1 month ago
Mumbai, India Cyber Sphere LLC Full time
SALARY: 40 LPA - 60 LPA
We are seeking a talented and experienced Site Reliability Engineer (SRE) to join our team. As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our Azure AI Services platform. You will work closely with cross-functional teams to design, implement, and maintain robust infrastructure and...
-
Site Reliability Engineer II
4 months ago
Mumbai, India Session AI Full time
Are you ready to make your mark with a true industry disruptor? ZineOne, a subsidiary of Session AI, the pioneer of in-session marketing, is looking to add talented team members to help us grow into the premier revenue tool for e-commerce. We work with some of the leading brands nationwide and we innovate how brands connect with and convert customers. Job...
-
Site Reliability Engineer II
4 months ago
Mumbai, India Session AI Full time
Are you ready to make your mark with a true industry disruptor? ZineOne, a subsidiary of Session AI, the pioneer of in-session marketing, is looking to add talented team members to help us grow into the premier revenue tool for e-commerce. We work with some of the leading brands nationwide and we innovate how brands connect with and convert customers. Job...
-
Senior Site Reliability Engineer I
3 days ago
Mumbai, India RELX India (Pvt) Ltd Risk div Company Full time
Job Description for Senior Site Reliability Engineer (SRE). Position Overview: We are seeking a dynamic Site Reliability Engineer (SRE) with 7-9 years of experience in system administration who has a deep proficiency in automation. The ideal candidate will be instrumental in monitoring and incident response and will possess comprehensive knowledge...
-
Senior DevOps Engineer
3 days ago
Navi Mumbai, India Capabiliq IT Services (OPC) Private Limited Full time
Responsibilities: - Define processes for the DevOps program and align to best practice standards - Support Product delivery teams integrating into existing pipelines and platforms. - Plan for and manage operational resilience for network and application while minimizing the effect on the business - Develop and extend DevOps tooling and automation efforts...
-
Site Engineer
21 hours ago
Mumbai, India Zodiac HR Full time
Dear Candidate, Greetings! Position - Senior / Site Engineer Location - Borivali Qualification - BE / BTech Experience - 12+ Years Job Summary: We are seeking an experienced Site Engineer to manage and oversee construction projects, ensuring that all operations are conducted efficiently and to the highest quality standards. The...
-
Site Reliability Engineer
4 weeks ago
Mumbai, India Awign Expert Full time
About Awign Expert: Awign Expert is a division of Awign, India's largest work-as-a-service platform. We connect skilled professionals with exciting project-based opportunities from top companies, handling onboarding, feedback, conflict resolution, and payroll. Our mission is to empower professionals to focus on their work by managing administrative tasks,...
-
Site Reliability Engineer
4 weeks ago
Mumbai, India Awign Expert Full time
Job Description. About Awign Expert: Awign Expert is a division of Awign, India's largest work-as-a-service platform. We connect skilled professionals with exciting project-based opportunities from top companies, handling onboarding, feedback, conflict resolution, and payroll. Our mission is to empower professionals to focus on their work by managing...