Senior Site Reliability Engineer

3 weeks ago


Gurgaon, Haryana, India Majid Al Futtaim Full time
Site Reliability Engineer (SRE III)

About us:
Majid Al Futtaim is an Emirati-owned, diversified lifestyle conglomerate operating across the Middle East, Africa and Asia. The Group started from one man's vision to transform the face of shopping, entertainment, and leisure to 'Create Great Moments For Everyone, Everyday'.

Founded in 1992, we're pioneers in shopping malls, communities, retail, and leisure across 15 international markets. We operate 25 shopping malls, 13 hotels, and 4 mixed-use communities, including icons like Mall of the Emirates and City Centre Malls.

Carrefour? Yep, that's us. We brought Carrefour to the region in 1995 and now run 375+ Carrefour stores across 17 countries, serving 750,000+ customers daily.

But that's just the beginning. We're leading the charge in digital innovation, with a strong focus on e-commerce and personalized customer experiences. Here are some of our cool projects: Scan & Go, Carrefour NOW, and even Tally the Robot, the first of its kind in the Middle East. We're also driving sustainability and a customer-first culture with cutting-edge digital solutions.

Why should you join us?
We're a family of 250+ in India, and we're growing fast. With us, you'll experience:
• Infinite tech exposure & mentorship
• Live case problem-solving with real impact
• Hackdays and continuous learning through tech talks
• Fun, collaborative work environment that's more sincere than serious

Key Responsibilities:
• Cloud Infrastructure Management: Manage, deploy, and monitor highly scalable and resilient infrastructure using Microsoft Azure.
• Containerization & Orchestration: Design, implement, and maintain Docker containers and Kubernetes clusters for microservices and large-scale applications.
• Automation: Automate infrastructure provisioning, scaling, and management using Terraform and other Infrastructure-as-Code (IaC) tools.
• CI/CD Pipeline Management: Build and maintain CI/CD pipelines for continuous integration and delivery of applications and services, ensuring high reliability and performance.
• Monitoring & Incident Management: Implement and manage monitoring, logging, and alerting systems to ensure system health, identify issues proactively, and lead incident response for operational challenges.
• Kafka & Apigee Management: Manage and scale Apache Kafka clusters for real-time data streaming and Apigee for API management.
• Scripting & Automation: Use scripting languages (e.g., Python, Bash, Go) to automate repetitive tasks, enhance workflows, and optimize infrastructure management.
• Collaboration: Work closely with development teams to improve application architecture for high availability, low latency, and scalability.
• Capacity Planning & Scaling: Conduct performance tuning and capacity planning for cloud and on-premises infrastructure.
• Security & Compliance: Ensure security best practices and compliance requirements are met in the design and implementation of infrastructure and services.

Required Skills & Qualifications:
• Experience: 7+ years of experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, or in a similar role in cloud environments.
• Cloud Expertise: Strong hands-on experience with Microsoft Azure services, including compute, storage, networking, and security services.
• Containerization & Orchestration: Proficiency in managing and deploying Docker containers and orchestrating them with Kubernetes.
• Infrastructure as Code (IaC): Deep knowledge of Terraform for provisioning and managing infrastructure.
• CI/CD: Experience building, maintaining, and optimizing CI/CD pipelines using tools like Jenkins, GitLab CI, Azure DevOps, or others.
• Message Brokers: Hands-on experience with Kafka for distributed streaming and with messaging services like ServiceBus/EventHub. Exposure to Kafka is good to have.
• API Management: Familiarity with Apigee or similar API management tools. Exposure to Apigee is good to have.
• Scripting & Automation: Expertise in scripting with languages such as Python, Bash, Go, or similar.
• Monitoring & Logging: Experience with monitoring tools like New Relic, Prometheus, Grafana, and Azure Monitor, and logging solutions such as the ELK stack (Elasticsearch, Logstash, Kibana).
• Version Control: Strong experience using Git, Bitbucket, and GitHub for source control.
• Problem-Solving: Excellent troubleshooting skills and the ability to resolve complex infrastructure and application issues.
• Collaboration & Communication: Ability to work in a collaborative, cross-functional environment and communicate complex technical issues effectively to both technical and non-technical teams.

Preferred Skills:
• Experience with additional cloud providers such as AWS or Google Cloud.
• Familiarity with other message brokers such as RabbitMQ or ActiveMQ.
• Experience with Apigee Edge for managing APIs and microservices.
• Knowledge of networking concepts and technologies, such as load balancing, DNS, and VPNs.

  • Gurgaon, Haryana, India beBee Careers Full time

    A key member of the team, this Senior Site Reliability Engineer will focus on incident management and troubleshooting, developing and improving monitoring, alerting, and diagnostic tools, and conducting blameless postmortems. The SRE Specialist will also be responsible for automation and infrastructure as code, managing infrastructure using tools like...


  • Gurgaon, Haryana, India Cloudologic Full time

    Company Description : Cloudologic is a prominent cloud consulting and IT service provider based in Singapore and rooted in India, focusing on cloud operations, cyber security, and managed services. With a decade of expertise, our dedication to delivering high-quality services has earned the trust of clients worldwide, making us a valued partner in the tech...


  • Gurgaon, Haryana, India Crescendo Full time

    About Crescendo Global: Crescendo Global is a niche recruitment agency specializing in senior to C-level placements. We pride ourselves on delivering a memorable job search and leadership hiring experience for both job seekers and employers. Job Summary: Senior Technical Lead. We are seeking a highly skilled Senior Technical Lead to join our team. This...


  • Gurgaon, Haryana, India UnitedHealth Group Full time

    Optum is a global organization that delivers care, aided by technology to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by diversity and inclusion,...


  • Gurgaon, Haryana, India myGwork Full time

    This job is with Synechron, an inclusive employer and a member of myGwork – the largest global platform for the LGBTQ+ business community. Please do not contact the recruiter directly. Overall Summary: We are seeking a skilled and experienced SRE Engineer to join our team. The ideal candidate will...


  • Gurgaon, Haryana, India myGwork Full time

    About the Role: We are seeking an experienced IT professional to join our Automotive Insights team as a Principal Site Reliability Engineer. The role will be responsible for setting Operational and Site Reliability Engineering (SRE) standards that our support teams can leverage. The Impact: By joining our team, you will have the opportunity to work closely...


  • Gurgaon, Haryana, India Karix Full time

    Role: Site Reliability Engineer (L2 Support). Location: Gurgaon (WFO). About the role: We are seeking an experienced Site Reliability Engineer who acts as a bridge between development and IT operations, taking on operational tasks to ensure the efficient functioning of service platforms. They are responsible for monitoring, automating, and improving the...


  • Gurgaon, Haryana, India Cvent Full time

    Job Description: As a Site Reliability Engineer, you'll use your advanced development and operations knowledge to identify and prioritize issues, find universal solutions to common problems, and mentor and support junior staff. Additionally, you will: Enlighten, Enable and Empower a fast-growing set of multi-disciplinary teams, across multiple...