Site Reliability Engineering Manager

4 weeks ago


Gurugram, India Intellect Design Arena Ltd Full time

Position: Site Reliability Lead - Release Track

Experience: 8+ Years

Location: Gurgaon

Notice Period: Early joiners will be preferred

ABOUT INTELLECT DESIGN ARENA LTD

Intellect Design Arena Ltd. hosts the world’s largest cloud-native, API-led, microservices-based multi-product platform for global leaders in Banking, Insurance, and Capital Markets. With over three decades of deep domain expertise, Intellect is the brand that progressive financial institutions rely on for digital transformation initiatives. It offers enterprise-grade, composable, and contextual financial technology products and platforms on the cloud through its three lines of business: Intellect Global Transaction Banking – iGTB, Intellect Global Consumer Banking – iGCB, and IntellectAI.

OUR ACHIEVEMENTS:

Intellect is the chosen partner for the top 3 large banks in 6 countries: India, Canada, UK, France, UAE, and Saudi Arabia. It serves over 270 customers across 57 countries, with a diverse workforce of solution architects and domain and technology experts in major global financial hubs.

Intellect pioneered Design Thinking to create cutting-edge products and solutions for banking and insurance, with design as the key differentiator in enabling digital transformation. FT 8012, the world’s first design center for financial technology, recently celebrated its 10th anniversary, reflecting Intellect’s commitment to continuous and impactful innovation.


WHAT YOU WILL DO:

YOUR AREA OF KNOWLEDGE AND EXPERTISE:

Job Summary:

As the Site Reliability Lead in the release track, you will be responsible for ensuring the stability, reliability, and scalability of our software releases. You will lead a team of engineers focused on implementing and maintaining release processes, monitoring systems, and resolving production incidents efficiently. Your role will involve collaborating with development, operations, and quality assurance teams to improve release pipelines, automate deployment processes, and enhance system performance. The ideal candidate will have a strong background in release management, excellent leadership skills, and a proven track record of delivering high-quality releases on time.

Key Responsibilities:

  • Lead a team of Site Reliability Engineers (SREs) focused on ensuring the reliability and scalability of software releases.
  • Develop and maintain release processes, including version control, build automation, and deployment strategies.
  • Implement and manage monitoring and alerting systems to proactively identify and address production issues.
  • Collaborate with development teams to streamline release pipelines and integrate automated testing into the deployment process.
  • Work closely with operations teams to optimize infrastructure and resource utilization for deployment environments.
  • Define and enforce service level objectives (SLOs) and service level agreements (SLAs) to maintain system reliability and availability.
  • Conduct post-incident reviews and root cause analysis to identify areas for improvement and prevent future incidents.
  • Implement best practices for reliability engineering and release management.
  • Stay current with industry trends and emerging technologies in site reliability engineering and release management.
  • Coordinate with cross-functional teams, including development, QA, operations, and project management, to ensure smooth and efficient release cycles.
  • Plan, schedule, and perform release activities, including release planning, deployment, and post-release support.
  • Identify and implement automation and efficiency improvements in release processes through tools and technology.
  • Conduct risk assessments for each release to identify potential security vulnerabilities and mitigate them prior to deployment.
  • Conduct security testing activities, including vulnerability scanning, penetration testing, and code reviews, to identify and address security issues early in the release process.
  • Implement and enforce secure configuration management practices for all release artifacts, including code, configurations, and dependencies.
  • Implement access controls for release environments, ensuring that only authorized individuals have access to sensitive systems and data.
  • Continuously monitor and evaluate the security posture of release processes and environments, and identify improvements that reduce risk.

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or related field; Master's degree preferred.
  • 8+ years of experience in site reliability engineering, release management, or a related field.
  • 3+ years of experience in a lead role.
  • Proven experience in SRE, implementing DevOps (CI/CD) processes, and leading teams or managing complex projects in a fast-paced environment.
  • Strong knowledge of software development lifecycle (SDLC) and release management processes.
  • Hands-on experience with automation tools such as Jenkins, GitLab CI/CD, or similar, and with Linux/Unix systems.
  • Proficient in scripting and programming languages such as Python, Bash, or Go.
  • Deep understanding of and experience with AWS services: EC2, Auto Scaling, ELB, Amazon S3, AWS Lambda, Amazon CloudWatch, AWS CloudFormation, Amazon ECS/EKS, AWS Identity and Access Management (IAM), AWS CloudTrail, AWS Networking, AWS Well-Architected Framework, and AWS Cost Management.
  • Excellent problem-solving skills and ability to analyze complex systems to identify areas for improvement.
  • Strong communication and collaboration skills, with the ability to work effectively across teams and departments.
  • Relevant certifications such as AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer, or similar are a plus.

Preferred Skills:

  • Experience with infrastructure as code (IaC) tools such as Terraform, CloudFormation, or Ansible.
  • Knowledge of continuous integration and continuous delivery (CI/CD) principles and best practices.
  • Familiarity with incident management tools such as PagerDuty, VictorOps, or similar.
  • Experience with performance tuning and optimization of distributed systems.

WHAT INTELLECT OFFERS YOU:

At FT 8012, the world's first FinTech Design Centre for financial institutions, we have a rich and truly diverse work environment, bustling with creative energy and individual perspectives from 29 nationalities and 30 languages.

LIVE YOUR DREAM - Intellect is India's most profitable unicorn. A pioneer in Design Thinking, it has helped shape the future of FinTech with passion and with cutting-edge products, platforms, and exponential technologies.

Imagination

  • Explore new possibilities at the epicentre of Design Thinking and cutting-edge technology.
  • Unleash your true potential with mentor-led growth and development.

Learning

  • Regular training sessions for personal development.
  • Full support for career and skills development, so you can build your expertise and realise your career aspirations.

Execution Excellence

  • Work with the world’s strongest FinTech leaders, who have designed and created complex, world-class products.
  • Be part of our dynamic team creating world-class products for global marquee clients.
  • A clear team vision with future-ready FinTech platforms.

Collaboration

  • A diverse and inclusive community of belonging, where teammates are empowered to bring ideas to the table and act.

Influencing

  • We set the agenda in the market by delivering composable, contextual, and hyper-scalable FinTech solutions.

Good Luck

Anand Soni

Intellect Design Arena


