Senior Site Reliability Engineer

2 days ago


Gurugram, India Cvent Full time

Site Reliability is about combining development and operations knowledge and skills to help make the organization better. If you have an SRE or development background and experience improving the reliability of your services or products by adding observability to them, Cvent SRE can benefit from your skill set. Ultimately, we are looking for passionate people who love learning, love technology and always want to make things better.

As a Senior SRE on the SRE Observability team, you will be responsible for helping Cvent achieve its reliability goals. We are looking for someone with the drive, ownership and ability to take on challenging problems, both technical and process related, in a dynamic, collaborative and highly distributed, multi-disciplinary team environment. You will use your background as a generalist to work closely with product development teams, Cloud Infrastructure and other SRE teams to ensure effective observability and improve the reliability of our products, SDLC and infrastructure. You must be able to see the big picture and work collaboratively with teams to solve hard multi-disciplinary problems. Technical expertise in topics such as cloud operations, the software development lifecycle, and observability tools will be of great help to you. We use SRE principles such as blameless postmortems and a focus on automation to ensure we're constantly improving our knowledge and maintaining a good quality of life. Overall, we're passionate about continuous improvement, learning and participating in dynamic day-to-day work where success is rewarded with recognition and upward mobility.

What You Will Be Doing


•Enlighten, Enable and Empower a fast-growing set of multi-disciplinary teams, across multiple applications and locations.


•Tackle complex development, automation and business process problems. Champion Cvent standards and best practices.


•Ensure the scalability, performance, and resilience of Cvent products and processes.


•Work with product development teams, Cloud Automation and other SRE teams to ensure a holistic understanding of observability gaps and their effective and efficient identification and resolution.


•Identify recurring problems and anti-patterns in development, operational and security processes, and help the respective teams build observability for them.


•Develop build, test and deployment automation that seamlessly targets multiple on-premises environments and AWS regions.


•Give back by working on and contributing to Open-Source projects.

What You Need for this Position

Must have skills:


•Excellent communication skills and track record working in distributed teams


•A passion for and track record in making things better for your peers.


•Experience managing AWS services / operational knowledge of managing applications in AWS – ideally via automation.


•Fluency in at least one scripting language such as TypeScript, JavaScript, Python, Ruby or Bash.


•Experience with SDLC methodologies (preferably Agile).


•Experience with Observability (Logging, Metrics, Tracing) and SLI/SLO


•Experience working with APM, monitoring, and logging tools (Datadog, New Relic, Splunk)


•Good understanding of containerization concepts and tools: Docker, ECS, EKS, Kubernetes.


•Self-motivation and the ability to work under minimal supervision


•Troubleshooting and responding to incidents, and setting a standard for others to prevent such issues in the future.

Good to have skills:


•Experience with Infrastructure as Code (IaC) tools such as CloudFormation, CDK (preferred) and Terraform.


•Experience managing 3-tier application stacks.


•Understanding of basic networking concepts.


•Experience with server configuration through Chef, Puppet, Ansible or equivalent


•Working experience with NoSQL and relational databases such as MongoDB, Couchbase, Postgres, etc.


•Using APM data to troubleshoot and find performance bottlenecks



  • Gurugram, India Freecharge Full time

    Job Title: Site Reliability Engineer (SRE), 3 Years Experience. About the Role: We are looking for a Site Reliability Engineer (SRE) with 3 years of experience to join our team. You will be responsible for ensuring the reliability, scalability, and efficiency of our production systems. This role requires a balance of software engineering, system administration,...


  • Mumbai, Gurugram, Chennai, India Relx Group Full time

    Job Description. About the Role: This Senior Site Reliability Engineer (SRE) position offers the opportunity to work on impactful projects that enhance reliability and reduce manual work through automation. You'll leverage your experience across a range of SRE practices, helping to maintain resilient, distributed systems and automate processes to protect...


  • Gurugram, India Gemini Solutions Pvt Ltd Full time

    Position Summary In this role, you will play a crucial part in shaping the firm's infrastructure reliability and efficiency by implementing robust Site Reliability Engineering practices. Your contribution will be pivotal in ensuring the availability, scalability, and performance of our systems and applications. Leveraging your strong technical skills and...


  • Gurugram, India Leapwork Full time

    At Leapwork, our vision is to break down the barriers between humans and computers through the world's most accessible automation platform. We are the leading global AI-powered visual test automation solution, enabling some of the world's largest enterprises to adopt, scale, and maintain automation – in under 30 days. In today's environment, where...


  • Bengaluru, Gurugram, India Rackspace Technology Full time US$ 1,25,000 - US$ 1,75,000 per year

    Site Reliability Engineer / Observability Engineer III (Fixed Night Shift Role). As a Site Reliability Engineer, you will play a key role in ensuring our systems remain reliable, available, and performant for both our customers and internal teams. Your expertise will directly impact our users' experience and the success of our business. In this role, you'll...


  • Gurugram, Haryana, India Realign LLC Full time

    Job Type: Full Time. Job Category: IT. Job Title: Site Reliability Engineer. Job Summary: Responsibilities and Duties: - Implement and maintain automated monitoring and alerting systems to proactively identify and mitigate issues - Collaborate with development teams to design and implement scalable and reliable services - Troubleshoot and resolve...


  • Gurugram, India S&P Global Market Intelligence Full time

    About the Role: OSTTRA India. The Role: Site Reliability Engineer. The Team: SRE is a global team that provides technical support across the suite of OSTTRA products. The SRE team works closely with a highly competent Technical Operation Centre (TOC), Development and Infrastructure teams to deliver proactive tasks to improve the supportability of our...


  • Bengaluru, Gurugram, India Rackspace Technology Full time ₹ 1,04,000 - ₹ 1,30,878 per year

    Site Reliability Engineer / Observability Engineer. Public Cloud - Offerings and Delivery Workforce Mgmt & Delivery Ops / Full-Time / Remote. Rackspace is building up its Professional Services Center of Excellence on Application Performance Monitoring Suites. If you enjoy solving complex business problems and can contribute to building next generation of modern...


  • Bengaluru, Chennai, Gurugram, India Natwest Digitalx Full time ₹ 1,04,000 - ₹ 1,30,878 per year

    Join us as a Site Reliability Engineer. In this key role, you'll improve, drive, and embed non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services. You'll enjoy significant stakeholder interaction, working in...


  • Gurugram, India GSPANN Full time

    Hiring for SRE. Experience: 6+ Years. Notice Period: Immediate to 15 days. About the Role: We are seeking a skilled and passionate Observability Engineer (SRE) to join our team and drive reliability, performance, and visibility across our infrastructure and applications. You will play a key role in designing and implementing observability solutions, improving system...