Site Reliability Engineer

2 days ago


India CitNOW Group Full time

About us

Founded in 2008, CitNOW is an innovative, enterprise-level software product suite that allows automotive dealerships globally to sell more vehicles and parts more profitably. CitNOW’s app-based platform provides a secure, brand-compliant solution for dealers to build trust, transparency and long-lasting relationships.

CitNOW Group was formed in 2021 to unite a portfolio of 12 global software companies leveraging innovation to aid retailers and manufacturers in delivering an outstanding customer experience. We have over 300 employees worldwide who all contribute to our vision to provide market-leading automotive solutions to drive efficiencies, seamlessly transforming every customer moment.

The CitNOW Group is no ordinary technology company. We live a series of One Team values, and this guiding principle forms the foundation of CitNOW Group’s award-winning, collaborative and inclusive culture. Recognised recently within the Top 25 Best Mid Sized Companies to work for within the UK, we pride ourselves on being a great place to work.

About the role

We are looking for a proactive and experienced Site Reliability Engineer (SRE) to join our Engineering team remotely in India. The ideal candidate will have deep expertise in cloud operations, automation, monitoring, and reliability engineering, with hands-on experience managing a wide range of SaaS and infrastructure tools. The role focuses on ensuring system uptime, performance, and scalability across our global platform.

Key responsibilities:

Reliability & Infrastructure Management
  • Design, implement, and manage scalable cloud infrastructure on Google Cloud (GCP) and AWS
  • Manage integrations and operations across third-party platforms including Mongo Atlas, Cloudflare, Stripe, Cledara, Datadog, Atlassian Statuspage, Semaphore, Postmark, SendGrid, Lokalise, Zendesk (Smooch & Smooch EU), Twilio, Mailgun, Facebook, Google Workspace, Asana, GitHub, Ngrok, npm, Readme, Loom, Deepgram, and OpenAI
  • Implement Infrastructure as Code (IaC) using tools like Terraform or Ansible to automate provisioning and scaling
  • Ensure systems adhere to security, compliance, and reliability best practices

Monitoring, Alerting & Incident Management
  • Build and maintain observability solutions using Datadog, GCP Logging, and related tools for monitoring system health, latency, and performance
  • Define and manage SLOs, SLIs, and SLAs to measure and maintain reliability
  • Implement proactive alerting, diagnostics, and runbooks for efficient incident response
  • Participate in on-call rotations and lead root cause analyses (RCA) for post-incident reviews

Automation & CI/CD
  • Design and optimize CI/CD pipelines using Semaphore CI/CD, GitHub Actions, or similar tools
  • Develop automation scripts and utilities in Python, Bash, or equivalent scripting languages to streamline operations and reduce manual interventions
  • Integrate and automate workflows between systems such as Asana, GitHub, and Google Workspace for operational efficiency

Security & Governance
  • Manage identity and access controls across cloud services and third-party SaaS platforms
  • Implement best practices for secrets management, data protection, and compliance with privacy standards

Collaboration & Continuous Improvement
  • Partner closely with developers to design resilient, high-performing services
  • Promote an SRE culture focused on continuous learning, blameless postmortems, and process improvement
  • Maintain up-to-date operational documentation, playbooks, and architectural diagrams

We are looking for:
  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • 4+ years of experience in Site Reliability Engineering, DevOps, or Cloud Operations
  • Strong experience with Google Cloud Platform (GCP), Amazon Web Services (AWS) and Mongo Atlas
  • Proven ability to manage and integrate multiple SaaS and developer tools (Datadog, Cloudflare, Atlassian Statuspage, Semaphore, SendGrid, etc.)
  • Hands-on experience with CI/CD pipelines, Terraform, GitHub Actions, and containerized environments (Docker, GCP Cloud Run, or Kubernetes)
  • Expertise in monitoring, incident response, and system optimization
  • Excellent troubleshooting, documentation, and communication skills
  • Strong collaboration mindset aligned with cross-functional development and operations teams

In addition to a competitive salary, our benefits package is second to none. Employee wellbeing is at the heart of our people strategy, with a number of innovative wellness initiatives such as flexi-time, where employees can vary their start and finish times within our core business hours and/or extend their lunch break by up to 2 hours per day. Employees also benefit from an additional two half days of paid leave per year to focus on their personal wellbeing.

We recognise the development of our people is vital to the ongoing success of the business and proudly promote a culture of continuous learning and improvement, along with opportunities to develop and progress a successful career with us.

The CitNOW Group is an equal opportunities employer that celebrates diversity across our international teams. We are passionate about creating an inclusive workplace where everyone’s individuality is valued.

View our candidate privacy policy here: CitNOW-Group-Candidate-Privacy-Policy.pdf (citnowgroup.com)



  • Bengaluru, India Relanto Full time

    Job Description Job Title: Site Reliability Engineer Summary We are looking for a Site Reliability Engineer to join our Digital & Transformation department. The ideal candidate will have 2-3 years of experience in this field and will be responsible for ensuring the reliability, availability, and performance of our systems and applications. Roles And...


  • , India, IN Sonata Software Full time

    We're Hiring: Senior Site Reliability Engineer Location: Onsite (Office: Hyderabad – Mandatory from Day 1) Employment Type: Full-time Notice Period: Immediate to 15 Days Only Experience: 8+ Years About the Role We’re looking for a Senior Site Reliability Engineer (SRE) to lead reliability initiatives across our production systems. This is a high-impact...


  • India Akamai Technologies Full time

    Job Description Do you like collaborating across teams to solve complex problems? Do you enjoy solving large scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We...


  • India Akamai Full time ₹ 8,00,000 - ₹ 24,00,000 per year

    Do you like collaborating across teams to solve complex problems? Do you enjoy solving large scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We specialize in creating solutions that...


  • India LivePerson Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    LivePerson (NASDAQ: LPSN) is a leading customer engagement company, creating digital experiences powered by Curiously Human AI. Every person is unique, and our technology makes it possible for companies, including leading brands like HSBC, Orange, and GM Financial, to treat their audiences that way at scale. Nearly a billion conversational interactions are...


  • India LivePerson Full time ₹ 9,00,000 - ₹ 12,00,000 per year

    LivePerson (NASDAQ: LPSN) is a leading customer engagement company, creating digital experiences powered by Curiously Human AI. Every person is unique, and our technology makes it possible for companies, including leading brands like HSBC, Orange, and GM Financial, to treat their audiences that way at scale. Nearly a billion conversational interactions are...


  • India CareerUS Solutions Full time

    Job Description Position Overview: The Site Reliability Engineer (SRE) is responsible for ensuring the stability, scalability, performance, and reliability of production systems and services. This role bridges software development and operations, using automation, monitoring, and performance optimization to build resilient systems that can scale efficiently...


  • Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time

    We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...


  • India CitNOW Group Full time

    About us Founded in 2008, CitNOW is an innovative, enterprise-level software product suite that allows automotive dealerships globally to sell more vehicles and parts more profitably. CitNOW’s app-based platform provides a secure, brand-compliant solution – for dealers to build trust, transparency and long-lasting relationships. CitNOW Group was formed...

