Senior System Reliability Engineer

7 days ago


India ThoughtSpot Full time

Job Description

Site Reliability Engineer (Technical Support)

About Us:
ThoughtSpot is an AI-powered analytics platform that enables users to explore and analyze data through natural language queries, making insights accessible to all. Our mission is to deliver reliable, high-performing applications that empower our customers. We are seeking a Site Reliability Engineer who excels at technical support for our end users, incident management and resolution, and cloud operations within a customer-centric environment.

Role Overview:
We are seeking a Site Reliability Engineer (SRE) with a strong focus on customer-facing technical support. In this role, you will be the primary point of contact for our enterprise SaaS customers, addressing and resolving technical issues to ensure optimal system performance and user satisfaction. Your responsibilities will encompass managing incoming support tickets, providing timely solutions, and maintaining high system uptime and application availability. This position requires a deep understanding of systems engineering principles, extensive Linux system administration expertise, and the ability to monitor and manage large-scale cloud clusters. Your technical acumen, combined with excellent communication skills, will be crucial in delivering a superior support experience and contributing to the reliability and efficiency of our SaaS platform.

Key Responsibilities:

Technical & Product Support:
- Serve as the first line of support for customer-reported technical issues related to our SaaS platforms. These include data connectivity issues, report errors, performance concerns, access problems, data inconsistencies, software bugs, and integration challenges.
- Understand and empathize with the challenges ThoughtSpot users face, offering tailored solutions to improve their user experience.
- Ensure prompt and accurate updates, meet SLAs, and provide timely resolution of customer issues via tickets and calls.
- Create knowledge-base articles to document knowledge and help customers self-serve.

System Reliability & Monitoring:
- Maintain, monitor, and troubleshoot ThoughtSpot cloud infrastructure.
- Monitor system health and performance through metrics, logs, and dashboards, using tools like Prometheus and Grafana, to detect and prevent issues early.
- Work with Engineering teams to define and implement tools that enhance debuggability, supportability, availability, scalability, and performance.
- Be an expert in cloud and on-premise infrastructure by developing automation and best practices.
- Participate in the on-call rotation for critical SRE systems, and lead incident reviews and root cause analysis.

Required Skills & Experience:
- B.S. degree in Computer Science or relevant industry experience.
- Exceptional communication skills, both written and verbal, to effectively engage with cross-functional teams, customers, and stakeholders.
- Relevant work experience troubleshooting complex Linux systems and managing distributed systems.
- Experience in virtualization and cloud technologies.
- Experience in enterprise customer support, on-call rotation for critical SRE systems, and leading incident reviews and root cause analysis.
- Ability to diagnose technical problems and work with Engineering on escalated issues.
- Strong problem-solving skills, algorithmic thinking, and a strong foundation in how systems should work.
- Understanding of the tools and frameworks required to operate and manage cloud infrastructure.
- Strong customer service skills.
- Solid communication skills and ability to work independently.
- Ability to leverage automation, monitoring, and data analysis to ensure high availability.
- Familiarity with scripting languages such as Python, JavaScript, or Bash.
- Exposure to infrastructure and service monitoring tools.

Ideal Candidate Profile:
- You thrive in dynamic, customer-facing environments and are passionate about ensuring system reliability and customer satisfaction.
- You have a balanced mix of technical expertise in cloud operations and a proven record in handling support incidents and end-user queries, setting you apart from candidates with a purely systems or cloud engineering background.

What We Offer:
- Competitive salary and benefits package.
- Opportunities for professional growth and career advancement.
- A collaborative work environment where your input and expertise directly impact our customer experience.

If you're ready to leverage your technical skills in a role that directly influences customer success and BI user satisfaction, we'd love to hear from you.

This is a remote role in which you will be expected to work EMEA/EST hours. Base location: Bangalore (HSR).


  • Senior Test Engineer

    2 weeks ago


    NA, Mumbai, Maharashtra, India , India Reliability Engineering Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Performance Analysis: Identify performance-related requirements and objectives for software projects. Analyze and model system behavior under different conditions to predict performance issues. Performance Testing: Develop and execute comprehensive performance test plans and strategies. Perform various types of performance testing, including load testing, stress...

  • Senior Test Engineer

    2 weeks ago


    Mumbai, Maharashtra, NA, India, India Reliability Engineering Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Configure and maintain Datadog dashboards, alerts, monitors, SLOs & SLIs. Integrate Datadog with cloud environments (AWS / Azure / GCP), Kubernetes, and on-prem applications. Implement APM traces, RUM, Infrastructure Monitoring, and Log Management. Develop and standardize observability best practices across teams. Troubleshoot performance issues using...

  • Senior Sales Engineer

    3 weeks ago


    India Emergence System Full time

    Job Description Company Description Emergence System is a leading automation solutions provider and system integrator based in Aurangabad, Maharashtra. We specialize in delivering cutting-edge automation solutions across various industries. As authorized channel partners of renowned brands like Mitsubishi Electric, Cognex, Lapp India, and Masibus...


  • India Futran Solutions Full time

    We Are Hiring: Sr. System Reliability Engineer (Application Support + Automation) Location: Pune ⏳ Notice Period: Immediate to 15 Days Are you passionate about building resilient, scalable, and high-performance production environments? This is your opportunity to join a global tech team and make a real impact. ⭐ Designation: Sr. System Reliability...


  • India Akamai Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Description: Senior Site Reliability Engineer - Remote. Do you have a passion for cutting edge technologies and tackling system problems? Are you a self-starting professional who thrives in a fast-paced environment? Join our critical CPS SRE team. We ensure that infrastructure services have world-class reliability and uptime. Site Reliability Engineers (SREs) are the...


  • India Akamai Full time

    Description: Do you like collaborating across teams to solve complex problems? Do you enjoy solving large scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We specialize in creating...


  • India Akamai Full time ₹ 10,50,000 - ₹ 22,50,000 per year

    Description: Do you like collaborating across teams to solve complex problems? Do you enjoy solving large scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We specialize in creating...


  • India Jobgether Full time ₹ 12,00,000 - ₹ 24,00,000 per year

    This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in India. We are seeking an experienced Senior Site Reliability Engineer to ensure the reliability, scalability, and performance of critical security infrastructure. In this role, you will lead initiatives for operational...


  • India Akamai Technologies Full time

    Job Description: Senior Site Reliability Engineer - Remote. Do you have a passion for cutting edge technologies and tackling system problems? Are you a self-starting professional who thrives in a fast-paced environment? Join our critical CPS SRE team! We ensure that infrastructure services have world-class reliability and uptime. Site Reliability...


  • Hyderabad, India Microsoft Full time

    Job Description Join the Azure Specialized AI Infrastructure team in India to drive advancements in Artificial Intelligence (AI) and support high-performance infrastructure for generative AI workloads. As a Senior SRE, you will automate and maintain large-scale distributed systems powering latest AI applications and machine learning models. Your primary...