Senior Site Reliability Engineer - ELK Expert

4 weeks ago


India iVedha Inc. Full time
Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice

Location: India (Remote) - Must be available to work in the EST (US/Canada) Time Zone.

Role Summary:

Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure?

We're looking for an SRE with 7+ years of experience, including 4+ years specializing in the ELK stack (Elasticsearch, Logstash, Kibana), to join our Platform Engineering Practice. In this role, you'll design, manage, and scale ELK clusters ingesting 2–3+ TB/day, enhance reliability across distributed systems, and drive automation within Azure cloud environments. This is a high-impact engineering opportunity focused on performance, observability, and operational excellence at scale.

Why Join Us
  • Career Growth: Work alongside industry experts on cutting-edge cloud technologies
  • Competitive Compensation and Benefits: We recognize and reward top talent
  • Exciting, Impactful Work: Design and build scalable, resilient cloud environments
  • Strategic Platform Role: Contribute to the foundation of next-gen observability and reliability infrastructure

What You Will Do
  • Design and Optimize Cloud Infrastructure: Architect scalable, fault-tolerant systems on Microsoft Azure
  • Automate Everything: Use Terraform, Ansible, and GitHub Actions to streamline deployment and configuration
  • Ensure Reliability and Performance: Proactively monitor, troubleshoot, and resolve production issues using Prometheus, Grafana, and Azure Monitor
  • Enhance Security and Compliance: Implement security best practices across DevOps workflows
  • Collaborate and Innovate: Work closely with engineering, security, and operations teams to drive automation and efficiency
  • Operate ELK at Scale: Manage and scale large ELK clusters handling log volumes of 2–3+ TB/day, ensuring high availability and performance
  • Optimize ELK Architecture: Implement efficient index lifecycle policies, shard strategies, and hot-warm-cold tiered storage
  • Build and Tune Log Pipelines: Scale Logstash and Beats pipelines across distributed environments
  • Support Kibana Observability Layers: Create dashboards, visualizations, and custom alerting frameworks (e.g., Watcher, ElastAlert)

What You Bring
  • 7+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering
  • 4+ years of dedicated, hands-on experience with ELK (Elasticsearch, Logstash, Kibana)
  • Strong experience managing large-scale ELK clusters in production with heavy ingestion (multi-TB/day)
  • Deep knowledge of index tuning, shard allocation, ILM policies, and scaling ELK components
  • Expertise in GitHub Actions, Terraform, Ansible, and Infrastructure as Code (IaC)
  • Proficiency in Python, Go, or Bash for automation and scripting
  • Deep understanding of Kubernetes, Docker, and cloud-native architectures
  • Experience with observability tools such as Prometheus, Grafana, and Azure Monitor
  • Ability to work in a fast-paced, collaborative environment and solve complex operational issues

Education
  • Bachelor's or Master's degree in Computer Science, Information Technology, or a related field

Certifications (Nice to Have)
  • Microsoft Azure certifications: AZ-104, AZ-400

