Site Reliability Engineer
4 weeks ago
Position: SRE Observability Engineer
Experience: 5+ Years
Location: Hyderabad
Mandatory Skills: Observability, Grafana, and writing queries using Prometheus and Loki.

Job Description:
We are seeking a highly experienced and driven Senior Observability Engineer to lead the design, development, and maintenance of observability solutions across our infrastructure, applications, and services. As a Senior Observability Engineer, you will be at the forefront of implementing cutting-edge monitoring, logging, and tracing solutions that ensure the reliability, performance, and availability of our complex, distributed systems. You will collaborate with cross-functional teams, including Development, Infrastructure Engineers, DevOps, and SREs, to optimize system observability and improve our incident response capabilities.

Key Responsibilities:
- Lead the Design & Implementation of observability solutions, including monitoring, logging, and tracing for both cloud and on-premises environments.
- Drive the Development and maintenance of advanced monitoring tools such as Prometheus, Grafana, Datadog, New Relic, and AppDynamics.
- Implement Distributed Tracing frameworks like OpenTelemetry, Jaeger, or Zipkin, and enhance application performance diagnostics and troubleshooting.
- Optimize Log Management and analysis strategies using tools like Elasticsearch, Splunk, Loki, and Fluentd, ensuring efficient log processing and insights.
- Develop Advanced Alerting and anomaly detection strategies to proactively identify system issues, minimizing downtime and improving Mean Time to Recovery (MTTR).
- Collaborate with Development & SRE Teams to enhance observability in CI/CD pipelines, microservices architectures, and across various platform environments.
- Automate Observability Tasks by leveraging scripting languages such as Python, Bash, or Golang to increase efficiency and scale observability operations.
- Ensure Scalability & Efficiency of monitoring solutions to manage large-scale distributed systems and handle evolving business requirements.
- Lead Incident Response by providing actionable insights through observability data for effective troubleshooting and root cause analysis.
- Stay Abreast of Industry Trends in observability, Site Reliability Engineering (SRE), and monitoring practices, continuously improving processes.

Required Qualifications:
- 5+ years of hands-on experience in observability, SRE, DevOps, or a related field, with a proven track record of successfully managing complex, large-scale distributed systems.
- Expert-level proficiency in observability tools such as Prometheus, Grafana, Datadog, New Relic, and AppDynamics, with the ability to lead the design and implementation of these solutions at scale.
- Advanced experience with log management platforms like Elasticsearch, Splunk, Loki, and Fluentd, and the ability to optimize log aggregation and analysis for better performance insights.
- Deep expertise in distributed tracing tools such as OpenTelemetry, Jaeger, or Zipkin, with a focus on performance optimization and root cause analysis.
- Extensive experience with cloud environments (preferably Azure, AWS, GCP) and Kubernetes for deploying and managing observability solutions across modern, cloud-native infrastructures.
- Advanced proficiency in scripting languages such as Python, Bash, or Golang, and strong experience with Infrastructure as Code (IaC) tools like Terraform and Ansible.
- Strong understanding of system architecture, performance tuning, and troubleshooting complex production environments, with an emphasis on scalability and high availability.
- Proven experience in leading and mentoring teams, providing technical direction, and driving the adoption of best practices for observability and monitoring.
- Exceptional problem-solving skills, with a focus on providing actionable insights and data-driven decision-making.
- Ability to lead high-impact projects, effectively communicate with stakeholders, and influence cross-functional teams.
- Strong communication and collaboration skills; demonstrated ability to work closely with engineering teams, leadership, and external partners to meet observability and system reliability goals.

Preferred Qualifications:
- Experience with AI-driven observability tools and anomaly detection techniques.
- Familiarity with microservices, serverless architectures, and event-driven systems.
- Proven track record of handling on-call rotations and incident management workflows in high-availability environments.
- Relevant certifications in observability tools, cloud platforms, or SRE best practices are a plus.
-
Site Reliability Engineer
Thane, India beBeeReliability Full time
We are seeking an experienced Senior Site Reliability Engineer to design, scale and optimize the reliability and performance of our production systems. Build, design and manage a Site Reliability program from the ground up. Owning all aspects of incident response including on-call rotation, system alerting, escalation, remediations and...
-
Site Reliability Engineer III
5 days ago
Thane, Maharashtra, India Forcepoint Full time
Who is Forcepoint? Forcepoint simplifies security for global businesses and governments. Forcepoint's all-in-one, truly cloud-native platform makes it easy to adopt Zero Trust and prevent the theft or loss of sensitive data and intellectual property, no matter where people are working. 20 years in business. 2.7k employees. 150 countries. 11k customers. 300 patents. If...
-
Site Reliability Engineer
1 week ago
Wagle Estate, Thane, Maharashtra, India Sahasrara Metatech Pvt. Ltd. Full time ₹ 5,40,000 - ₹ 6,00,000 per year
We are seeking an experienced Site Reliability Engineer (SRE), immediate joiner, with strong skills in Linux administration, virtualization, Windows domain environments, firewall management, monitoring systems, and cloud technologies. The ideal candidate must have hands-on experience with GCP and Terraform, with additional exposure to Kubernetes or ELK/Signoz...
-
Reliability Expert
3 days ago
Thane, India beBeeSRE Full time
We're seeking a skilled Site Reliability Engineer to join our team. As an SRE, you'll be responsible for deploying and managing Kubernetes clusters, implementing infrastructure as code (IaC) solutions, automating operational workflows, and applying site reliability engineering principles. Key skills include experience with Rancher Kubernetes, Linux internals,...
-
Site Supervisor
5 days ago
Thane, India Disha Skill Training Services Full time
**Job Title - Site Supervisor**
**Job Location - Badlapur**
**Company Profile - We are a 29-year-old recruitment and training consultancy firm.**
**This vacancy is for our client, a well-known infrastructure company based in Kalyan.**
**We are looking for a competent Site Supervisor to handle our projects in Badlapur.**
**Job Description:**
- **To...
-
Site Engineer
2 weeks ago
Thane, MH, IN Placement Local Services Full time
Position: Junior Erection Site Engineer
Salary: up to 25 K
Experience: Min 1 year
Location: Bhiwandi
Working Days: Mon-Sat, 10AM-7PM
Interested candidates can share their CV on the following number: HR Shweta, 99875 39077
Job Summary: We are looking for a dynamic and detail-oriented Erection Site Engineer to support our project execution on-site. The ideal candidate will...
-
Site Supervisor
6 days ago
Thane, India Kartikya Builders Full time
Site supervisor required full time for a construction site. Should be able to give proper reports of work on site and coordinate construction activity properly on site.
**Job Types**: Full-time, Regular / Permanent
**Salary**: ₹11,000.00 - ₹12,000.00 per month
Supplemental pay types:
- Commission pay
Ability to commute/relocate:
- Karjat, Thane -...
-
Hiring For Site Engineer
2 weeks ago
Thane, India M. S. Developers Full time
Man required, neat and clean, aged between 25 and 45 years, based near Kalyan and Dombivali. Must speak Marathi, Hindi and English. Salary as per experience.
Experience: 5 - 10 Years
No. of Openings: 1
Education: Graduate
Role: Site Engineer
Industry Type: Real Estate / Property / Construction
Gender: Male
Job Country: India
Type of Job: Full Time
Work...
-
Site Engineer
1 week ago
Thane, India Soham Secure Full time
Responsible for supervising Hilti Firestop works as per UL certifications and Hilti guidelines. Ensure quality, safety, and timely execution at site. Coordinate with clients, monitor progress, and manage manpower and materials. Coordination, quantification, making of permits, passing bills, etc.
Requirements: Diploma/Degree in Civil or related field. 1–3...
-
Site Engineer
7 days ago
Thane, India Soham Secure Full time
Responsible for supervising Hilti Firestop works as per UL certifications and Hilti guidelines. Ensure quality, safety, and timely execution at site. Coordinate with clients, monitor progress, and manage manpower and materials. Coordination, quantification, making of permits, passing bills, etc.
Requirements: Diploma/Degree in Civil or related field. 1–3...