Datacenter Observability and Site Reliability Engineer
6 days ago
Location: Remote, India
Contract duration: 6+ months
Working hours: 5:30 AM to 2:30 PM IST
Roles and Responsibilities:
Observability and Monitoring:
- Design, implement, and maintain observability solutions for datacenter infrastructure.
- Develop, deploy, and maintain the operational and reliability components of a large-scale Observability and Telemetry collection platform, emphasizing performance at scale, real-time monitoring, logging, and alerting.
- Participate in and enhance the entire lifecycle of services, from inception and design to deployment, operation, and refinement.
- Develop and optimize monitoring systems to ensure high availability and performance.
- Create and manage dashboards, alerts, and reports to provide visibility into system health and performance.
Site Reliability Engineering (SRE):
- Implement SRE best practices to improve the reliability, scalability, and performance of datacenter services.
- Develop and maintain automation scripts for infrastructure provisioning, monitoring, and management.
- Conduct root cause analysis and post-mortem reviews to prevent recurrence of incidents.
Performance Optimization:
- Analyze and optimize the performance of datacenter systems and applications.
- Implement best practices for resource utilization and efficiency.
Collaboration:
- Work closely with other engineering teams to understand and meet their observability and reliability requirements.
- Collaborate with hardware and software vendors to evaluate and integrate new technologies.
Security and Compliance:
- Ensure that observability and reliability solutions comply with security policies and industry standards.
- Implement and maintain security measures to protect data and infrastructure.
Troubleshooting and Support:
- Provide support for observability and reliability-related issues, including debugging and resolving hardware and software problems.
- Develop and maintain documentation for troubleshooting procedures and best practices.
Continuous Improvement:
- Stay current with the latest advancements in observability and SRE technologies and integrate them into the infrastructure.
- Continuously improve the reliability, scalability, and performance of datacenter services.
Qualifications:
Education:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Experience:
- 8+ years of experience in datacenter observability and site reliability engineering.
- Proven experience in managing and optimizing large-scale datacenter environments.
Technical Skills:
- Proficiency in observability tools and technologies (e.g., Prometheus, Grafana, ELK Stack).
- Experience with SRE practices and tools (e.g., Kubernetes, Docker, Terraform).
- Strong programming and scripting skills (e.g., Python, Go, Bash).
- Familiarity with cloud platforms (AWS, Azure, GCP) and their observability and reliability services.
Soft Skills:
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration skills.
- Ability to work in a fast-paced, dynamic environment.