High Availability Platform Specialist

4 days ago

Hyderabad, Telangana, India beBeeSiteReliabilityEngineer Full time ₹ 9,00,000 - ₹ 10,80,000

About the Role: We are seeking a skilled Site Reliability Engineer to join our team. In this critical role, you will be responsible for ensuring the high availability, scalability, and reliability of our platforms and applications.

Key Responsibilities:

Design, implement, and maintain large-scale, highly scalable deployments that maximize system uptime.
Install and deploy new releases and application environments, and monitor their performance.
Proactively identify potential issues and develop monitoring tools and dashboards to ensure high availability of production environments.
Incident Management: Lead incident response efforts, diagnose root causes, and implement long-term solutions to prevent recurrence.
Collaboration & Coordination: Work closely with cross-functional teams to ensure efficient platform integration, API management, and campaign execution.
Troubleshooting and Root Cause Analysis: Utilize your expertise to investigate and resolve incidents quickly during crisis situations.
Monitor performance metrics and implement corrective actions when necessary.
Platform Integration: Manage and oversee the integration of various APIs, ensuring seamless interoperability between systems and third-party services.
Support the compliance and security integrity of the environments.
Adhere to process compliance & ensure platform reliability.
Experience with monitoring and automation using Prometheus/Grafana, ELK, Datadog, Dynatrace, or other observability tools.
Experience with container management (such as Docker) and microservices architectures on cloud or on-premises infrastructure.

Your Skills and Qualifications:

Kubernetes: Expertise in creating, maintaining, scaling, and upgrading production clusters.
Docker: Must have experience writing Dockerfiles that comply with industry-standard best practices.
CI/CD: Must have hands-on experience with Azure DevOps or Jenkins, creating and executing pipelines in a multi-target environment.
Troubleshooting skills: Expertise in analyzing application logs to drill down to the root of an issue, with expertise in logging stacks such as ELK, Dynatrace, and Splunk.
Monitoring Stacks: Expertise in Grafana, including building and managing dashboards on various data sources.
Programming Skills: Experience creating and managing Bash scripts and Ansible, with some exposure to Terraform.
Environment: Excellent hands-on skills in Linux environments and the ability to troubleshoot issues at the OS level.
Experience using project management tools such as Jira.
Experience deploying and managing distributed queuing systems such as Redis, Kafka, RabbitMQ, IBM MQ, and MSMQ.
Experience deploying and managing databases in standalone and cluster modes, with basic DB skills in Postgres, MySQL, and ClickHouse.
Prior experience working on high-traffic, highly scalable platforms is an added advantage.
Good command of Linux and networking concepts (TLS/SSL, DNS, load balancers, etc.) and troubleshooting skills in large-scale environments.
Deep understanding of basic security concepts and protocols: authentication, authorization, signing, encryption, SSL/TLS, SSH/SFTP, and X.509 certificates.
Good knowledge of ITIL terminology for incident and problem management.
Track record of excellent interpersonal, analytical, and communication skills.
Bachelor of Science in Computer Science or other related discipline.

Why Join Us?

Meaningful Work: As a Site Reliability Engineer, you will play a pivotal role in safeguarding our assets, data, and reputation in the industry.
Tremendous Growth Opportunities: Be part of a rapidly growing company in the telecom and CPaaS space, with opportunities for professional development.
Innovative Environment: Work alongside a world-class team in a challenging and fun environment, where innovation is celebrated.

We champion diversity and are committed to creating an inclusive environment for all employees.
