
Reliability Expert
1 week ago
We are seeking a skilled Technical Engineer to contribute to our Developer Experience team, focusing on designing and implementing high-performance services. This role requires expertise in ensuring the reliability and scalability of our Contact Center service.
This position is crucial in promoting a DevOps culture, where engineering teams are responsible for software development and deployment. As a Technical Engineer, you will play a key part in providing tools and expertise needed to achieve this goal in a collaborative environment.
Our mission is to improve the developer experience by equipping developers with the tools they need to manage the entire software lifecycle on their own. To support this objective, we are building an internal PaaS using technologies such as Kubernetes, Prometheus, and Kotlin.
This platform serves as a vital component of our engineering effort, enabling us to deliver better, faster, and more reliable solutions for our customers.
Main Responsibilities:
- Design, implement, and maintain critical parts of our internal platform, from CI/CD to developer tools aimed at increasing R&D productivity.
- Assist with migrating to industry-leading CI/CD tools like GitHub Actions.
- Automate safe deployment practices using industry-standard tools like GitHub Actions, ArgoCD, Argo Rollouts, Helm Charts, etc.
- Develop automation scripts to streamline infrastructure provisioning and other engineering processes.
- Coach and upskill other engineering team members.
- Solve complex technical problems every day and see the immediate impact of your work and the value it creates for other engineers.
- Implement automated workflows to minimize human intervention.
- Develop effective tooling, alerts, and response strategies to identify and address reliability risks.
- Drive and promote protocols for production readiness and operational excellence.
- Collaborate with product engineering teams to troubleshoot production outages and implement improvements.
- Promote best practices for automated testing, continuous integration, delivery, feature toggles, and progressive rollouts.
- Plan for the growth of our infrastructure.
Requirements:
- 5-8 years of experience.
- Deep understanding of large-scale complex systems from a reliability perspective.
- Expertise in designing, implementing, and maintaining CI/CD processes and tools.
- Passion for producing clean, standards-compliant, secure code.
- A developer's mindset when approaching infrastructure challenges.
- Experience with Linux/Unix systems.
- Knowledge of Kubernetes.
- Experience with Infrastructure as Code tools like Terraform and Ansible.
- Experience building software with languages like Java, Kotlin, Scala, or other JVM-based languages.
- Scripting skills with languages like Ruby, Python, Bash, or other scripting languages.
- Hands-on experience with relational and non-relational databases (e.g., PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch).
- Able to identify time-consuming tasks and automate them effectively.
- Skilled in identifying root causes of instability in distributed systems across various stacks.
- Experience with cloud-based solutions (AWS, Google Cloud, Microsoft Azure).
- Expertise with CI/CD platforms (Jenkins, GitLab), containers (Docker, Kubernetes), and artifact management tools (Nexus, ECR).
- Go programming language experience.
-
Reliability Expert Position
3 days ago
Hyderabad, Telangana, India beBeeOperational Full time ₹ 15,00,000 - ₹ 25,00,000
Reliability Expert Position
Job Summary: We are seeking an experienced Reliability Expert to join our team.
About Us: Our organization values reliability and efficiency in all aspects of our operations.
Key Responsibilities: Develop and implement strategies to improve system reliability and availability. Collaborate with cross-functional teams to identify and...
-
Senior Site Reliability Expert
3 days ago
Hyderabad, Telangana, India beBeeSite Full time ₹ 2,24,00,000 - ₹ 3,51,20,000
About Our Senior Site Reliability Expert
The role of a senior site reliability expert is pivotal in ensuring the stability, scalability, and operational excellence of accounting and finance systems.
Key Responsibilities
Operational Oversight: As a senior site reliability expert, you will be responsible for overseeing day-to-day operations for accounting and...
-
Expert Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India beBeePerformance Full time ₹ 1,60,00,000 - ₹ 2,20,00,000
Job Title: Reliability Expert
Role Overview: Empowering users with rich features, high availability, and stellar performance is at the core of our software solutions. We rely on reliability engineers to drive innovation forward.
Key Responsibilities: Develop software and systems for managing infrastructure and applications. Run the production environment by...
-
Expert in Cloud Reliability
7 days ago
Hyderabad, Telangana, India beBeeCloudReliability Full time ₹ 15,00,000 - ₹ 25,00,000
Job Role Summary
We are seeking a seasoned professional to fill the role of Cloud Reliability Expert. The ideal candidate will have extensive experience in DevOps and related fields.
Mastery of on-premise migration, including setup and management of DevOps tools (GitHub, TeamCity, Jenkins, Jira, Confluence).
Familiarity with infrastructure assessment, data...
-
Expert in System Reliability
3 days ago
Hyderabad, Telangana, India beBeeEngineering Full time ₹ 20,00,000 - ₹ 25,00,000
Job Title: Site Reliability Expert
Role Summary: We are seeking a skilled Site Reliability Engineer to ensure the reliability and efficiency of our systems.
Key Responsibilities: Identify potential system issues early and implement preventive measures to boost system resilience. Automate tasks using tools and scripts to eliminate manual effort and enable rapid...
-
Site Reliability Professional
2 weeks ago
Hyderabad, Telangana, India beBeeReliability Full time ₹ 12,00,000 - ₹ 25,00,000
Reliable System Expert
We are seeking a seasoned professional to fill the role of Reliable System Expert. This position will play a pivotal part in ensuring our systems maintain high levels of reliability, performance, and scalability.
The ideal candidate will have a strong background in Observability, Troubleshooting, and IT Platform Management.
This is an...
-
Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India beBeeObservability Full time US$ 1,50,000 - US$ 2,00,000
Site Reliability Engineer - Observability Expert
We are seeking a highly skilled Site Reliability Engineer to join our team. As an Observability expert, you will design and develop next-generation observability platforms that enable our clients to monitor and improve their complex IT systems.
The ideal candidate will have a strong background in software...
-
Platform Reliability Expert
2 weeks ago
Hyderabad, Telangana, India beBeeObservability Full time ₹ 1,50,00,000 - ₹ 2,50,00,000
Observability Lead
The Data Platforms Observability Lead is responsible for establishing and advancing monitoring, reliability practices, and observability across the Enterprise Data & Analytics (EDAA) landscape.
This role ensures end-to-end visibility into platform performance, data pipeline health, system availability, and operational SLAs.
With a deep...
-
Site Reliability Expert
7 days ago
Hyderabad, Telangana, India beBeeResponsibilities Full time ₹ 1,50,00,000 - ₹ 2,00,00,000
Job Title: Achieving System Excellence
About the Role: We are seeking a skilled Site Reliability Engineer to join our team. The ideal candidate will have 5+ years of experience in DevOps and Site Reliability Engineering, with a strong focus on ensuring smooth system operations.
Key Responsibilities: Design, implement, and maintain scalable systems using...
-
Reliability Specialist
2 days ago
Hyderabad, Telangana, India beBeeAccountability Full time ₹ 9,00,000 - ₹ 18,00,000
Job Description
We are seeking a highly skilled technical expert to join our team in a critical role.
As a Site Reliability Engineer - Accounting Technology, you will be responsible for ensuring the performance, reliability, and uptime of accounting and finance platforms. This includes building automation for deployments, monitoring, scaling, and self-healing...