
Senior System Reliability Engineer
21 hours ago
The Insight Global team is hiring a full-time Monitoring Engineer to join the LLM Proxy Team. This role involves monitoring system health via Grafana dashboards, managing incident communications, and ensuring high reliability of globally deployed web applications.
Key Responsibilities:
- Monitor Grafana dashboards and observability tools to detect failures and performance issues.
- Act as the primary SRE for incident response, initiating reports from automated alerts or joining active incident channels.
- Serve as the main point of contact during incidents, delivering frequent updates to customers and incident commanders.
- Interpret operational metrics, such as latency quantiles (for example, P99) collected in Prometheus, to assess system health.
- Track and manage permutations of a globally deployed microservices architecture running on Kubernetes.
- Collaborate with engineering and support teams to resolve issues quickly and efficiently.
- Maintain strong communication and customer service throughout incident lifecycles.
- Utilize foundational knowledge of cloud platforms to support infrastructure monitoring.
Qualifications:
- 3+ years of experience monitoring and responding to incidents in a globally deployed web application.
- Strong experience with microservices architecture on Kubernetes.
- Deep understanding of observability tools and operational metrics.
- Familiarity with cloud services or any major cloud provider.
- Excellent communication and customer service skills.
- Ability to ramp up quickly, take ownership, and work independently in a fast-paced environment.
-
Senior Reliability Engineer Position
21 hours ago
Kanpur, Uttar Pradesh, India beBeeEngineer Full time ₹ 23,00,000 - ₹ 25,00,000
Reliability Engineering Leadership Role
We are seeking a seasoned Reliability Engineer to lead our team's efforts in ensuring the availability and performance of our systems. As a technical leader, you will be responsible for solving complex production issues, guiding development teams, and building tools that improve system resilience and...
-
Reliability Engineering Specialist
4 days ago
Kanpur, Uttar Pradesh, India beBeeEngineering Full time ₹ 1,50,00,000 - ₹ 2,00,00,000
Job Title: Reliability Engineering Specialist
We are seeking an experienced professional to join our Platform Engineering team as a Reliability Engineering Specialist. The ideal candidate will have a strong background in software engineering and systems operations, with expertise in building infrastructure that powers AI-driven code reviews at scale.
Main...
-
Kanpur, Uttar Pradesh, India beBeeReliability Full time ₹ 1,20,00,000 - ₹ 2,00,00,000
Job Opportunity
We are seeking experienced professionals to fill the role of a Senior Reliability Engineer with expertise in AWS Serverless technologies.
Xebia's Cloud & DevOps practice is expanding, and we require talented individuals to design and implement resilient, fault-tolerant AWS architectures.
Apply SRE principles (SLIs, SLOs, SLAs, error budgets) to...
-
Reliable Systems Expert
20 hours ago
Kanpur, Uttar Pradesh, India beBeeResponsibility Full time ₹ 18,00,000 - ₹ 26,40,000
Job Overview
This is a key position for a skilled Site Reliability Engineer to join our team. The role calls for experience working with microservices on Kubernetes and a strong understanding of observability tools and metrics.
-
AI Systems Engineer
2 hours ago
Kanpur, Uttar Pradesh, India beBeeSpecialist Full time ₹ 1,80,00,000 - ₹ 2,20,00,000
Senior Technical Specialist Role Overview
We specialize in designing and implementing cutting-edge fulfillment technology that streamlines the delivery of products to their intended destinations. Our innovative solutions transform how businesses fulfill orders, ensuring timely and reliable customer experiences.
Job Description:
As a Senior Technical Specialist,...
-
AI/ML System Reliability Engineer
23 hours ago
Kanpur, Uttar Pradesh, India beBeeSiteReliability Full time ₹ 13,04,000 - ₹ 26,12,000
Transform Your Career with AI/ML Site Reliability
We seek an experienced professional to ensure the reliability and scalability of cloud-based AI/ML systems.
Key Responsibilities:
- Design, implement, and maintain scalable and reliable Azure infrastructure (storage, networking, security, IAM)
- Collaborate with cross-functional teams to develop and deploy Databricks...
-
Senior Data Engineering Lead
2 days ago
Kanpur, Uttar Pradesh, India beBeeDataEngineer Full time ₹ 1,50,00,000 - ₹ 2,50,00,000
Job Description
We seek an experienced data engineer to lead the design and development of large-scale data processing systems that enable advanced analytics and business intelligence. As a senior data engineer, you will be responsible for architecting and implementing scalable and robust data pipelines that collect, process, and store large volumes of...
-
Highly Skilled Site Reliability Engineer
3 minutes ago
Kanpur, Uttar Pradesh, India beBeeDevOps Full time ₹ 18,00,000 - ₹ 24,00,000
DevOps Engineer with Site Reliability Engineering
This role is designed for a highly skilled DevOps engineer with expertise in site reliability engineering. The ideal candidate will have 7-10 years of professional experience in the field and be able to work effectively in a hybrid environment.
-
Sr. Lead Site Reliability Engineer – Technical
3 weeks ago
Kanpur, Uttar Pradesh, India Shell Recharge Solutions Full time
Shell Recharge Solutions is looking for a Sr. Lead Site Reliability Engineer with people and team management responsibilities to join our team. We would like to find a highly engaged engineer who is obsessed with monitoring, observability, code quality, and self-healing infrastructure, and who can also manage a team. You should be able to identify, troubleshoot, and resolve issues quickly...
-
Senior Software Systems Developer
2 days ago
Kanpur, Uttar Pradesh, India beBeeBackend Full time ₹ 2,00,00,000 - ₹ 2,40,00,000
Position Overview:
We are seeking a skilled Senior Backend Engineer to join our team. The ideal candidate will have expertise in building and maintaining high-quality backend systems using TypeScript/Node.js. The role involves designing, implementing, and deploying scalable backend services that handle financial workflows such as KYC, servicing, and...