Senior Cloud Reliability Engineer
1 month ago
The QpiAI team is a leading innovator in Artificial Intelligence, pushing the boundaries of what is possible with AI across various industry verticals and domains. We strive to deliver exceptional innovative products that consistently improve our customers' experience.
About the Role
We are seeking a highly skilled Senior Cloud Reliability Engineer to join our team. The ideal candidate will have a deep understanding of cloud infrastructure, observability, automation, and the tools & practices needed to support AI workloads and build highly reliable systems.
Job Description
- Observability & Monitoring:
- Design, implement, and maintain observability pipelines using tools like Prometheus, Loki, Elasticsearch, Fluentbit/Fluentd, Logstash, Filebeat/Metricbeat, Thanos, Vector.dev, Jaeger, Tempo, Grafana Agent, or Grafana Alloy. Proficiency in at least one stack is a must; familiarity with additional stacks is a bonus.
- Build effective Grafana/Kibana dashboards, set up metrics, logs, and trace collection, and create alerts for AI applications.
- Ensure observability across AI workloads, containers, and services to guarantee high availability and robust monitoring.
- Troubleshoot and resolve infrastructure and application issues, ensuring quick detection and resolution.
- Automation & Infrastructure as Code (IaC):
- Automate infrastructure setups using Terraform, Ansible, Tekton, and Helm charts to ensure efficient, consistent, and repeatable deployments.
- Support Kubernetes-based infrastructure, enabling auto-scaling, fault tolerance, and high availability for AI applications and microservices.
- CI/CD & DevOps:
- Implement CI/CD pipelines using tools such as Jenkins, Drone, GitLab CI, Tekton, ArgoCD, or FluxCD. Experience with at least one CI tool and one CD tool is required; proficiency in additional CI/CD tools is a plus.
- Build and maintain scalable, secure, and efficient pipelines for continuous integration, deployment, and monitoring.
- Collaborate with developers to improve the development lifecycle and reduce time-to-deployment.
- AI Workload Support & Optimization:
- Support AI frameworks like PyTorch, Flyte, Kubeflow, Ray, and KubeRay (with a strong focus on PyTorch and Flyte).
- Optimize Kubernetes clusters for resource-efficient AI workload management, monitoring, and scaling.
- Collaborate with data scientists and engineers to deploy and monitor large-scale AI models in production environments.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 3+ years of experience in SRE, Platform Engineering, or DevOps roles, with a strong focus on Kubernetes and observability tools.
- Technical Skills:
- Strong expertise in Kubernetes, container orchestration, and scaling strategies.
- Hands-on experience with observability tools such as Prometheus, Loki, Elasticsearch, Grafana, Thanos, Fluentbit, Jaeger, and OpenTelemetry.
- Proficiency in CI/CD tools like Jenkins, GitLab CI, ArgoCD, Tekton, and FluxCD.
- Expertise in infrastructure automation tools like Terraform, Ansible, and Helm.
- Familiarity with AI frameworks (e.g., PyTorch, Flyte, Kubeflow) and their integration with Kubernetes environments.
- Scripting experience with Python, Bash, and YAML for automation and configuration management.
$120,000 - $180,000 per annum, depending on location and experience.
-
Senior Cloud Reliability Engineer
1 month ago
Bengaluru, Karnataka, India Groww Full time
About Us: At Groww, we are committed to making financial services accessible to every Indian through a multi-product platform. Our team is passionate about creating an exceptional experience for our customers, with a focus on customer obsession and customer-centricity. Job Description: We are seeking an experienced Senior Site Reliability Engineer to join our...
-
Senior Cloud Reliability Engineer
3 weeks ago
Bengaluru, Karnataka, India Synechron Full time
About Synechron: We are a global digital consulting firm that provides innovative technology solutions for businesses. Our expertise in AI, Consulting, Data, Digital, Cloud & DevOps and Software Engineering delivers customized, end-to-end solutions that drive business value and growth. We began life in 2001 as a small team of technology specialists and have...
-
Cloud-Native Reliability Architect
2 weeks ago
Bengaluru, Karnataka, India Laerdal Bangalore Full time
About the Role: We are seeking an experienced Senior Site Reliability Engineer to join our team at Laerdal Bangalore. As a Senior Site Reliability Engineer, you will play a pivotal role in ensuring the reliability and performance of our cloud-based applications and solutions.
-
Senior Software Engineer
2 days ago
Bengaluru, Karnataka, India Hireginie Talent Cloud Pvt Ltd Full time
About Us: Hireginie Talent Cloud Pvt Ltd is a leading technology firm based in Bangalore. Job Title: Senior Software Engineer - Cloud Architect. Location: Bangalore. Salary Range: ₹2000000 - ₹3500000 per annum. Job Description: We are seeking an experienced Senior Software Engineer - Cloud Architect to join our team. The successful candidate will be responsible for...
-
Senior Cloud Reliability Architect
4 weeks ago
Bengaluru, Karnataka, India Watson Search Partner Full time
About the Role: We are seeking a seasoned Senior Cloud Reliability Architect to join our team at Watson Search Partner. In this role, you will be responsible for designing and delivering cloud-native infrastructure solutions on top of Public Cloud or similar private cloud platforms. Key Responsibilities: Build software and systems to manage platform...
-
Reliability Engineering Expert
2 weeks ago
Bengaluru, Karnataka, India Zscaler Full time
Cloud Security Platform Engineer: Zscaler is a leader in cloud security and we're looking for a skilled Senior Manager, Site Reliability Engineer to join our SRE, Platform & Tooling team. As a member of our Engineering team, you will be responsible for designing, implementing, and managing scalable and reliable infrastructure solutions to support Zscaler's...
-
Reliability Engineer for Cloud Infrastructure
2 weeks ago
Bengaluru, Karnataka, India Practo Technologies Pvt Ltd Full time
At Practo Technologies Pvt Ltd, we strive to simplify healthcare and make quality care accessible to everyone. As a leading digital healthcare platform, we connect millions of patients with healthcare providers, making healthcare services more efficient. We are seeking a skilled Site Reliability Engineer to maintain the reliability, performance, and...
-
Cloud Storage Reliability Engineer
3 weeks ago
Bengaluru, Karnataka, India NetApp Full time
Job Overview: We are seeking an experienced Cloud Storage Reliability Engineer to join our team at NetApp. As a key member of our Site Reliability Engineering (SRE) team, you will be responsible for ensuring the availability, performance, and security of customer-facing cloud services on Google Cloud Platform (GCP). The ideal candidate will have extensive...
-
Cloud Infrastructure Reliability Engineer
1 month ago
Bengaluru, Karnataka, India Taggd Full time
We are seeking a seasoned Cloud Infrastructure Reliability Engineer to join our team at Taggd. Located in Bangalore, this role offers an exciting opportunity to work on high-availability and scalability of our cloud-based systems. With 8 years of experience in Development and Operations of applications/services in production with uptime over 99.9%, the...
-
Senior Cloud Engineer
2 weeks ago
Bengaluru, Karnataka, India Card91 Full time
Job Description: We're looking for a highly skilled Senior DevOps Engineer to join our team. As a key member of our infrastructure team, you will be responsible for designing, implementing, and maintaining our cloud-based infrastructure. Your expertise in AWS will enable us to scale our deployments and improve service deployment reliability and speed. Our...
-
Cloud Reliability Engineer Lead
3 weeks ago
Bengaluru, Karnataka, India myGwork Full time
About the Role: We are seeking a seasoned Cloud Reliability Engineer Lead to join our team in Bangalore, India. As a key member of our Developer Experience team, you will play a vital role in empowering developers to leverage AWS services effectively in their SRE and DevOps practices.
-
Cloud Infrastructure Reliability Specialist
3 days ago
Bengaluru, Karnataka, India Banyan Cloud Full time
About Us: Banyan Cloud, a wholly owned subsidiary of Banyan Cloud, USA, is a Cyber Security Product Company headquartered in San Jose, California, USA. We are looking for candidates who aspire to be part of cutting-edge solutions and services addressing next-generation technological challenges. This role involves building and maintaining highly scalable,...
-
Cloud Engineering Lead
4 days ago
Bengaluru, Karnataka, India Cloud Counselage Pvt Ltd Full time
We are looking for a highly skilled Cloud Engineering Lead to join our team at Cloud Counselage Pvt Ltd in Bangalore, Karnataka. As a key member of our organization, you will be responsible for overseeing the development and implementation of cloud-based technologies, ensuring seamless integration with existing systems, and driving innovation through...
-
Cloud Native Systems Reliability Engineer
3 weeks ago
Bengaluru, Karnataka, India Traceable Full time
About Us: At Traceable, we are passionate about building ultra-modern infrastructure that enables entire engineering and product teams to be highly productive and agile. Salary and Benefits: The estimated salary for this position is $160,000 - $200,000 per year, depending on experience. We offer a comprehensive benefits package, including medical, dental, and...
-
Senior Cloud Engineer Position
2 weeks ago
Bengaluru, Karnataka, India NetAnalytiks Technologies Full time
Senior Cloud Engineer Position. Job Summary: An exciting opportunity exists for a Senior Cloud Engineer to join our team as a key contributor. The successful candidate will be responsible for designing, developing, and deploying innovative cloud-based solutions using Microsoft Azure's serverless technologies. Key Responsibilities: Design and implement efficient...
-
Site Reliability Engineer and Cloud Architect
2 weeks ago
Bengaluru, Karnataka, India Practo Technologies Pvt Ltd Full time
Practo Technologies: We are seeking an experienced Site Reliability Engineer to join our team at Practo Technologies Pvt Ltd. The successful candidate will play a critical role in maintaining the reliability, performance, and scalability of our services. The ideal candidate will have a strong background in cloud providers such as AWS, Azure, and Oracle. They...
-
Senior Cloud Engineer
2 weeks ago
Bengaluru, Karnataka, India Winning Edge Full time
Job Description: We are looking for a talented and driven individual with strong problem-solving skills, experience in cloud computing, and a passion for delivering high-quality solutions at Winning Edge. As a Senior Cloud Infrastructure Engineer, you will play a key role in designing, building, and maintaining scalable and secure cloud infrastructure on...
-
Cloud Infrastructure Engineer
3 weeks ago
Bengaluru, Karnataka, India NetApp Full time
About NetApp: As a global leader in intelligent data infrastructure, NetApp empowers customers to turn challenges into opportunities. Our cutting-edge solutions help organizations harness the power of data to drive innovation and growth. We're seeking a skilled Cloud Infrastructure Engineer - System Reliability to join our team. This is an exciting opportunity...
-
Cloud Infrastructure Engineer
4 weeks ago
Bengaluru, Karnataka, India Eximietas Design Full time
At Eximietas Design, we are seeking a highly skilled Cloud Infrastructure Engineer to join our team. This role aligns with Site Reliability Engineer (SRE) or Cloud Infrastructure Engineer titles, with responsibilities in NOC/OCC operations. Job Summary: This position involves maintaining, monitoring, and enhancing cloud and network infrastructure with a focus...
-
Reliable Software Solutions Engineer
4 weeks ago
Bengaluru, Karnataka, India ViewSonic Full time
We are seeking a highly experienced Senior Site Reliability Engineer (SRE) to join our team at View Sonic Technologies. We strive to deliver excellence in visual solutions across software, hardware, and services by empowering users with rich features, high availability, and stellar performance levels. In this role, you will lead a global team of SRE engineers...