SRE DevOps Manager
1 week ago
We are looking for a Site Reliability Engineering (SRE) DevOps Manager.
Location: Bangalore / Hyderabad / Chennai / Noida / Pune / Visakhapatnam / Gurgaon
Shift timing: Regular
Can join: Immediate to 30 days
Interested candidates, please share your profile and the details below to
Email ID: Shanmukh.
Total experience:
Relevant experience:
Current CTC:
Expected CTC:
Notice period:
If serving notice period, last working day:
Job Summary
We are seeking an experienced Site Reliability Engineering (SRE) Manager to lead and evolve our cloud infrastructure, reliability practices, and automation strategy. This role blends hands-on technical leadership with strategic oversight to ensure scalable, secure, and reliable systems across AWS-based environments.
As an SRE Manager, you will guide a team of DevOps and SRE engineers to design, build, and operate cloud-native platforms using Kubernetes (EKS), Terraform, and AWS DevOps tools. You will drive operational excellence through observability, automation, and AIOps, improving reliability, performance, and cost efficiency.
You will collaborate closely with development, product, and security teams to define SLOs, manage error budgets, and continuously improve infrastructure resilience and developer productivity.
Key Responsibilities
Leadership & Strategy
- Lead, mentor, and grow a global team of Site Reliability and DevOps Engineers.
- Define and drive the reliability roadmap, SLOs, and error budgets across services.
- Establish best practices for infrastructure automation, observability, and incident response.
- Partner with engineering leadership to shape long-term cloud, Kubernetes, and AIOps strategies.
Infrastructure & Automation
- Design, implement, and manage AWS cloud infrastructure using Terraform (advanced modules, remote state management, custom providers).
- Build and optimize CI/CD pipelines using AWS CodePipeline, CodeBuild, CodeDeploy, and CodeCommit.
- Manage EKS clusters with a focus on scalability, reliability, and cost efficiency, using Helm, ingress controllers, and a service mesh (e.g., Istio).
- Implement robust security and compliance practices (IAM policies, network segmentation, secrets management).
- Automate environment provisioning for dev, staging, and production using Infrastructure as Code (IaC).
Monitoring, Observability & Reliability
- Lead observability initiatives using Prometheus, Grafana, CloudWatch, and OpenSearch/ELK.
- Improve system visibility and response times by enhancing monitoring, tracing, and alerting.
- Drive proactive incident management and root cause analysis (RCA) to prevent recurring issues.
- Apply chaos engineering principles and reliability testing to ensure resilience under load.
AIOps & Advanced Operations
- Integrate AIOps tools to proactively detect, diagnose, and remediate operational issues.
- Design and manage scalable deployment strategies for AI/LLM workloads (e.g., Llama, Claude, Cohere).
- Monitor model performance and reliability across hybrid Kubernetes and managed AI environments.
- Stay current with MLOps and generative AI infrastructure trends, applying them to production workloads.
- Manage 24/7 operations using appropriate alerting tools and a follow-the-sun model.
Cost Optimization & Governance
- Analyze and optimize cloud costs through instance right-sizing, auto-scaling, and Spot Instance usage.
- Implement cost-aware architecture decisions and monitor monthly spend against budgets.
- Establish cloud governance frameworks to improve cost visibility and accountability across teams.
Collaboration & Process
- Partner with developers to streamline deployment workflows and improve developer experience.
- Maintain high-quality documentation, runbooks, and postmortem reviews.
- Foster a culture of reliability, automation, and continuous improvement across teams.
-
SRE (Devops)
4 days ago
Bangalore, India | Cozzera | Full time
Position: SRE / DevOps Engineer
Experience: 6+ years
Location: Remote
Required Skills & Experience:
- 6+ years as an SRE or Senior Systems Engineer managing critical production systems.
- 4+ years managing self-hosted, enterprise GitLab or similar SCM platforms.
- Strong Linux (RHEL) administration expertise.
- Advanced hands-on experience with Ansible and Terraform...
-
SRE (Devops)
3 days ago
Bangalore, India | COZZERA INTERNATIONAL LLP | Full time
Position: SRE / DevOps Engineer
Experience: 6+ years
Location: Remote
Required Skills & Experience:
- 6+ years as an SRE or Senior Systems Engineer managing critical production systems.
- 4+ years managing self-hosted, enterprise GitLab or similar SCM platforms.
- Strong Linux (RHEL) administration expertise.
- Advanced hands-on experience with Ansible and Terraform...
-
SRE (Devops)
3 weeks ago
Bangalore, India | Cozzera | Full time
Role: Senior SRE DevOps
Shift: Night
Location: Remote
Key Responsibilities:
- Manage and optimize cloud infrastructure with strong hands-on expertise in AWS, Kubernetes, and Terraform.
- Automate deployment pipelines and ensure high availability and scalability of services.
- Troubleshoot production issues and provide on-call support during the night shift...
-
SRE Devops Manager
4 days ago
Bangalore, India | Infinite Computer Solutions | Full time
We are looking for a Site Reliability Engineering (SRE) DevOps Manager.
Location: Bangalore / Hyderabad / Chennai / Noida / Pune / Visakhapatnam / Gurgaon
Shift timing: Regular
Can join: Immediate to 30 days
Interested candidates, please share your profile and the details below to
Email ID: Shanmukh.Varma@infinite.com
Total experience:
Relevant experience:
Current...
-
SRE / DevOps Platform Engineer
2 weeks ago
Bangalore, India | Prospance Inc | Full time
SRE & DevOps Engineer (ML/AI Platform)
Contract Position | Global E-Commerce Leader | Hybrid
We're partnering with a leading global e-commerce company to find an exceptional SRE & DevOps Engineer to join their AI Platform Team. This is your chance to shape the future of machine learning infrastructure that powers innovation for millions of users worldwide...
-
SRE Devops Manager
3 days ago
Bangalore, India | Infinite Computer Solutions | Full time
We are looking for a Site Reliability Engineering (SRE) DevOps Manager.
Location: Bangalore / Hyderabad / Chennai / Noida / Pune / Visakhapatnam / Gurgaon
Shift timing: Regular
Can join: Immediate to 30 days
Interested candidates, please share your profile and the details below to
Email ID:
Total experience:
Relevant experience:
Current CTC:
Expected CTC:
Notice...
-
SRE Devops Manager
7 hours ago
Bangalore, India | Infinite Computer Solutions | Full time
We are looking for a Site Reliability Engineering (SRE) DevOps Manager.
Location: Bangalore / Hyderabad / Chennai / Noida / Pune / Visakhapatnam / Gurgaon
Shift timing: Regular
Can join: Immediate to 30 days
Interested candidates, please share your profile and the details below to
Email ID:
Total experience:
Relevant experience:
Current CTC:
Expected CTC:
Notice...
-
SRE & DevOps (Ray.io)
4 days ago
Bangalore, India | AMISEQ | Full time
SRE & DevOps (Ray.io)
Bengaluru, KA
Required Skills:
● Demonstrated ability in designing, building, refactoring, and releasing software written in Python and C++.
● Hands-on experience with Ray.io, including workload management, cluster deployment, distributed task scheduling, and troubleshooting.
● Ability to use Ray Dashboard and CLI tools for monitoring,...