SRE Devops Lead
2 days ago
We are looking for a Site Reliability/Cloud Engineer DevOps Lead / SSE.
Experience: 6 to 12 years
Can join: immediate to 30 days
Shift timing: Regular
Location: Bangalore / Hyderabad / Chennai / Noida / Pune / Gurgaon / Visakhapatnam

Interested candidates, please share your profiles and the details below to Email ID: Shanmukh.Varma@infinite.com
Total experience:
Relevant experience:
Current CTC:
Expected CTC:
Notice period:
If serving notice period, last working day:

Job Title: Site Reliability/Cloud Engineer
Job Type: Full-time
Department: Engineering

Job Summary
We're seeking a motivated and passionate Site Reliability Engineering (SRE) leader with strong expertise in programming, distributed systems, and Kubernetes. In this role, you'll help evolve our SRE team's Kubernetes and microservices architecture, while also supporting the integration of Agentic AI workloads both within Kubernetes and via managed services.

The SRE function plays a critical role in maintaining system visibility, ensuring platform scalability, and enhancing operational efficiency. As part of this, you'll help drive AIOps initiatives, leveraging AI tools and automation to proactively detect, diagnose, and remediate issues, enhancing the reliability and performance of Zyter’s global platform. As a cloud practitioner, you’ll have the opportunity to apply your technical strengths, shape platform reliability strategies, and collaborate closely with engineering teams across the organization. You’ll work as part of a globally distributed, inclusive team focused on AWS-based cloud infrastructure.

Key Responsibilities

Core SRE:
- Collaborate with development teams, product owners, and stakeholders to define, enforce, and track SLOs and manage error budgets.
- Improve system reliability by designing for failure, testing edge cases, and monitoring key metrics.
- Boost performance by identifying bottlenecks, optimizing resource usage, and reducing latency across services.
- Build scalable systems that handle growth in traffic or data without compromising performance.
- Stay directly involved in technical work, contributing to the codebase and leading by example in solving complex infrastructure challenges.

AI Ops:
- Design and implement scalable deployment strategies optimized for large language models such as Llama, Claude, Cohere, and others.
- Set up continuous monitoring for model performance, ensuring robust alerting systems are in place to catch anomalies or degradation.
- Stay current with advancements in MLOps and Generative AI, proactively introducing innovative practices to strengthen AI infrastructure and delivery.

Monitoring and Alerting:
- Set up monitoring and observability using Prometheus, Grafana, and CloudWatch, and logging with OpenSearch/ELK.
- Proactively identify and resolve issues by leveraging monitoring systems to catch early signals before they impact operations.
- Design and maintain alerting mechanisms that are clear, actionable, and tuned to avoid unnecessary noise or alert fatigue.
- Continuously improve system observability to enhance visibility, reduce false positives, and support faster incident response.
- Apply best practices for alert thresholds and monitoring configurations to ensure reliability and maintain system health.

Cost Management:
- Monitor infrastructure usage to identify waste and reduce unnecessary spending.
- Optimize resource allocation by using right-sized instances, auto-scaling, and spot instances where appropriate.
- Implement cost-aware design practices during architecture and deployment planning.
- Track and analyze monthly cloud costs to ensure alignment with budget and forecast.
- Collaborate with teams to increase cost visibility and promote ownership of cloud spend.

Required Skills & Experience:
- Strong experience as an SRE with a proven track record of managing large-scale, highly available systems.
- Knowledge of core operating system principles, networking fundamentals, and systems management.
- Strong understanding of cloud deployment and management practices.
- Hands-on experience with Terraform/OpenTofu, Helm, Docker, Kubernetes, Prometheus, and Istio.
- Hands-on experience with tools and techniques to diagnose and uncover container performance issues.
- Skilled with AWS services from both technology and cost perspectives.
- Skilled in DevOps/SRE practices and build/release pipelines.
- Experience working with mature development practices and tools for source control, security, and deployment.
- Hands-on experience with Python/Golang/Groovy/Java.
- Excellent communication skills, written and verbal.
- Strong analytical and problem-solving skills.

Preferred Qualifications
- Experience scaling Kubernetes clusters and managing ingress traffic.
- Familiarity with multi-environment deployments and automated workflows.
- Knowledge of AWS service quotas, cost optimization, and networking nuances.
- Strong troubleshooting skills and effective communication across teams.
- Prior experience in regulated environments (HIPAA, SOC2, ISO27001) is a plus.
-
DevOps / SRE with Python
1 day ago
Bangalore, India Bahwan Cybertek Group Full time We are looking for a talented DevOps / SRE Engineer with strong Python skills to join our team at Bahwan Cybertek Group. As a DevOps / SRE Engineer, you will be responsible for maintaining and improving our software development and deployment processes, as well as ensuring the reliability and scalability of our infrastructure. Responsibilities: - Develop and...
-
SRE / DevOps Platform Engineer
1 week ago
Bangalore, India Prospance Inc Full time SRE & DevOps Engineer (ML/AI Platform) Contract Position | Global E-Commerce Leader | Hybrid We're partnering with a leading global e-commerce company to find an exceptional SRE & DevOps Engineer to join their AI Platform Team. This is your chance to shape the future of machine learning infrastructure that powers innovation for millions of users worldwide....
-
DevOps Engineer/SRE
2 days ago
Bangalore, India SuprSend Full time About Us: SuprSend is reinventing notification infrastructure for global businesses, powering seamless, reliable distribution of millions of events across channels. Join us as we scale further and raise the bar on uptime, cost-efficiency and automation. Role Snapshot: We’re seeking an experienced DevOps / SRE engineer with deep Kubernetes and cloud-native...
-
DevOps Engineer/SRE
10 hours ago
Bangalore, India SuprSend Full time About Us: SuprSend is reinventing notification infrastructure for global businesses, powering seamless, reliable distribution of millions of events across channels. Join us as we scale further and raise the bar on uptime, cost-efficiency and automation. Role Snapshot: We’re seeking an experienced DevOps / SRE engineer with deep Kubernetes and cloud-native...
-
Senior DevOps Engineer
2 days ago
Bangalore, India MightyBot Full time Title: Senior DevOps Engineer (SRE) Location: Remote Join our team as a Senior DevOps Engineer, where we're focused on graduating AI from interesting demos to indispensable products. You will build and maintain the robust, scalable infrastructure that makes this possible, ensuring our platform is reliable enough to be trusted with critical business...
-
SRE & DevOps Engineer (Node.js)
1 week ago
Bangalore, India Prospance Inc Full time About the Opportunity We're partnering with a leading global e-commerce company to find an exceptional SRE & DevOps Engineer with strong Node.js and UI development expertise. Join their AI Platform Team and build the developer-facing tools and infrastructure that empower researchers and data scientists worldwide. In this unique role, you'll bridge backend...
-
Senior DevOps Engineer
11 hours ago
Bangalore, India MightyBot Full time Title: Senior DevOps Engineer (SRE) Location: Remote Join our team as a Senior DevOps Engineer, where we're focused on graduating AI from interesting demos to indispensable products. You will build and maintain the robust, scalable infrastructure that makes this possible, ensuring our platform is reliable enough to be trusted with critical business...
-
SRE Devops Lead
10 hours ago
Bangalore, India Infinite Computer Solutions Full time We are looking for a Site Reliability/Cloud Engineer DevOps Lead / SSE. Experience: 6 to 12 years. Can join: immediate to 30 days. Shift timing: Regular. Location: Bangalore / Hyderabad / Chennai / Noida / Pune / Gurgaon / Visakhapatnam. Interested candidates, please share your profiles and the details below to Email ID: Total experience: Relevant Experience:...
-
SRE Devops Manager
10 hours ago
Bangalore, India Infinite Computer Solutions Full time We are looking for a Site Reliability Engineering (SRE) DevOps Manager. Location: Bangalore / Hyderabad / Chennai / Noida / Pune / Visakhapatnam / Gurgaon. Shift timing: Regular. Can join: immediate to 30 days. Interested candidates, please share your profiles and the details below to Email ID: Total experience: Relevant Experience: Current CTC: Expected CTC: Notice...
-
Senior Sre Advanced
4 days ago
Bangalore, Karnataka, India EMBARKGCC SERVICES PRIVATE LIMITED Full time About the Role: We are seeking a Site Reliability Engineer (SRE) / DevOps professional to design, automate, and maintain reliable, scalable, and high-performing systems. The ideal candidate will contribute directly to enhancing technology capabilities and business performance through innovative, sustainable DevOps and reliability practices. Key Responsibilities: Design...