Centific - Site Reliability Engineer - CI/CD Pipeline
3 weeks ago
Job Description :
Centific is a Seattle-based tech company pioneering the future of AI one breakthrough at a time. Learn how we're transforming the world through safe and scalable AI and empowering businesses to unlock the full potential of their data.
Key Responsibilities :
Strategic Leadership & Vision :
- Lead and manage the Software Release Management function for all Data and AI products.
- Establish a centralized release management framework for AI and data products that scales with the growing product portfolio.
- Form and lead a high-performing Site Reliability Engineering (SRE) team to ensure the operational stability and performance of all AI and data-driven applications post-release.
- Collaborate with Product, Engineering and Operations teams to align release and SRE strategies with business objectives.
Release Planning & Coordination :
- Oversee the full lifecycle of software and AI model releases, from planning and coordination to post-release evaluation.
- Develop and maintain a detailed release calendar that aligns with the timelines and priorities of various product teams.
- Coordinate release activities with multiple cross-functional teams, ensuring transparent communication of dependencies, risks, and milestones.
- Ensure that all releases are integrated seamlessly into production, minimizing downtime and disruptions to end users.
Site Reliability Engineering (SRE) Team Formation :
- Hire, build, and lead the SRE team responsible for maintaining the reliability, scalability, and performance of all Data and AI products in production.
- Define the roles and responsibilities of the SRE team, ensuring clear alignment with the goals of product engineering and release management.
- Develop and implement SRE best practices, including incident response, root cause analysis, and proactive performance monitoring.
- Establish SLAs, SLOs, and SLIs (Service Level Agreements/Objectives/Indicators) to track and measure the reliability and performance of all services post-release.
- Collaborate with DevOps to ensure that automated CI/CD pipelines integrate seamlessly with SRE processes and monitoring systems.
Process Optimization & Automation :
- Lead the automation of software release processes, with an emphasis on CI/CD pipelines for AI models, data pipelines, and cloud-based AI products.
- Develop infrastructure-as-code practices to improve the scalability and reliability of AI and data systems across production environments.
- Introduce tools for version control, model governance, and monitoring for MLOps and AI model management in production.
- Continuously improve operational procedures to reduce the number of incidents and optimize recovery time.
Risk & Quality Management :
- Implement comprehensive quality assurance and validation processes to ensure that all AI models, data products, and software releases meet security, performance, and compliance requirements.
- Proactively identify and mitigate risks related to releases, AI model performance, and operational stability in production.
- Conduct post-release reviews and retrospectives to continuously improve both the release process and the reliability of products.
Collaboration & Stakeholder Management :
- Serve as the central point of contact for release management and SRE-related matters, ensuring consistent communication between engineering, product teams, and key stakeholders.
- Facilitate cross-functional collaboration to ensure that releases and operational reliability goals are met efficiently and effectively.
- Provide regular updates on release progress, system reliability, and any potential risks to executives and product leadership.
Innovation & Continuous Improvement :
- Stay up to date with the latest trends in SRE, DevOps, AI/ML, and cloud operations, incorporating new tools and practices to improve the overall reliability and release processes.
- Drive the adoption of cutting-edge tools in MLOps, AI model deployment, and automated incident resolution to continuously optimize operations and model lifecycle management.
- Foster a culture of continuous improvement by encouraging feedback loops and metrics-driven decision-making across both the release management and SRE teams.
Qualifications :
- Bachelor's or Master's degree in Computer Science, Data Engineering, AI/ML, or a related field.
- 10+ years of experience in software release management, including 3-5 years in SRE or DevOps environments, preferably in AI or data-driven applications.
- Proven experience building and managing both release management and SRE teams in complex, multi-product environments.
- Strong knowledge of AI/ML operations (MLOps), data pipeline management, and cloud-based AI product deployments.
- Expertise in release management tools (Jenkins, GitLab, Git, Jira) and SRE tools such as Prometheus, Grafana, Datadog, or similar monitoring systems.
- Experience with cloud platforms (AWS, GCP, Azure), containerization (Kubernetes, Docker), and infrastructure automation tools (Terraform, Ansible).
- Excellent problem-solving, organizational, and leadership skills, with a strong track record of driving continuous improvement in both release and operational reliability processes.
Preferred Qualifications :
- Experience deploying and maintaining large-scale AI/ML models in production environments, including monitoring, retraining, and operationalization.
- Familiarity with ITIL, MLOps, or DevOps frameworks and best practices.
- Knowledge of cloud-based services and tools specifically designed for AI/ML (e.g., Amazon SageMaker, TensorFlow, PyTorch).
- Demonstrated ability to manage incident response and root cause analysis in complex software ecosystems.
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India Centific Global Technologies Full time
Job Title: Site Reliability Engineer - AI/ML Operations
Job Summary: Centific Global Technologies is seeking a highly skilled Site Reliability Engineer to lead the AI/ML operations team. The ideal candidate will have a strong background in software release management, SRE, and DevOps, with experience in AI/ML operations, data pipeline management, and...
-
Centific - Azure DevOps Lead - CI/CD Pipeline
6 months ago
Chennai, Tamil Nadu, India Centific Global Technologies Full time
Immediate Joiner and Individual Contributor
Job Description :
Role & Responsibilities :
- Collaborate with development and operations teams to define the overall DevOps strategy and roadmaps that leverage Azure DevOps tools and practices.
- Design and implement CI/CD pipelines for various application types and technology stacks.
Tool Selection and Configuration...
-
Site Reliability Engineer
4 hours ago
Chennai, Tamil Nadu, India Centific Global Technologies Full time
Job Title: Site Reliability Engineer - AI/ML Operations
About the Role: Centific Global Technologies is seeking an experienced Site Reliability Engineer to join our team and lead the development of our AI/ML operations infrastructure. This individual will be responsible for designing, building, and maintaining scalable and reliable systems for our data and AI...
-
DevOps Engineer
1 month ago
Chennai, Tamil Nadu, India Scoop Technologies Pvt Ltd Full time
Job Title : DevOps Engineer
Location : Chennai, India
Experience : 6+ Years
Notice Period : Immediate - 30 days
Job Description : We are seeking a skilled and experienced DevOps Engineer to join our team in Chennai. The ideal candidate will have a strong background in implementing CI/CD pipelines and be proficient with GitHub Actions. The role requires hands-on...
-
Chennai, Tamil Nadu, India Centific Full time
Job Overview
We are seeking an experienced Reliability Engineering Lead to join our team at Centific, a Seattle-based tech company pioneering the future of AI. This is a unique opportunity to lead the development and implementation of robust and scalable AI systems.
Responsibilities
Strategic Leadership: Lead and manage the Software Release Management function...
-
Python Software Engineer
2 weeks ago
Chennai, Tamil Nadu, India Centific Full time
Job Title: Python Software Engineer
Job Summary: Centific is seeking a skilled Python Software Engineer to join our team. The ideal candidate will have strong expertise in backend development using Python and Django framework, coupled with hands-on experience in modern front-end technologies such as Angular, VueJS, or React.
Key Responsibilities: Design,...
-
Chennai, Tamil Nadu, India Centific Global Technologies Full time
Job Description : Centific is a Seattle-based tech company pioneering the future of AI one breakthrough at a time. Learn how we're transforming the world through safe and scalable AI and empowering businesses to unlock the full potential of their data.
The ideal candidate should have strong expertise in backend development using Python and Django...
-
Azure DevOps Engineer
1 month ago
Bangalore/Chennai, Tamil Nadu, India MNR Solutions Full time
Salary : 19-29 LPA
We are seeking an experienced Azure DevOps Engineer to join our team and lead the development and deployment of cloud solutions. The role involves working with Azure DevOps and Google Cloud Platform (GCP) to ensure smooth and efficient operations for our cloud-based infrastructure and applications. The ideal candidate will have a strong...
-
Valor PayTech
1 month ago
Chennai, Tamil Nadu, India Valor Paytech India Private Limited Full time
Position Overview : As a DevOps Lead at Valor Paytech, you will oversee the development, implementation, and maintenance of our DevOps processes and tools. This role involves close collaboration with software developers, system operators, and IT professionals to streamline software development and deployment, automate repetitive tasks, and enhance overall...
-
Platform Engineer
4 weeks ago
Chennai, Tamil Nadu, India Talent Destination Full time
Job Description :
- Administer and optimize ElasticSearch and Kibana clusters for high availability, performance, and security.
- Design, deploy, and manage cloud infrastructure on AWS and Azure, optimizing cost and enhancing deployment.
- Develop and implement monitoring solutions to proactively resolve issues in ElasticSearch and Kibana environments.
-...
-
Performance Testing Strategist
3 days ago
Chennai, Tamil Nadu, India Centific Full time
At Centific, we're pushing the boundaries of AI innovation. As a leading tech company in Seattle, we're committed to harnessing the power of safe and scalable AI to drive business growth.
Job Overview: We're seeking an experienced Performance Testing Strategist to join our dynamic team. The ideal candidate will possess a strong background in performance...
-
Senior Python Developer
4 weeks ago
Chennai, Tamil Nadu, India Centific Full time
Job Title: Python Developer
Centific is a global digital and technology services company that designs, builds, and optimizes human-centric, intelligent digital platforms. We are seeking a talented Python Django Full Stack Developer with 5 – 8 years of experience to join our team.
Key Responsibilities: Design, develop, and maintain backend services and APIs...
-
Centific - Big Data Lead/Architect
5 months ago
Hyderabad/Chennai, Tamil Nadu, India Centific Global Technologies Full time
Centific expertly engineers platforms and curates multimodal, multilingual data to empower the 'Magnificent Seven' and enterprise clients with safe, scalable Artificial Intelligence (AI) deployment. Our team includes over 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. We leverage an integrated ecosystem...
-
Centific - Enterprise Architect
4 months ago
Chennai/Hyderabad, Tamil Nadu, India Centific Global Technologies Full time
Centific is a global digital and technology services company. We design, build, and optimize human-centric, intelligent digital platforms. Our core capabilities are in data, intelligence, experience, and globalization.
Immediate Joining Preferred
Job Description : The primary responsibilities of the Enterprise Architect role include the following, among others : 1....
-
DevOps Engineer
3 weeks ago
Chennai, Tamil Nadu, India Hapag-Lloyd AG Full time
About Hapag-Lloyd
Hapag-Lloyd is a leading liner shipping company with a fleet of 287 modern container ships and a total transport capacity of 11.9 million TEU. The Company has around 13,500 employees and 400 offices in 139 countries.
The FIS Core team builds the foundation of FIS3, ensuring that all developers across Hapag-Lloyd have a common base on which to...
-
DevOps Engineer
4 weeks ago
Chennai, Tamil Nadu, India Hapag-Lloyd AG Full time
About Hapag-Lloyd
Hapag-Lloyd is a leading liner shipping company with a fleet of 287 modern container ships and a total transport capacity of 11.9 million TEU. The Company has around 13,500 employees and 400 offices in 139 countries. Hapag-Lloyd has a container capacity of 11.9 million TEU – including one of the largest and most modern fleets of reefer...
-
Site Reliability Engineer
1 month ago
Chennai, Tamil Nadu, India NexionPro Services Full time
Job Title : Site Reliability Engineer (SRE)
Location : Chennai (Guindy)
Experience : 5-8 years
Notice Period : Immediate or serving notice (August joiners preferred)
Work Mode : 5 days in-office
References are highly appreciated.
Job Summary : We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a solid...
-
Gleecus TechLabs
1 month ago
Chennai, Tamil Nadu, India Gleecus Full time
Job Title : DevSecOps Engineer
Location : Chennai
Responsibilities :
- Collaborate with development, operations, and security teams to integrate security best practices and tools into the software development life cycle.
- Shift Left Approach to security within the pipelines, to scan and update IaC code for relevant Benchmark/frameworks and industry best...
-
Centific - Cloud Application Architect
4 months ago
Hyderabad/Chennai, Tamil Nadu, India Centific Global Technologies Full time
As a Centific Cloud Architect, you are responsible for designing, building, and configuring applications to meet business process and application requirements in cloud technologies, and for leading the Agile team.
Job Description :
- Hands-on technical expertise in architecting, implementing, and deploying enterprise applications on cloud platforms (AWS, Azure, GCP,...
-
Cloud Reliability Engineer
4 weeks ago
Chennai, Tamil Nadu, India FIS Full time
About the Role
We are seeking a highly skilled Cloud Reliability Engineer to join our team at FIS. As a Cloud Reliability Engineer, you will be responsible for designing, implementing, and documenting EKS, MSK, and CI/CD infrastructure.
Key Responsibilities
- Design and implement EKS, MSK, and CI/CD infrastructure
- Advocate for and apply best design practices for...