Centific | Head Site Reliability Engineering | Chennai

1 week ago


Chennai, India Centific Full time

Centific is a Seattle-based tech company pioneering the future of AI one breakthrough at a time. Learn how we’re transforming the world through safe and scalable AI and empowering businesses to unlock the full potential of their data.


Head / Director SRE

Key Responsibilities:

Strategic Leadership & Vision:

· Lead and manage the Software Release Management function for all Data and AI products.

· Establish a centralized release management framework for AI and data products that scales with the growing product portfolio.

· Form and lead a high-performing Site Reliability Engineering (SRE) team to ensure the operational stability and performance of all AI and data-driven applications post-release.

· Collaborate with Product, Engineering and Operations teams to align release and SRE strategies with business objectives.


Release Planning & Coordination:

· Oversee the full lifecycle of software and AI model releases, from planning and coordination to post-release evaluation.

· Develop and maintain a detailed release calendar that aligns with the timelines and priorities of various product teams.

· Coordinate release activities with multiple cross-functional teams, ensuring transparent communication of dependencies, risks, and milestones.

· Ensure that all releases are integrated seamlessly into production, minimizing downtime and disruptions to end users.


Site Reliability Engineering (SRE) Team Formation:

· Hire, build, and lead the SRE team responsible for maintaining the reliability, scalability, and performance of all Data and AI products in production.

· Define the roles and responsibilities of the SRE team, ensuring clear alignment with the goals of product engineering and release management.

· Develop and implement SRE best practices, including incident response, root cause analysis, and proactive performance monitoring.

· Establish SLAs, SLOs, and SLIs (Service Level Agreements/Objectives/Indicators) to track and measure the reliability and performance of all services post-release.

· Collaborate with DevOps to ensure that automated CI/CD pipelines integrate seamlessly with SRE processes and monitoring systems.


Process Optimization & Automation:

· Lead the automation of software release processes, with an emphasis on CI/CD pipelines for AI models, data pipelines, and cloud-based AI products.

· Develop infrastructure-as-code practices to improve the scalability and reliability of AI and data systems across production environments.

· Introduce tools for version control, model governance, and monitoring for MLOps and AI model management in production.

· Continuously improve operational procedures to reduce the number of incidents and shorten recovery time.


Risk & Quality Management:

· Implement comprehensive quality assurance and validation processes to ensure that all AI models, data products, and software releases meet security, performance, and compliance requirements.

· Proactively identify and mitigate risks related to releases, AI model performance, and operational stability in production.

· Conduct post-release reviews and retrospectives to continuously improve both the release process and the reliability of products.


Collaboration & Stakeholder Management:

· Serve as the central point of contact for release management and SRE-related matters, ensuring consistent communication between engineering, product teams, and key stakeholders.

· Facilitate cross-functional collaboration to ensure that releases and operational reliability goals are met efficiently and effectively.

· Provide regular updates on release progress, system reliability, and any potential risks to executives and product leadership.


Innovation & Continuous Improvement:

· Stay up to date with the latest trends in SRE, DevOps, AI/ML, and cloud operations, incorporating new tools and practices to improve overall reliability and the release process.

· Drive the adoption of cutting-edge tools in MLOps, AI model deployment, and automated incident resolution to continuously optimize operations and model lifecycle management.

· Foster a culture of continuous improvement by encouraging feedback loops and metrics-driven decision-making across both the release management and SRE teams.

---

Qualifications:

· Bachelor’s or Master’s degree in Computer Science, Data Engineering, AI/ML, or a related field.

· 10+ years of experience in software release management, including 3-5 years in SRE or DevOps environments, preferably in AI or data-driven applications.

· Proven experience building and managing both release management and SRE teams in complex, multi-product environments.

· Strong knowledge of AI/ML operations (MLOps), data pipeline management, and cloud-based AI product deployments.

· Expertise in release management tools (Jenkins, GitLab, Git, Jira) and SRE tools such as Prometheus, Grafana, Datadog, or similar monitoring systems.

· Experience with cloud platforms (AWS, GCP, Azure), containerization (Kubernetes, Docker), and infrastructure automation tools (Terraform, Ansible).

· Excellent problem-solving, organizational, and leadership skills, with a strong track record of driving continuous improvement in both release and operational reliability processes.

Preferred Qualifications:

· Experience deploying and maintaining large-scale AI/ML models in production environments, including monitoring, retraining, and operationalization.

· Familiarity with ITIL, MLOps, or DevOps frameworks and best practices.

· Knowledge of cloud services and frameworks designed for AI/ML (e.g., AWS SageMaker, TensorFlow, PyTorch).

· Demonstrated ability to manage incident response and root cause analysis in complex software ecosystems.


