Centific - Director - Site Reliability Engineering

4 weeks ago


Chennai, Tamil Nadu, India | Centific Global Technologies | Full time

Key Responsibilities :

Strategic Leadership & Vision :

- Lead and manage the Software Release Management function for all Data and AI products.

- Establish a centralized release management framework for AI and data products that scales with the growing product portfolio.

- Form and lead a high-performing Site Reliability Engineering (SRE) team to ensure the operational stability and performance of all AI and data-driven applications post-release.

- Collaborate with Product, Engineering and Operations teams to align release and SRE strategies with business objectives.

Release Planning & Coordination :

- Oversee the full lifecycle of software and AI model releases, from planning and coordination to post-release evaluation.

- Develop and maintain a detailed release calendar that aligns with the timelines and priorities of various product teams.

- Coordinate release activities with multiple cross-functional teams, ensuring transparent communication of dependencies, risks, and milestones.

- Ensure that all releases are integrated seamlessly into production, minimizing downtime and disruptions to end users.

Site Reliability Engineering (SRE) Team Formation :

- Hire, build, and lead the SRE team responsible for maintaining the reliability, scalability, and performance of all Data and AI products in production.

- Define the roles and responsibilities of the SRE team, ensuring clear alignment with the goals of product engineering and release management.

- Develop and implement SRE best practices, including incident response, root cause analysis, and proactive performance monitoring.

- Establish Service Level Agreements (SLAs), Service Level Objectives (SLOs), and Service Level Indicators (SLIs) to track and measure the reliability and performance of all services post-release.

- Collaborate with DevOps to ensure that automated CI/CD pipelines integrate seamlessly with SRE processes and monitoring systems.

Process Optimization & Automation :

- Lead the automation of software release processes, with an emphasis on CI/CD pipelines for AI models, data pipelines, and cloud-based AI products.

- Develop infrastructure-as-code practices to improve the scalability and reliability of AI and data systems across production environments.

- Introduce tools for version control, model governance, and monitoring for MLOps and AI model management in production.

- Continuously improve operational procedures to reduce the number of incidents and optimize recovery time.

Risk & Quality Management :

- Implement comprehensive quality assurance and validation processes to ensure that all AI models, data products, and software releases meet security, performance, and compliance requirements.

- Proactively identify and mitigate risks related to releases, AI model performance, and operational stability in production.

- Conduct post-release reviews and retrospectives to continuously improve both the release process and the reliability of products.

Collaboration & Stakeholder Management :

- Serve as the central point of contact for release management and SRE-related matters, ensuring consistent communication between engineering, product teams, and key stakeholders.

- Facilitate cross-functional collaboration to ensure that releases and operational reliability goals are met efficiently and effectively.

- Provide regular updates on release progress, system reliability, and any potential risks to executives and product leadership.

Innovation & Continuous Improvement :

- Stay up to date with the latest trends in SRE, DevOps, AI/ML, and cloud operations, incorporating new tools and practices to improve system reliability and the release process.

- Drive the adoption of cutting-edge tools in MLOps, AI model deployment, and automated incident resolution to continuously optimize operations and model lifecycle management.

- Foster a culture of continuous improvement by encouraging feedback loops and metrics-driven decision-making across both the release management and SRE teams.

Qualifications :

- Bachelor's or Master's degree in Computer Science, Data Engineering, AI/ML, or a related field.

- 10+ years of experience in software release management, including 3-5 years in SRE or DevOps environments, preferably with AI or data-driven applications.

- Proven experience building and managing both release management and SRE teams in complex, multi-product environments.

- Strong knowledge of AI/ML operations (MLOps), data pipeline management, and cloud-based AI product deployments.

- Expertise in release management tools (Jenkins, GitLab, Git, Jira) and SRE tools such as Prometheus, Grafana, Datadog, or similar monitoring systems.

- Experience with cloud platforms (AWS, GCP, Azure), containerization (Kubernetes, Docker), and infrastructure automation tools (Terraform, Ansible).

- Excellent problem-solving, organizational, and leadership skills, with a strong track record of driving continuous improvement in both release and operational reliability processes.

Preferred Qualifications :

- Experience deploying and maintaining large-scale AI/ML models in production environments, including monitoring, retraining, and operationalization.

- Familiarity with ITIL, MLOps, or DevOps frameworks and best practices.

- Knowledge of cloud-based services and frameworks designed for AI/ML (e.g., Amazon SageMaker, TensorFlow, PyTorch).

- Demonstrated ability to manage incident response and root cause analysis in complex software ecosystems.

(ref:hirist.tech)
