Site Reliability Engineer

2 weeks ago


Bangalore, India Infiniti Research Full time

About Quantzig :

Quantzig is a global analytics and advisory firm with offices in the US, UK, Canada, China, and India. We have assisted clients across the globe in implementing end-to-end advanced analytics, visual storyboarding, machine learning, and data engineering solutions for prudent decision-making. We are a rapidly growing organization that is built and operated by high-performance champions.

If you have what it takes to be a champion, with the business and functional skills to take ownership of an entire project end-to-end and help build a team with a great work ethic and a drive to learn, you are the one we're looking for.

Our clients love us for our solutioning capability and our enthusiasm, and we expect you to be a part of our growth story.

Designation: SRE, Machine Learning and AI Platform

Job Summary :

As a Site Reliability Engineer (SRE) specializing in Machine Learning and AI Platform, you will play a critical role in designing, implementing, and maintaining a highly scalable, reliable, and performant infrastructure to support our organization's machine learning and artificial intelligence initiatives.

You'll collaborate closely with cross-functional teams including data scientists, software engineers, and product managers to ensure our ML/AI platform meets the highest standards of reliability, availability, and efficiency.

Key Responsibilities :

Main Role as SRE :

- Design and implement robust, scalable, and automated infrastructure solutions to support our machine learning and artificial intelligence workloads.

- Proactively identify and address potential performance bottlenecks, reliability issues, and security vulnerabilities in the ML/AI platform.

- Collaborate with AI engineering teams to define best practices for deploying, monitoring, and managing machine learning models and pipelines in production environments.

- Continuously optimize infrastructure components for cost-effectiveness, scalability, and performance.

- Optimize platform performance and ensure security and compliance standards are met.

- Collaborate with cross-functional teams to troubleshoot and resolve platform-related issues.

- Provide technical guidance and mentorship to junior team members.

- Create, govern, and continuously improve the IaC automation framework and scripts for our company-wide solutions.

- Provide direct technical design and delivery support for top priority initiatives while governing, influencing and approving all other initiatives.

- Provide support and guidance across the company on technical design and standards.

- Ensure that delivered solutions are aligned to enterprise standards (Architecture, Operations, and Infrastructure) and of high quality while maintaining the required non-functional attributes such as performance, supportability, security, usability, reliability and stability.

Deliverables :

- Help architect and deploy highly available and fault-tolerant infrastructure for hosting machine learning models and training pipelines.

- Implement automated deployment pipelines for deploying ML/AI models and pipelines into production environments.

- Develop and maintain monitoring and alerting systems to ensure the health and performance of the ML/AI platform.

- Create documentation and provide training to internal teams on best practices for operating and troubleshooting the ML/AI platform.

- Contribute to the development of internal tools and frameworks to streamline machine learning workflow processes.

Qualifications :

- Experience required: 5-10 years.

- Academic degree: BE or BTech, MCA, or M.Sc. in engineering or an IT-related field.

- Extensive experience in designing, implementing, and managing cloud-based infrastructure solutions, preferably on Azure; experience with GCP is a plus.

- Proficiency in containerization technologies such as Docker and orchestration frameworks like Kubernetes.

- Strong programming skills in languages such as Python, and experience with infrastructure-as-code tools such as Terraform.

- Experience with monitoring and observability tools such as Prometheus, Grafana, and ELK stack.

- Excellent problem-solving and communication skills, with a proactive and collaborative approach to working in cross-functional teams.

(ref:hirist.tech)
