Site Reliability Engineer

1 week ago


Bangalore, Karnataka, India Infiniti Research Full time

About Quantzig :

Quantzig is a global analytics and advisory firm with offices in the US, UK, Canada, China, and India. We have assisted clients across the globe with end-to-end advanced analytics, visual storyboarding, machine learning, and data engineering solution implementation for prudent decision-making. We are a rapidly growing organization that is built and operated by high-performance champions.

If you have what it takes to be a champion, with the business and functional skills to take ownership of an entire project end-to-end and to help build a team with a great work ethic and a drive to learn, you are the one we're looking for.

Our clients love us for our solutioning capability and our enthusiasm, and we expect you to be a part of our growth story.

Designation: SRE, Machine Learning and AI Platform

Job Summary :

As a Site Reliability Engineer (SRE) specializing in Machine Learning and AI Platform, you will play a critical role in designing, implementing, and maintaining a highly scalable, reliable, and performant infrastructure to support our organization's machine learning and artificial intelligence initiatives.

You'll collaborate closely with cross-functional teams including data scientists, software engineers, and product managers to ensure our ML/AI platform meets the highest standards of reliability, availability, and efficiency.

Key Responsibilities :

Main Role as SRE :

- Design and implement robust, scalable, and automated infrastructure solutions to support our machine learning and artificial intelligence workloads.

- Proactively identify and address potential performance bottlenecks, reliability issues, and security vulnerabilities in the ML/AI platform.

- Collaborate with AI engineering teams to define best practices for deploying, monitoring, and managing machine learning models and pipelines in production environments.

- Continuously optimize infrastructure components for cost-effectiveness, scalability, and performance.

- Optimize platform performance and ensure security and compliance standards are met.

- Collaborate with cross-functional teams to troubleshoot and resolve platform-related issues.

- Provide technical guidance and mentorship to junior team members.

- Create, govern, and continuously improve the IaC automation framework and scripts for our company-wide solutions.

- Provide direct technical design and delivery support for top-priority initiatives while governing, influencing, and approving all other initiatives.

- Provide support and guidance across the company on technical design and standards.

- Ensure that delivered solutions are aligned to enterprise standards (Architecture, Operations, and Infrastructure) and of high quality while maintaining the required non-functional attributes such as performance, supportability, security, usability, reliability and stability.

Deliverables :

- Help architect and deploy highly available and fault-tolerant infrastructure for hosting machine learning models and training pipelines.

- Implement automated deployment pipelines for deploying ML/AI models and pipelines into production environments.

- Develop and maintain monitoring and alerting systems to ensure the health and performance of the ML/AI platform.

- Create documentation and provide training to internal teams on best practices for operating and troubleshooting the ML/AI platform.

- Contribute to the development of internal tools and frameworks to streamline machine learning workflow processes.

Qualifications :

- Experience required: 5-10 years.

- Academic degree: BE or BTech, MCA, or M.Sc. in engineering or an IT-related field.

- Extensive experience in designing, implementing, and managing cloud-based infrastructure solutions, preferably on Azure; experience with GCP is a plus.

- Proficiency in containerization technologies such as Docker and orchestration frameworks like Kubernetes.

- Strong skills in programming and infrastructure-as-code tools such as Python and Terraform.

- Experience with monitoring and observability tools such as Prometheus, Grafana, and the ELK stack.

- Excellent problem-solving and communication skills, with a proactive and collaborative approach to working in cross-functional teams.

(ref:hirist.tech)

  • Bangalore, Karnataka, India Cyitechsearch Full time

    We are hiring for Site Reliability Engineer. Skills: - Develop and provide operational support for full-stack software applications. - Relevant industry certifications, such as through the Site Reliability Engineering (SRE) Foundation. - Five years' experience as a site reliability engineer or similar role. - Collaborate with development operations staff to...


  • Bangalore, Karnataka, India Ultrabot Innovations Full time

    Position Overview: As a Senior Site Reliability Engineer with 5-8 years of experience, you will play a key role in ensuring the reliability, scalability, and performance of our systems and infrastructure. You will leverage your expertise in Site Reliability Engineering (SRE) to implement best practices and methodologies, effectively troubleshoot complex...


  • Bangalore, Karnataka, India TERRAGIG LLP Full time

    Role: Site Reliability Engineer. Experience: 5+ Years. Work Model: Remote / Contract 3 years. Skills: - Develop and provide operational support for full-stack software applications. - Relevant industry certifications, such as through the Site Reliability Engineering (SRE) Foundation. - Five years' experience as a site reliability engineer or similar role. -...


  • Bangalore, Karnataka, India ALIQAN Technologies Full time

    Job Description: We are seeking a Site Reliability Engineer with strong platform development skills and a thorough understanding of securing environments, with a solid grasp of information security and performance optimization. This role focuses on building scalable, secure, and exceptional infrastructure, automating processes wherever possible. Ideal...

  • Engineering Director

    3 weeks ago


    Bangalore, Karnataka, India CareerNet Technologies Full time

    Job Description: Site Reliability Engineer (SRE) at Coupang is a mission-critical role that combines software and system engineering to build, run, and scale our complex, large-scale ecommerce systems. As part of the Site Reliability Engineering team, you will be responsible for ensuring all our customer-facing services are healthy, monitored, automated,...

  • Engineering Director

    2 months ago


    Bangalore, Karnataka, India CareerNet Technologies Full time

    Job Description: Site Reliability Engineer (SRE) at Coupang is a mission-critical role that combines software and system engineering to build, run, and scale our complex, large-scale ecommerce systems. As part of the Site Reliability Engineering team, you will be responsible for ensuring all our customer-facing services are healthy, monitored, automated,...


  • Bangalore, Karnataka, India Protoporos Staffing Services Pvt Ltd Full time

    Opportunity with a leading B2B SaaS product client specializing in cutting-edge data integration solutions. Position Overview: We are seeking a highly skilled and experienced Staff Site Reliability Engineer to join our team. As a Staff SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our data integration...


  • Bangalore, Karnataka, India SWAI TECHNOLOGIES PRIVATE LIMITED Full time

    Role: Senior Site Reliability Engineer. Experience: 5 to 10 years. Remote opportunity. Company Description: Tech recruitment is broken. Companies say there is a shortage of talent and it's hard to find good developers, while developers find it hard to find companies that value the skill, experience and passion they bring to the table. Quite the...


  • Bangalore, Karnataka, India Prudential Manpower Pvt.lTD Full time

    Position: Site Reliability Engineer. Location: Bangalore. Notice Period: Immediate to 30 Days. Minimum Requirements: - 4 years of experience as a Site Reliability Engineer. - Experience with one or more of the following: C++, Java, Python, Go, Perl and/or Ruby etc. - Experience with Unix/Linux operating systems internals and administration or networking. -...


  • Bangalore, Karnataka, India Cyitechsearch Full time

    About the job: We are hiring for Site Reliability Engineer. Experience: 5+ Years. Work Model: Remote / Contract 3 years. Skills: - Develop and provide operational support for full-stack software applications. - Relevant industry certifications, such as through the Site Reliability Engineering (SRE) Foundation. - Five years' experience as a site reliability...


  • Bangalore, Karnataka, India One Degree North HR Services Full time

    Responsibilities: - Establish instrumentation to measure SLI (Service Level Indicators), define SLO (Service Level Objectives), alerting mechanisms, review with stakeholders. - Ensure the reliability, scalability and performance of our cloud-based systems and On-Prem Systems. - Support the automation tools and frameworks (CI/CD pipelines). - Provide inputs to...


  • Bangalore, Karnataka, India Prudential Manpower Pvt.lTD Full time

    Notice Period: Immediate to 30 Days. Minimum Requirements: - 4 years of experience as a Site Reliability Engineer. - Experience with one or more of the following: C++, Java, Python, Go, Perl and/or Ruby etc. - Experience with Unix/Linux operating systems internals and administration or networking. - Experience with Site Reliability Engineering, System Design,...

