
AI Systems Reliability Engineer
2 weeks ago
Job Overview
We are seeking a highly skilled and proactive AI Solutions SRE Lead to oversee the maintenance, optimization, and ongoing performance of deployed AI/ML systems and solutions. The ideal candidate will possess a strong background in machine learning, data science, or software engineering roles, with significant exposure to Computer Vision and Generative AI projects.
Main Responsibilities:
- Lead the post-deployment lifecycle of AI solutions, ensuring continued functionality, reliability, and scalability. This involves establishing monitoring frameworks to oversee system performance, usage, and metrics for AI/ML models and APIs.
- Detect anomalies in AI systems, troubleshoot operational issues, and initiate timely corrective actions to maintain system health and prevent downtime.
Performance Optimization:
- Continuously assess and optimize the performance of AI models to maintain efficiency and accuracy in production environments. This requires collaborating with data scientists and engineers to refine algorithms, retrain models, and update solutions as needed.
- Implement automation where possible to streamline maintenance processes and improve overall system reliability.
Stakeholder Collaboration:
- Work with cross-functional teams (engineering, product, operations, etc.) to ensure alignment of AI sustainment activities with business goals.
- Communicate effectively with stakeholders to provide updates on system health, risks, and improvements. This includes providing regular status reports and making recommendations for process improvements.
Governance & Best Practices:
- Define and implement best practices for sustaining AI solutions, including documentation, testing protocols, and version control. This ensures that AI systems are maintained in accordance with industry standards and regulatory requirements.
- Ensure compliance with ethical AI standards, regulatory guidelines, and established governance frameworks. This includes managing and mitigating risks associated with model drift, data shifts, and system vulnerabilities.
Incident Management:
- Lead responses to critical incidents involving AI systems by performing root cause analysis and deploying solutions for quick resolution. Your ability to analyze complex technical issues and develop effective solutions will be essential in this role.
- Mentor and develop junior team members, fostering their skills in AI observability and domain-specific knowledge in ML, Computer Vision, and Generative AI.
Key Qualifications:
- Bachelor's degree in Computer Science, Engineering, Data Science, or related field; advanced degree preferred.
- 9+ years of experience in machine learning, data science, or software engineering roles, with significant exposure to Computer Vision and Generative AI projects.
- 4+ years of experience specifically focused on developing and sustaining AI/ML applications and solutions.
- Strong programming skills in languages such as Python, Java, or Go.
- Extensive experience with AI/ML frameworks (e.g., TensorFlow, PyTorch, scikit-learn) and cloud platforms (e.g., AWS, Azure, GCP).
- Proficiency in data visualization tools and techniques (e.g., Grafana, Tableau, D3.js).
- Deep understanding of AI/ML concepts, including model training, evaluation, and deployment, with specific knowledge of Computer Vision and Generative AI techniques.
- Experience with monitoring and observability tools such as Prometheus, ELK stack, or similar systems.
- Excellent problem-solving skills and ability to troubleshoot complex AI systems across various domains.
- Proven track record of mentoring and developing junior team members in AI-related roles.
Preferred Skills:
- Experience with MLOps practices and tools, particularly for large-scale AI systems.
- Familiarity with AI ethics and responsible AI principles, especially as they relate to Generative AI.
- Knowledge of relevant AI regulations and compliance requirements, including those specific to Computer Vision applications.
- Experience with distributed systems and large-scale data processing for AI applications.
- Contributions to open-source projects or research publications on production-scale AI solutions. Previous experience with large-scale AI/ML solutions in production environments.
- Knowledge of DevOps principles and CI/CD pipelines specific to AI/ML systems.
Key Competencies:
- Strong analytical and critical thinking skills
- Excellent communication and collaboration abilities
- Proactive and self-motivated work ethic
- Ability to explain complex technical concepts to both technical and non-technical audiences
- Adaptability and willingness to learn in a rapidly evolving field
- Strong mentorship and leadership skills
- Deep curiosity and passion for AI, particularly in ML, Computer Vision, and Generative AI domains
We are looking for a passionate and innovative individual who can help us build robust, transparent, and reliable AI systems while nurturing the growth of our team. If you have a strong background in AI/ML, with specific expertise in Computer Vision and Generative AI, and a keen interest in observability and system reliability, we encourage you to apply.