AI Evaluation Engineer
1 week ago
Role Summary
We are seeking an AI Evaluation Engineer to join our team and help define how next-generation AI systems are tested, trusted, and improved. In this role, you'll design and implement rigorous quality assurance and evaluation frameworks (combining automated pipelines, human-in-the-loop review, and synthetic data generation) to measure not only our platform's reliability but also AI agents' accuracy, safety, and alignment with real-world use cases. You'll work end-to-end across the product lifecycle: writing test scenarios, building automated tests, managing release test plans, developing dashboards and analysis tools, and translating insights into actionable improvements for both internal teams and clients.
Key Responsibilities
● Design evaluation frameworks for accuracy, safety, fairness, and alignment with intended use cases.
● Build and maintain evaluation pipelines that combine automated systems, human-in-the-loop review, and synthetic data generation to test AI agents' performance at scale.
● Conduct failure mode and edge-case analysis to surface weaknesses, risks, and unexpected behaviors in AI outputs.
● Develop internal tools and dashboards that make evaluation results transparent, reproducible, and actionable across engineering, research, and client teams.
● Ensure evaluation datasets are diverse, representative, and high-quality, minimizing bias while capturing real-world complexity.
● Collaborate with researchers, engineers, and product stakeholders to translate insights into prioritized improvements and product decisions.
● Treat evaluation as a testing discipline, applying statistical rigor, reproducibility, and operational reliability across the AI lifecycle.
● Ensure deployment readiness by stress-testing agents for resilience, safety, and alignment in production-like environments.
● Provide quality assurance: ensure software is built to specification and is reliable, robust, secure, and ready for deployment. Create test cases, test plans, and a bug-reporting process for unit, regression, and user acceptance testing (UAT).
Qualifications & Skills
Required
● Strong software engineering skills, with proficiency in Python and familiarity with data pipelines, APIs, and evaluation tooling.
● Solid understanding of the machine learning lifecycle, including model training, testing, and deployment.
● Experience designing or implementing evaluation metrics, experiment design, or statistical analysis.
● Exposure to human-in-the-loop workflows, annotation systems, or synthetic data generation.
● Ability to conduct rigorous failure analysis and translate results into actionable insights.
● Clear, precise communication skills; able to present evaluation findings to technical and non-technical audiences.
Preferred
● Quality assurance and testing experience.
● Experience with LLMs, generative AI systems, or agentic workflows.
● Familiarity with fairness, bias detection, interpretability, or safety evaluation.
● Background in building dashboards, monitoring tools, or large-scale observability systems.
● Prior work with evaluation frameworks, testing suites, or reproducibility practices at scale.
● Comfort working end-to-end: from scoping evaluation goals to delivering deployment-ready results.
Seniority Levels / Variations
Depending on seniority (e.g., junior vs senior vs staff), responsibilities might scale to include:
● Owning or leading an evaluation strategy at a product or platform level.
● Mentoring others or managing QA teams.
● Architecting evaluation platforms.
● Setting standards for metrics, tools, and best practices across multiple product lines.
What We Offer / Why Join Us
● Opportunity to influence AI product quality, fairness, and trust at scale.
● Working with cutting-edge model architectures and AI tools.
● Collaborating with top researchers, engineers, and product leaders.
● Flexible, remote, and collaborative work environment (if applicable).
● Learning opportunities in safety, fairness, interpretability, and evaluation methodologies.