AI Evaluation Engineer
2 days ago
Role Summary
We are seeking an AI Evaluation Engineer to join our team and help define how next-generation AI systems are tested, trusted, and improved. In this role, you'll design and implement rigorous quality assurance and evaluation frameworks (combining automated pipelines, human-in-the-loop review, and synthetic data generation) to measure not only our platform reliability but also AI agents' accuracy, safety, and alignment with real-world use cases. You'll work end-to-end across the product lifecycle: writing test case scenarios, building automated tests, managing release test plans, developing dashboards and analysis tools, and translating insights into actionable improvements for both internal teams and clients.
Key Responsibilities
● Design evaluation frameworks for accuracy, safety, fairness, and alignment with intended use cases.
● Build and maintain evaluation pipelines that combine automated systems, human-in-the-loop review, and synthetic data generation to test AI agents' performance at scale.
● Conduct failure mode and edge-case analysis to surface weaknesses, risks, and unexpected behaviors in AI outputs.
● Develop internal tools and dashboards that make evaluation results transparent, reproducible, and actionable across engineering, research, and client teams.
● Ensure evaluation datasets are diverse, representative, and high-quality, minimizing bias while capturing real-world complexity.
● Collaborate with researchers, engineers, and product stakeholders to translate insights into prioritized improvements and product decisions.
● Treat evaluation as a discipline of testing, applying statistical rigor, reproducibility, and operational reliability across the AI lifecycle.
● Ensure deployment readiness by stress-testing agents for resilience, safety, and alignment in production-like environments.
● Own quality assurance: ensure software is built to specification and is reliable, robust, secure, and ready for deployment. Create test cases, test plans, and a bug-reporting process for unit, regression, and user acceptance testing (UAT).
Qualifications & Skills
Required
● Strong software engineering skills, with proficiency in Python and familiarity with data pipelines, APIs, and evaluation tooling.
● Solid understanding of the machine learning lifecycle, including model training, testing, and deployment.
● Experience designing or implementing evaluation metrics, experiment design, or statistical analysis.
● Exposure to human-in-the-loop workflows, annotation systems, or synthetic data generation.
● Ability to conduct rigorous failure analysis and translate results into actionable insights.
● Clear, precise communication skills; able to present evaluation findings to technical and non-technical audiences.
Preferred
● Quality assurance and testing experience.
● Experience with LLMs, generative AI systems, or agentic workflows.
● Familiarity with fairness, bias detection, interpretability, or safety evaluation.
● Background in building dashboards, monitoring tools, or large-scale observability systems.
● Prior work with evaluation frameworks, testing suites, or reproducibility practices at scale.
● Comfort working end-to-end: from scoping evaluation goals to delivering deployment-ready results.
Seniority Levels / Variations
Depending on seniority (e.g., junior vs senior vs staff), responsibilities might scale to include:
● Owning or leading an evaluation strategy at a product or platform level.
● Mentoring others or managing QA teams.
● Architecting evaluation platforms.
● Setting standards for metrics, tools, and best practices across multiple product lines.
What We Offer / Why Join Us
● Opportunity to influence AI product quality, fairness, and trust at scale.
● Working with cutting-edge model architectures and AI tools.
● Collaborating with top researchers, engineers, and product leaders.
● A flexible, remote-friendly, collaborative environment (if applicable).
● Learning opportunities in safety, fairness, interpretability, and evaluation methodologies.
-
Diamond Evaluator
6 days ago
Surat, Gujarat, India Apt Resources Full time US$ 42,000 - US$ 84,000 per year
Apt Resources is seeking an experienced Diamond Evaluator to join our client's mining operations team in Angola. The ideal candidate will bring deep expertise in evaluating, grading, and valuing rough diamonds, ensuring compliance with international standards and industry best practices. This role is key in maintaining the integrity and accuracy of the...
-
AI Engineer
4 weeks ago
Surat, Gujarat, India Techvy Corp Full time
Lead AI Engineer & Intermediate AI Engineer (Regulatory Workflow Automation)
Openings: 1 Lead + 1 Intermediate
Location: Bangalore (Hybrid, 2–3 days/week onsite)
Category: AI Engineering – Regulatory & Compliance Automation
About the team
We're building AI-powered, automated workflows for compliance-heavy business processes. You'll work closely with domain...
-
AI Engineer
3 hours ago
Surat, Gujarat, India Appstonelab Technologies Full time ₹ 5,00,000 - ₹ 15,00,000 per year
AI Engineers at AppStoneLab focus on building intelligent agentic workflows using large language models (LLMs) and tools like LangGraph, LangChain, and OpenAI APIs. You'll work closely with product and engineering teams to design prompt-based systems, automate tasks using LLMs, and create scalable AI-driven solutions. The ideal candidate is curious,...
-
Freelance AI Red Team Engineer
4 days ago
Surat, Gujarat, India Mindrift Full time ₹ 2,50,000 - ₹ 15,00,000 per year
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.
At Mindrift, innovation meets opportunity. We believe in using the power of collective intelligence to ethically shape the future of AI.
What...
-
Freelance AI Trainer
4 days ago
Surat, Gujarat, India Mindrift Full time ₹ 2,40,000 - ₹ 14,40,000 per year
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.
At Mindrift, innovation meets opportunity. We believe in using the power of collective intelligence to ethically shape the future of AI.
What we do
The...
-
AI Engineer
6 days ago
Surat, Gujarat, India Whitestone Infotech Full time ₹ 4,00,000 - ₹ 4,80,000 per year
We are looking for an AI Engineer with strong expertise in AI/ML, Natural Language Processing (NLP), model deployment, MLOps, and time series forecasting. The ideal candidate will design, develop, and deploy machine learning models, ensuring scalable and reliable ML systems.
Key Responsibilities:
● Build and train AI/ML models for various applications.
● Develop...
-
ai/ml
6 days ago
Surat, Gujarat, India Artoon Solutions Private Limited Full time US$ 90,000 - US$ 1,20,000 per year
Job Description
We are looking for an experienced AI/ML cum Python Developer with 2+ years of hands-on work in machine learning, Python development, and API integration. The ideal candidate should also have experience building AI agents: smart systems that can plan tasks, make decisions, and work independently using tools like LangChain, AutoGPT, or similar...
-
AI/ML Developer
6 days ago
Surat, Gujarat, India vasundhara infotech Full time ₹ 15,00,000 - ₹ 25,00,000 per year
Responsibilities:
● Design, implement, and deploy scalable machine learning models and AI algorithms.
● Collaborate with cross-functional teams to define use cases, collect requirements, and deliver AI-driven solutions.
● Conduct data preprocessing, feature engineering, and exploratory data analysis.
● Train, test, and optimize models for performance, scalability, and...