QA Engineer III

Khammam, India · Full time

Overview:
Our Agentic AI Engineering team seeks individuals passionate about advancing the quality and reliability of AI systems through rigorous evaluation, experimentation, and tooling. You will play a key role in assessing the performance, safety, and alignment of AI models, leveraging platforms such as OpenAI and LangSmith. Our team thrives on curiosity, continuous learning, and collaborative innovation, and your contributions will directly impact the trustworthiness of AI solutions used globally.

Responsibilities:
- Design and execute evaluation strategies for LLM-based systems using LangSmith, OpenAI tools, and custom frameworks.
- Collaborate with product managers, researchers, and engineers to define evaluation metrics aligned with business and ethical goals.
- Develop automated pipelines for regression testing, prompt evaluation, and model comparison (a minimal illustrative sketch appears at the end of this posting).
- Analyze model outputs for correctness, bias, safety, and hallucination using both qualitative and quantitative methods.
- Build reusable test harnesses and datasets for benchmarking AI capabilities across tasks and domains.
- Contribute to the development of internal tools for prompt engineering, evaluation visualization, and feedback loops.
- Act as a quality advocate for AI-driven features across the product lifecycle.
- Mentor junior team members in AI evaluation best practices and tooling.
- Stay current with advancements in LLM evaluation, interpretability, and responsible AI.

Qualifications:
- Bachelor's or master's degree in computer science or a related technical field.
- 6-8 years of experience in QA and automation.
- Basic understanding of the software development lifecycle and of quality best practices and tools.
- Good knowledge of automation framework development.
- Proficient with automation frameworks such as Playwright (preferred), Protractor, REST Assured, and Karate.
- Hands-on experience with OpenAI APIs (e.g., GPT-4, function calling, the Assistants API).
- Familiarity with OpenAI Evals and LangSmith for prompt orchestration and evaluation.
- Strong Python skills, especially for building test automation and data analysis scripts.
- Experience with prompt engineering, LLM fine-tuning, or RAG pipelines is a plus.
- Proficiency with tools such as Jupyter, Pandas, and/or SQL for evaluation analytics.
- Understanding of ethical AI principles, model alignment, and safety concerns.
- Experience with version control (GitHub), CI/CD pipelines, and test management tools (e.g., JIRA, Azure Test Plans).
- Excellent communication and collaboration skills, with the ability to explain complex AI behaviors to diverse stakeholders.
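For illustration only, below is a minimal sketch of the kind of automated prompt-evaluation pipeline referenced in the responsibilities above. It assumes the OpenAI Python SDK (v1 or later) with an OPENAI_API_KEY set in the environment; the dataset, model name, and substring pass criterion are hypothetical placeholders, not the team's actual framework.

```python
# Illustrative sketch: a regression-style prompt evaluation.
# Assumptions (not from the posting): OpenAI Python SDK >= 1.0,
# OPENAI_API_KEY in the environment, a toy golden dataset, and a
# simple substring pass/fail criterion.
from openai import OpenAI

client = OpenAI()

# Hypothetical golden dataset: each case pairs a prompt with a
# substring the model's answer must contain to pass.
CASES = [
    {"prompt": "What is the capital of France?", "must_contain": "Paris"},
    {"prompt": "What is 2 + 2?", "must_contain": "4"},
]

def run_case(case: dict, model: str = "gpt-4o") -> bool:
    """Send one prompt to the model and apply the substring check."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": case["prompt"]}],
        temperature=0,  # reduce output variance for regression runs
    )
    answer = resp.choices[0].message.content or ""
    return case["must_contain"].lower() in answer.lower()

if __name__ == "__main__":
    results = [run_case(case) for case in CASES]
    print(f"passed {sum(results)}/{len(results)} cases")
```

In practice, a pipeline like this would run in CI on every prompt or model change, with results logged to an evaluation platform such as LangSmith for side-by-side comparison across model versions.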

Overview: As a member of our Agentic AI Engineering team, we seek individuals passionate about advancing the quality and reliability of AI systems through rigorous evaluation, experimentation, and tooling. You will play a key role in assessing the performance, safety, and alignment of AI models, leveraging platforms like OpenAI and LangSmith. Our team thrives in a culture of curiosity, continuous learning, and collaborative innovation, where your contributions directly impact the trustworthiness of AI solutions used globally. Responsibilities: - Design and execute evaluation strategies for LLM-based systems using LangSmith, OpenAI tools, and custom frameworks. - Collaborate with product managers, researchers, and engineers to define evaluation metrics aligned with business and ethical goals. - Develop automated pipelines for regression testing, prompt evaluation, and model comparison. - Analyze model outputs for correctness, bias, safety, and hallucination using both qualitative and quantitative methods. - Build reusable test harnesses and datasets for benchmarking AI capabilities across tasks and domains. - Contribute to the development of internal tools for prompt engineering, evaluation visualization, and feedback loops. - Act as a quality advocate for AI-driven features across the product lifecycle. - Mentor junior team members in AI evaluation best practices and tooling. - Stay current with advancements in LLM evaluation, interpretability, and responsible AI. Qualifications: - Bachelor's or master's degree in computer science or a related technical field. - Minimum 6-8 years of experience into QA and Automation - Basic level understanding of software development lifecycle and quality best practices and tools - Good knowledge on Automation Framework Development - Proficient in the use of various automation frameworks like Playwright (preferred), Protractor, Rest Assured, karate etc. - Hands-on experience with OpenAI APIs (e.g., GPT-4, function calling, Assistants API). - Familiarity with OpenAI Evals and LangSmith for prompt orchestration and evaluation. - Strong Python skills, especially in building test automation and data analysis scripts. - Experience with prompt engineering, LLM fine-tuning, or RAG pipelines is a plus. - Proficiency in using tools like Jupyter, Pandas, and/or SQL for evaluation analytics. - Understanding of ethical AI principles, model alignment, and safety concerns. - Experience with version control (GitHub), CI/CD pipelines, and test management tools (e.g., JIRA, Azure Test Plans). - Excellent communication and collaboration skills, with the ability to explain complex AI behaviors to diverse stakeholders.