LLM Reliability
2 weeks ago
Date Opened: 09/04/2025
Job Type: Full time
Industry: Technology
Work Experience: 2-4 years
City: Mohali
State/Province: Punjab
Country: India
Zip/Postal Code: 160075
Job Description
ABOUT XENONSTACK
XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling enterprises to gain real-time and intelligent business insights.
We deliver innovation through:
- Agentic Systems for AI Agents
- Vision AI Platform
- Inference AI Infrastructure for Agentic Systems
Our mission is to accelerate the world's transition to AI + Human Intelligence by making AI agents reliable, explainable, and enterprise-ready.
THE OPPORTUNITY
We are seeking an LLM Reliability & Evaluation Engineer to ensure that large language models (LLMs) and agentic AI systems meet enterprise-grade standards of accuracy, safety, and trustworthiness.
This role focuses on evaluating, benchmarking, and stress-testing LLMs in real-world workflows, building frameworks for reliability, robustness, and continuous improvement. If you thrive at the intersection of AI research, applied testing, and responsible deployment, this is the role for you.
KEY RESPONSIBILITIES
Evaluation Frameworks
- Design and implement LLM evaluation pipelines covering accuracy, robustness, safety, and bias.
- Develop automated systems for benchmarking models on enterprise-relevant tasks.
Reliability Engineering
- Conduct stress tests, adversarial testing, and edge-case evaluations.
- Build tools to measure latency, consistency, and error recovery in multi-turn interactions.
Metrics & Monitoring
- Define KPIs such as factual accuracy, hallucination rate, toxicity, and compliance alignment.
- Establish real-time monitoring for drift, anomalies, and performance regressions.
Collaboration & Alignment
- Partner with ML engineers, product managers, and domain experts to align evaluation with business objectives.
- Work with Responsible AI teams to implement ethical, explainable, and compliant evaluation practices.
Continuous Improvement
- Feed insights from evaluation into fine-tuning, RLHF/RLAIF pipelines, and model selection.
- Maintain a central repository of test cases, benchmarks, and evaluation results.
Research & Innovation
- Stay current with state-of-the-art LLM evaluation techniques, from academic benchmarks to applied enterprise metrics.
- Explore automated evaluation using agentic test harnesses and synthetic data generation.
Must-Have
- 3–6 years in AI/ML, NLP, or applied model evaluation.
- Strong understanding of LLM architectures, prompt engineering, and failure modes.
- Hands-on experience with evaluation frameworks (evaluation harnesses, Ragas, OpenAI Evals, DeepEval).
- Proficiency in Python and libraries like LangChain, LangGraph, LlamaIndex, Hugging Face.
- Experience with vector databases, RAG pipelines, and knowledge graph integration.
- Familiarity with bias/fairness testing and Responsible AI frameworks.
Good-to-Have
- Experience with reinforcement learning (RLHF, RLAIF) and reward modeling.
- Exposure to agentic evaluation frameworks (multi-agent stress testing, synthetic user simulators).
- Knowledge of compliance and safety requirements for BFSI, GRC, or SOC use cases.
- Contributions to open-source evaluation libraries or research papers.
WHY JOIN XENONSTACK?
- Agentic AI Product Company – Ensure reliability in cutting-edge AI platforms that are redefining enterprise adoption.
- A Fast-Growing Category Leader – Be part of one of the fastest-growing AI Foundries, powering Fortune 500 enterprises with trustworthy AI.
- Career Mobility & Growth – Grow into roles such as AI Systems Architect, Responsible AI Engineer, or Reliability Engineering Lead.
- Global Exposure – Work on enterprise-scale evaluation challenges across BFSI, Healthcare, Telecom, and GRC.
- Create Real Impact – Your evaluations will directly shape production-grade AI agents used in mission-critical systems.
- Culture of Excellence – Our values (Agency, Taste, Ownership, Mastery, Impatience, and Customer Obsession) empower you to innovate fearlessly.
- Responsible AI First – Join a company that prioritizes trustworthy, explainable, and compliant AI.
XENONSTACK CULTURE – JOIN US & MAKE AN IMPACT
At XenonStack, we believe in shaping the future of intelligent systems. We foster a culture of cultivation built on bold, human-centric leadership principles, where deep work, simplicity, and adoption define everything we do.
Our Cultural Values
- Agency – Be self-directed and proactive.
- Taste – Sweat the details and build with precision.
- Ownership – Take responsibility for outcomes.
- Mastery – Commit to continuous learning and growth.
- Impatience – Move fast and embrace progress.
- Customer Obsession – Always put the customer first.
Our Product Philosophy
- Obsessed with Adoption – Making AI accessible, reliable, and enterprise-ready.
- Obsessed with Simplicity – Turning complex evaluation challenges into seamless, automated frameworks.
Be part of our mission to accelerate the world's transition to AI + Human Intelligence — by making AI agents not just powerful, but trustworthy and reliable.