
LLM Reliability
2 weeks ago
ABOUT XENONSTACK

XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling enterprises to gain real-time and intelligent business insights. We deliver innovation through Agentic Systems for AI Agents, a Vision AI Platform, and Inference AI Infrastructure for Agentic Systems.

Our mission is to accelerate the world's transition to AI + Human Intelligence by making AI agents reliable, explainable, and enterprise-ready.

THE OPPORTUNITY

We are seeking an LLM Reliability Evaluation Engineer to ensure that large language models (LLMs) and agentic AI systems meet enterprise-grade standards of accuracy, safety, and trustworthiness. This role focuses on evaluating, benchmarking, and stress-testing LLMs in real-world workflows, and on building frameworks for reliability, robustness, and continuous improvement.

If you thrive at the intersection of AI research, applied testing, and responsible deployment, this is the role for you.

KEY RESPONSIBILITIES

Evaluation Frameworks
- Design and implement LLM evaluation pipelines covering accuracy, robustness, safety, and bias.
- Develop automated systems for benchmarking models on enterprise-relevant tasks.

Reliability Engineering
- Conduct stress tests, adversarial testing, and edge-case evaluations.
- Build tools to measure latency, consistency, and error recovery in multi-turn interactions.

Metrics & Monitoring
- Define KPIs such as factual accuracy, hallucination rate, toxicity, and compliance alignment.
- Establish real-time monitoring for drift, anomalies, and performance regressions.

Collaboration & Alignment
- Partner with ML engineers, product managers, and domain experts to align evaluation with business objectives.
- Work with Responsible AI teams to implement ethical, explainable, and compliant evaluation practices.

Continuous Improvement
- Feed insights from evaluation into fine-tuning, RLHF/RLAIF pipelines, and model selection.
- Maintain a central repository of test cases, benchmarks, and evaluation results.

Research & Innovation
- Stay current with state-of-the-art LLM evaluation techniques, from academic benchmarks to applied enterprise metrics.
- Explore automated evaluation using agentic test harnesses and synthetic data generation.

SKILLS & QUALIFICATIONS

Must-Have
- 3-6 years in AI/ML, NLP, or applied model evaluation.
- Strong understanding of LLM architectures, prompt engineering, and failure modes.
- Hands-on experience with evaluation frameworks (Eval harnesses, Ragas, OpenAI Evals, DeepEval).
- Proficiency in Python and libraries such as LangChain, LangGraph, LlamaIndex, and Hugging Face.
- Experience with vector databases, RAG pipelines, and knowledge graph integration.
- Familiarity with bias/fairness testing and Responsible AI frameworks.

Good-to-Have
- Experience with reinforcement learning (RLHF, RLAIF) and reward modeling.
- Exposure to agentic evaluation frameworks (multi-agent stress testing, synthetic user simulators).
- Knowledge of compliance and safety requirements for BFSI, GRC, or SOC use cases.
- Contributions to open-source evaluation libraries or research papers.

WHY SHOULD YOU JOIN US?

- Agentic AI Product Company: Ensure reliability in cutting-edge AI platforms that are redefining enterprise adoption.
- A Fast-Growing Category Leader: Be part of one of the fastest-growing AI Foundries, powering Fortune 500 enterprises with trustworthy AI.
- Career Mobility & Growth: Grow into roles such as AI Systems Architect, Responsible AI Engineer, or Reliability Engineering Lead.
- Global Exposure: Work on enterprise-scale evaluation challenges across BFSI, Healthcare, Telecom, and GRC.
- Create Real Impact: Your evaluations will directly shape production-grade AI agents used in mission-critical systems.
- Culture of Excellence: Our values (Agency, Taste, Ownership, Mastery, Impatience, and Customer Obsession) empower you to innovate fearlessly.
- Responsible AI First: Join a company that prioritizes trustworthy, explainable, and compliant AI.

XENONSTACK CULTURE - JOIN US & MAKE AN IMPACT

At XenonStack, we believe in shaping the future of intelligent systems. We foster a culture of cultivation built on bold, human-centric leadership principles, where deep work, simplicity, and adoption define everything we do.

Our Cultural Values
- Agency - Be self-directed and proactive.
- Taste - Sweat the details and build with precision.
- Ownership - Take responsibility for outcomes.
- Mastery - Commit to continuous learning and growth.
- Impatience - Move fast and embrace progress.
- Customer Obsession - Always put the customer first.

Our Product Philosophy
- Obsessed with Adoption - Making AI accessible, reliable, and enterprise-ready.
- Obsessed with Simplicity - Turning complex evaluation challenges into seamless, automated frameworks.

Be part of our mission to accelerate the world's transition to AI + Human Intelligence by making AI agents not just powerful, but trustworthy and reliable.

Job Information
Location: Mohali, Punjab, India
Company: Xenonstack
Job Type: Full time
Salary: US$ 1,04,000 - US$ 1,30,878 per year
Date Opened: 09/04/2025
Industry: Technology
Work Experience: 2-4 years
City: Mohali
State/Province: Punjab
Country: India
Zip/Postal Code: 160075