LLM Reliability

5 hours ago


Sahibzada Ajit Singh Nagar, India | XenonStack | Full time
ABOUT XENONSTACK

XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling enterprises to gain real-time and intelligent business insights.

We deliver innovation through:

  • Agentic Systems for AI Agents → akira.ai

  • Vision AI Platform → xenonstack.ai

  • Inference AI Infrastructure for Agentic Systems → nexastack.ai

Our mission is to accelerate the world’s transition to AI + Human Intelligence by making AI agents reliable, explainable, and enterprise-ready.

THE OPPORTUNITY

We are seeking an LLM Reliability & Evaluation Engineer to ensure that large language models (LLMs) and agentic AI systems meet enterprise-grade standards of accuracy, safety, and trustworthiness.

This role focuses on evaluating, benchmarking, and stress-testing LLMs in real-world workflows, and on building frameworks for reliability, robustness, and continuous improvement. If you thrive at the intersection of AI research, applied testing, and responsible deployment, this is the role for you.

KEY RESPONSIBILITIES
  • Evaluation Frameworks

    • Design and implement LLM evaluation pipelines covering accuracy, robustness, safety, and bias.

    • Develop automated systems for benchmarking models on enterprise-relevant tasks.

  • Reliability Engineering

    • Conduct stress tests, adversarial testing, and edge-case evaluations.

    • Build tools to measure latency, consistency, and error recovery in multi-turn interactions.

  • Metrics & Monitoring

    • Define KPIs such as factual accuracy, hallucination rate, toxicity, and compliance alignment.

    • Establish real-time monitoring for drift, anomalies, and performance regressions.

  • Collaboration & Alignment

    • Partner with ML engineers, product managers, and domain experts to align evaluation with business objectives.

    • Work with Responsible AI teams to implement ethical, explainable, and compliant evaluation practices.

  • Continuous Improvement

    • Feed insights from evaluation into fine-tuning, RLHF/RLAIF pipelines, and model selection.

    • Maintain a central repository of test cases, benchmarks, and evaluation results.

  • Research & Innovation

    • Stay current with state-of-the-art LLM evaluation techniques, from academic benchmarks to applied enterprise metrics.

    • Explore automated evaluation using agentic test harnesses and synthetic data generation.

SKILLS & QUALIFICATIONS

Must-Have

  • 3–6 years of experience in AI/ML, NLP, or applied model evaluation.

  • Strong understanding of LLM architectures, prompt engineering, and failure modes.

  • Hands-on experience with evaluation frameworks (eval harnesses, Ragas, OpenAI Evals, DeepEval).

  • Proficiency in Python and libraries such as LangChain, LangGraph, LlamaIndex, and Hugging Face.

  • Experience with vector databases, RAG pipelines, and knowledge graph integration.

  • Familiarity with bias/fairness testing and Responsible AI frameworks.

Good-to-Have

  • Experience with reinforcement learning (RLHF, RLAIF) and reward modeling.

  • Exposure to agentic evaluation frameworks (multi-agent stress testing, synthetic user simulators).

  • Knowledge of compliance and safety requirements for BFSI, GRC, or SOC use cases.

  • Contributions to open-source evaluation libraries or research papers.

WHY SHOULD YOU JOIN US?
  1. Agentic AI Product Company

    Ensure reliability in cutting-edge AI platforms that are redefining enterprise adoption.

  2. A Fast-Growing Category Leader

    Be part of one of the fastest-growing AI Foundries, powering Fortune 500 enterprises with trustworthy AI.

  3. Career Mobility & Growth

    Grow into roles such as AI Systems Architect, Responsible AI Engineer, or Reliability Engineering Lead.

  4. Global Exposure

    Work on enterprise-scale evaluation challenges across BFSI, Healthcare, Telecom, and GRC.

  5. Create Real Impact

    Your evaluations will directly shape production-grade AI agents used in mission-critical systems.

  6. Culture of Excellence

    Our values (Agency, Taste, Ownership, Mastery, Impatience, and Customer Obsession) empower you to innovate fearlessly.

  7. Responsible AI First

    Join a company that prioritizes trustworthy, explainable, and compliant AI.

XENONSTACK CULTURE – JOIN US & MAKE AN IMPACT

At XenonStack, we believe in shaping the future of intelligent systems. We foster a culture of cultivation built on bold, human-centric leadership principles, where deep work, simplicity, and adoption define everything we do.

Our Cultural Values

  • Agency – Be self-directed and proactive.

  • Taste – Sweat the details and build with precision.

  • Ownership – Take responsibility for outcomes.

  • Mastery – Commit to continuous learning and growth.

  • Impatience – Move fast and embrace progress.

  • Customer Obsession – Always put the customer first.

Our Product Philosophy

  • Obsessed with Adoption – Making AI accessible, reliable, and enterprise-ready.

  • Obsessed with Simplicity – Turning complex evaluation challenges into seamless, automated frameworks.

Be part of our mission to accelerate the world’s transition to AI + Human Intelligence by making AI agents not just powerful, but trustworthy and reliable.



