LLM Reliability

3 weeks ago


Sahibzada Ajit Singh Nagar, India | XenonStack | Full time
ABOUT XENONSTACK

XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling enterprises to gain real-time and intelligent business insights.

We deliver innovation through:

  • Agentic Systems for AI Agents → akira.ai

  • Vision AI Platform → xenonstack.ai

  • Inference AI Infrastructure for Agentic Systems → nexastack.ai

Our mission is to accelerate the world’s transition to AI + Human Intelligence by making AI agents reliable, explainable, and enterprise-ready.

THE OPPORTUNITY

We are seeking an LLM Reliability & Evaluation Engineer to ensure that large language models (LLMs) and agentic AI systems meet enterprise-grade standards of accuracy, safety, and trustworthiness.

This role focuses on evaluating, benchmarking, and stress-testing LLMs in real-world workflows, and on building frameworks for reliability, robustness, and continuous improvement. If you thrive at the intersection of AI research, applied testing, and responsible deployment, this is the role for you.

KEY RESPONSIBILITIES
  • Evaluation Frameworks

    • Design and implement LLM evaluation pipelines covering accuracy, robustness, safety, and bias.

    • Develop automated systems for benchmarking models on enterprise-relevant tasks.

  • Reliability Engineering

    • Conduct stress tests, adversarial testing, and edge-case evaluations.

    • Build tools to measure latency, consistency, and error recovery in multi-turn interactions.

  • Metrics & Monitoring

    • Define KPIs such as factual accuracy, hallucination rate, toxicity, and compliance alignment.

    • Establish real-time monitoring for drift, anomalies, and performance regressions.

  • Collaboration & Alignment

    • Partner with ML engineers, product managers, and domain experts to align evaluation with business objectives.

    • Work with Responsible AI teams to implement ethical, explainable, and compliant evaluation practices.

  • Continuous Improvement

    • Feed insights from evaluation into fine-tuning, RLHF/RLAIF pipelines, and model selection.

    • Maintain a central repository of test cases, benchmarks, and evaluation results.

  • Research & Innovation

    • Stay current with state-of-the-art LLM evaluation techniques, from academic benchmarks to applied enterprise metrics.

    • Explore automated evaluation using agentic test harnesses and synthetic data generation.

SKILLS & QUALIFICATIONS

Must-Have

  • 3–6 years of experience in AI/ML, NLP, or applied model evaluation.

  • Strong understanding of LLM architectures, prompt engineering, and failure modes.

  • Hands-on experience with evaluation frameworks (eval harnesses, Ragas, OpenAI Evals, DeepEval).

  • Proficiency in Python and libraries like LangChain, LangGraph, LlamaIndex, and Hugging Face.

  • Experience with vector databases, RAG pipelines, and knowledge graph integration.

  • Familiarity with bias/fairness testing and Responsible AI frameworks.

Good-to-Have

  • Experience with reinforcement learning (RLHF, RLAIF) and reward modeling.

  • Exposure to agentic evaluation frameworks (multi-agent stress testing, synthetic user simulators).

  • Knowledge of compliance and safety requirements for BFSI, GRC, or SOC use cases.

  • Contributions to open-source evaluation libraries or research papers.

WHY SHOULD YOU JOIN US?
  1. Agentic AI Product Company

    Ensure reliability in cutting-edge AI platforms that are redefining enterprise adoption.

  2. A Fast-Growing Category Leader

    Be part of one of the fastest-growing AI Foundries, powering Fortune 500 enterprises with trustworthy AI.

  3. Career Mobility & Growth

    Grow into roles such as AI Systems Architect, Responsible AI Engineer, or Reliability Engineering Lead.

  4. Global Exposure

    Work on enterprise-scale evaluation challenges across BFSI, Healthcare, Telecom, and GRC.

  5. Create Real Impact

    Your evaluations will directly shape production-grade AI agents used in mission-critical systems.

  6. Culture of Excellence

    Our values (Agency, Taste, Ownership, Mastery, Impatience, and Customer Obsession) empower you to innovate fearlessly.

  7. Responsible AI First

    Join a company that prioritizes trustworthy, explainable, and compliant AI.

XENONSTACK CULTURE – JOIN US & MAKE AN IMPACT

At XenonStack, we believe in shaping the future of intelligent systems. We foster a culture of cultivation built on bold, human-centric leadership principles, where deep work, simplicity, and adoption define everything we do.

Our Cultural Values

  • Agency – Be self-directed and proactive.

  • Taste – Sweat the details and build with precision.

  • Ownership – Take responsibility for outcomes.

  • Mastery – Commit to continuous learning and growth.

  • Impatience – Move fast and embrace progress.

  • Customer Obsession – Always put the customer first.

Our Product Philosophy

  • Obsessed with Adoption – Making AI accessible, reliable, and enterprise-ready.

  • Obsessed with Simplicity – Turning complex evaluation challenges into seamless, automated frameworks.

Be part of our mission to accelerate the world’s transition to AI + Human Intelligence by making AI agents not just powerful, but trustworthy and reliable.



