LLM Reliability

4 weeks ago

Nagar, Sahibzada Ajit Singh Nagar, India XenonStack Moments Full time

Job Description

About XenonStack

XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling enterprises to gain real-time and intelligent business insights.

We Deliver Innovation Through
- Agentic Systems for AI Agents: akira.ai
- Vision AI Platform: xenonstack.ai
- Inference AI Infrastructure for Agentic Systems: nexastack.ai

Our mission is to accelerate the world's transition to AI + Human Intelligence by making AI agents reliable, explainable, and enterprise-ready.

THE OPPORTUNITY

We are seeking an LLM Reliability & Evaluation Engineer to ensure that large language models (LLMs) and agentic AI systems meet enterprise-grade standards of accuracy, safety, and trustworthiness. This role focuses on evaluating, benchmarking, and stress-testing LLMs in real-world workflows, and on building frameworks for reliability, robustness, and continuous improvement. If you thrive at the intersection of AI research, applied testing, and responsible deployment, this is the role for you.

Key Responsibilities
- Evaluation Frameworks
  - Design and implement LLM evaluation pipelines covering accuracy, robustness, safety, and bias.
  - Develop automated systems for benchmarking models on enterprise-relevant tasks.
- Reliability Engineering
  - Conduct stress tests, adversarial testing, and edge-case evaluations.
  - Build tools to measure latency, consistency, and error recovery in multi-turn interactions.
- Metrics & Monitoring
  - Define KPIs such as factual accuracy, hallucination rate, toxicity, and compliance alignment.
  - Establish real-time monitoring for drift, anomalies, and performance regressions.
- Collaboration & Alignment
  - Partner with ML engineers, product managers, and domain experts to align evaluation with business objectives.
  - Work with Responsible AI teams to implement ethical, explainable, and compliant evaluation practices.
- Continuous Improvement
  - Feed insights from evaluation into fine-tuning, RLHF/RLAIF pipelines, and model selection.
  - Maintain a central repository of test cases, benchmarks, and evaluation results.
- Research & Innovation
  - Stay current with state-of-the-art LLM evaluation techniques, from academic benchmarks to applied enterprise metrics.
  - Explore automated evaluation using agentic test harnesses and synthetic data generation.

Skills & Qualifications

Must-Have
- 3-6 years in AI/ML, NLP, or applied model evaluation.
- Strong understanding of LLM architectures, prompt engineering, and failure modes.
- Hands-on experience with evaluation frameworks (Eval harnesses, Ragas, OpenAI Evals, DeepEval).
- Proficiency in Python and libraries like LangChain, LangGraph, LlamaIndex, and Hugging Face.
- Experience with vector databases, RAG pipelines, and knowledge graph integration.
- Familiarity with bias/fairness testing and Responsible AI frameworks.

Good-to-Have
- Experience with reinforcement learning (RLHF, RLAIF) and reward modeling.
- Exposure to agentic evaluation frameworks (multi-agent stress testing, synthetic user simulators).
- Knowledge of compliance and safety requirements for BFSI, GRC, or SOC use cases.
- Contributions to open-source evaluation libraries or research papers.

WHY SHOULD YOU JOIN US
- Agentic AI Product Company: Ensure reliability in cutting-edge AI platforms that are redefining enterprise adoption.
- A Fast-Growing Category Leader: Be part of one of the fastest-growing AI Foundries, powering Fortune 500 enterprises with trustworthy AI.
- Career Mobility & Growth: Grow into roles such as AI Systems Architect, Responsible AI Engineer, or Reliability Engineering Lead.
- Global Exposure: Work on enterprise-scale evaluation challenges across BFSI, Healthcare, Telecom, and GRC.
- Create Real Impact: Your evaluations will directly shape production-grade AI agents used in mission-critical systems.
- Culture of Excellence: Our values (Agency, Taste, Ownership, Mastery, Impatience, and Customer Obsession) empower you to innovate fearlessly.
- Responsible AI First: Join a company that prioritizes trustworthy, explainable, and compliant AI.

XENONSTACK CULTURE: JOIN US & MAKE AN IMPACT

At XenonStack, we believe in shaping the future of intelligent systems. We foster a culture of cultivation built on bold, human-centric leadership principles, where deep work, simplicity, and adoption define everything we do.

Our Cultural Values
- Agency: Be self-directed and proactive.
- Taste: Sweat the details and build with precision.
- Ownership: Take responsibility for outcomes.
- Mastery: Commit to continuous learning and growth.
- Impatience: Move fast and embrace progress.
- Customer Obsession: Always put the customer first.

Our Product Philosophy
- Obsessed with Adoption: Making AI accessible, reliable, and enterprise-ready.
- Obsessed with Simplicity: Turning complex evaluation challenges into seamless, automated frameworks.

Be part of our mission to accelerate the world's transition to AI + Human Intelligence by making AI agents not just powerful, but trustworthy and reliable.

Agentic AI Engineer

4 weeks ago

Nagar, Sahibzada Ajit Singh Nagar, India XenonStack Moments Full time

Job Description About XenonStack XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling people and organizations to gain real-time, intelligent business insights. We Deliver Innovation Through - Akira AI Building Agentic Systems for AI Agents - XenonStack Vision AI Vision AI Platform - NexaStack AI Inference AI Infrastructure for...
Robotics Engineer

4 weeks ago

Nagar, Sahibzada Ajit Singh Nagar, India XenonStack Moments Full time

Job Description About XenonStack XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling people and organizations to gain real-time and intelligent business insights. We Deliver Innovation Through - Akira AI Building Agentic Systems for AI Agents - XenonStack Vision AI Vision AI Platform - NexaStack AI Inference AI Infrastructure...
Backend Developer(Node.js)

1 week ago

Nagar, Sahibzada Ajit Singh Nagar, India Delta4 Infotech Full time

Job Description Delta4 Infotech Pvt. Ltd., the team behind YourGPT, is building AI-driven solutions to help businesses automate and scale. We are looking for a Backend Developer (Node.js) to join our team in Mohali, who will be responsible for developing and maintaining high-performance, scalable backend systems that power our AI-driven products. In this role,...
Matrix Marketers

4 weeks ago

Sahibzada Ajit Singh Nagar, India Matrix Marketers Full time

Job Summary: We are looking for an AI/ML Engineer with 4 years of experience in designing, developing, and deploying machine learning and artificial intelligence solutions. The right candidate will have a solid background in algorithms, data pipelines, and model optimization, along with practical experience in production-level ML Responsibilities :- Design,...
Bluebash - Artificial Intelligence/Machine Learning Engineer - Python

2 weeks ago

Sahibzada Ajit Singh Nagar, India Bluebash Full time

Description: We are seeking a skilled AI/ML Engineer to design, develop, and optimize AI-driven solutions. The role requires expertise in machine learning algorithms, model training, and leveraging state-of-the-art tools to build scalable, efficient systems. The ideal candidate will have practical experience with frameworks like LangChain, Baby AGI,...
AI Interaction Engineer

4 weeks ago

Nagar, Sahibzada Ajit Singh Nagar, India XenonStack Moments Full time

Job Description About XenonStack XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling people and organizations to gain real-time and intelligent business insights. We Deliver Innovation Through - Agentic Systems for AI Agents: akira.ai - Vision AI Platform: xenonstack.ai - Inference AI Infrastructure for Agentic Systems...
Director of Engineering

4 weeks ago

Nagar, Sahibzada Ajit Singh Nagar, India XenonStack Moments Full time

Job Description About Xenonstack XenonStack is the fastest-growing data and AI foundry for agentic systems, enabling people and organizations to gain real-time and intelligent business insights. - Agentic Systems for AI Agents: akira.ai - Vision AI Platform: xenonstack.ai - Inference AI Infrastructure for Agentic Systems: nexastack.ai THE OPPORTUNITY We are...
Delivery Manager

4 weeks ago

Nagar, Sahibzada Ajit Singh Nagar, India XenonStack Moments Full time

Job Description About Xenonstack XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling enterprises to gain real-time and intelligent business insights. We Innovate Through - Akira AI Building Agentic Systems for AI Agents - XenonStack Vision AI Vision AI Platform - NexaStack AI Inference AI Infrastructure for Agentic Systems Our...
Principal LLM

3 days ago

India Oracle Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Description: We are looking for a senior engineer who specializes in LLM systems, prompt engineering, and agentic application deployment, combined with strong MLOps and cloud platform engineering experience. You will design, deploy, and scale Generative AI models, retrieval-augmented generation (RAG) pipelines, and autonomous agent frameworks on OCI. In this...
Senior LLM Engineer

4 weeks ago

Bengaluru, Karnataka, India RingCentral Full time

Job Description: We are seeking an experienced AI Engineer with a strong background in Natural Language Understanding (NLU) who is passionate about pushing the boundaries of Conversational AI. In this role, you will design, develop, and deploy scalable AI solutions leveraging LLMs, Retrieval-Augmented Generation (RAG), and prompt engineering techniques to...
