
Senior LLM Evaluation
2 weeks ago
Job Description
Company Description
Genrise is a leading ecommerce content agent that specializes in identifying content gaps, creating high-performing product copy, and tailoring it for every marketplace. We deliver on-brand content for platforms like Amazon, Walmart, and Target, 10x faster. Our innovative approach ensures top-ranking content, making us a preferred choice for ecommerce businesses.
Role Description
We're looking for a hands-on technical expert who has actually written evals for large language models and has direct experience with reinforcement fine-tuning (e.g., RLHF, RLAIF, or RFT variants). You'll build and own our LLM evaluation stack, applying best practices in experimental design, measurement, and trustworthy deployment.
If you love turning fuzzy product goals into measurable evaluations, care deeply about scientific rigor, and enjoy building cool tech, this is for you.
What you'll do
- Design, implement, and maintain robust evaluation suites for LLMs (task- and domain-specific; regression and exploratory).
- Lead or contribute to reinforcement fine-tuning projects (reward modeling, preference data pipelines, safety/quality constraints, offline/online tuning loops).
- Define success metrics, sampling strategies, and statistical tests; ensure reproducibility and leakage prevention.
- Build data generation and curation pipelines for evals (human + synthetic), including rubric design and inter-annotator agreement.
- Partner with research, product, and infra to ship models with quantifiable improvements and clear trade-off documentation.
- Teach & mentor: run workshops, code walkthroughs, and evaluation office hours; raise the scientific bar across the org.
- Write clear experiment reports and decision memos; contribute to internal best-practice guides.
What we're looking for (must-haves)
- Recent, hands-on experience delivering 12+ real projects where you authored LLM evals end-to-end (design, implementation, and analysis).
- Demonstrated experience with reinforcement fine-tuning for LLMs (RLHF/RLAIF/RFT): reward modeling, preference data, or policy optimization.
- Strong scientific foundation: experimental design, statistics, hypothesis testing, error analysis.
- Machine learning depth: transformers, tokenization, fine-tuning, sampling/decoding, data quality, overfitting/leakage controls.
- Proficiency with Python, PyTorch/JAX, and common LLM tooling (HF, vLLM, Triton, Ray/SLURM, Weights & Biases, etc.).
- Excellent written and verbal communication; proven ability to teach and mentor engineers/researchers.
Nice to have
- Safety evals, hallucination/robustness/red-teaming experience.
- Evaluation of tool use/agents, code generation, retrieval-augmented tasks.
- Knowledge of ranking/recommender systems or bandits.
- Infra for eval orchestration (sharding, caching, dataset versioning).
- Contributions to open-source eval frameworks or benchmark leaderboards.