Multilingual RLHF dataset for an enterprise LLM
A Series-B foundation-model company
An 80k-prompt preference dataset across 8 languages — including red-team adversarial prompts and instruction-following evals — that lifted the model's downstream evals by 11.6%.
Challenge
What we walked into.
The model team needed preference data that captured real instruction-following nuance across eight languages, including coverage of adversarial and safety-critical prompts. Existing public datasets were thin outside English and inconsistent in quality.
What we did
The work, step by step.
Designed the dataset schema with the research team — preference pairs, instruction-following labels, and adversarial prompts
Recruited native annotators with subject-matter expertise in every target language and ran calibration on a held-out gold set
Generated 80k prompts across 8 languages with a balanced mix of intent categories and difficulty bands
Shipped per-language inter-annotator agreement (IAA) reports and adjudicated edge cases with the research team in weekly review sessions
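To make the agreement reporting step concrete, here is a minimal sketch of how per-language inter-annotator agreement could be computed over preference labels. It is an illustration under stated assumptions, not the pipeline used on this project: the case study does not say which agreement metric was reported, so Cohen's kappa for two annotators is assumed here, and the `PreferenceLabel` record, its field names, and the `kappa_by_language` helper are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferenceLabel:
    """One annotator's judgment on one preference pair (hypothetical schema)."""
    pair_id: str
    language: str   # language code such as "de" or "ja"
    annotator: str
    preferred: str  # "a" or "b": which response the annotator preferred


def cohen_kappa(labels_1, labels_2):
    """Cohen's kappa for two annotators' labels on the same items."""
    if len(labels_1) != len(labels_2) or not labels_1:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_1)
    # Observed agreement: share of items where both labels match.
    observed = sum(x == y for x, y in zip(labels_1, labels_2)) / n
    # Chance agreement, from each annotator's own label frequencies.
    counts_1, counts_2 = Counter(labels_1), Counter(labels_2)
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def kappa_by_language(records, annotator_1, annotator_2):
    """Per-language kappa over the pairs that both annotators labeled."""
    by_lang = defaultdict(lambda: defaultdict(dict))
    for r in records:
        by_lang[r.language][r.pair_id][r.annotator] = r.preferred
    report = {}
    for lang, pairs in by_lang.items():
        shared = [v for v in pairs.values()
                  if annotator_1 in v and annotator_2 in v]
        if shared:
            report[lang] = cohen_kappa(
                [v[annotator_1] for v in shared],
                [v[annotator_2] for v in shared],
            )
    return report
```

Calling `kappa_by_language(records, "ann1", "ann2")` returns a dictionary keyed by language code. With more than two annotators per item, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the more natural choice.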
Results
What it shipped.
Outcomes measured against the brief we agreed up front, not vanity metrics.
- Prompts: 80k
- Languages: 8
- Eval lift: +11.6%