GlobalAnnotate
AI Annotation | AI / LLM | 10-week programme

Multilingual RLHF dataset for an enterprise LLM

A Series-B foundation-model company

An 80k-prompt preference dataset across 8 languages — including red-team adversarial prompts and instruction-following evals — that lifted the model's downstream evals by 11.6%.

Challenge

What we walked into.

The model team needed preference data that captured real instruction-following nuance across eight languages, including coverage of adversarial and safety-critical prompts. Existing public datasets were thin outside English and inconsistent in quality.

What we did

The work, step by step.

  1. Designed the dataset schema with the research team — preference pairs, instruction-following labels, and adversarial prompts

  2. Recruited native annotators with subject-matter expertise in every target language and ran calibration on a held-out gold set

  3. Generated 80k prompts across 8 languages with a balanced mix of intent categories and difficulty bands

  4. Shipped per-language inter-annotator agreement (IAA) reports and adjudicated edge cases with the research team in weekly review sessions
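
To make the per-language agreement reports in step 4 concrete, here is a minimal sketch, assuming two annotators label each preference pair and agreement is measured with Cohen's kappa. The case study does not say which agreement statistic or record schema was used; `PreferenceLabel`, its field names, and `kappa_by_language` are hypothetical and only for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferenceLabel:
    """One annotator's judgment on one preference pair (hypothetical schema)."""
    pair_id: str
    language: str      # language code, e.g. "de" or "ja"
    annotator_id: str
    preferred: str     # "a" or "b": which response the annotator preferred


def cohens_kappa(labels_1, labels_2):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_1) != len(labels_2) or not labels_1:
        raise ValueError("need two non-empty, equal-length label lists")
    n = len(labels_1)
    # Observed agreement: fraction of items where both annotators match.
    observed = sum(x == y for x, y in zip(labels_1, labels_2)) / n
    # Chance agreement, from each annotator's own label frequencies.
    counts_1, counts_2 = Counter(labels_1), Counter(labels_2)
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def kappa_by_language(labels, annotator_1, annotator_2):
    """Per-language kappa over the pairs that both annotators labeled."""
    by_lang = defaultdict(lambda: defaultdict(dict))
    for lab in labels:
        by_lang[lab.language][lab.pair_id][lab.annotator_id] = lab.preferred
    report = {}
    for lang, pairs in by_lang.items():
        shared = [v for v in pairs.values()
                  if annotator_1 in v and annotator_2 in v]
        if shared:
            report[lang] = cohens_kappa(
                [v[annotator_1] for v in shared],
                [v[annotator_2] for v in shared],
            )
    return report
```

Calling `kappa_by_language(labels, "ann1", "ann2")` returns a dictionary keyed by language code. A low value for one language would be a natural trigger for the kind of adjudication session described above. With more than two annotators per item, a statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual substitute.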

Results

What it shipped.

Outcomes measured against the brief we agreed up front, not vanity metrics.

  • Prompts: 80k
  • Languages: 8
  • Eval lift: +11.6%
