AI Annotation · Education · 9-week programme

NLP intent dataset across 8 Indian languages

An ed-tech company serving rural India

A balanced intent-and-entity dataset across 8 Indian languages, including the code-mixed utterances common among real users, that delivered a 9.2% model F1 lift in production.

240k utterances

8 languages

+9.2% model F1 lift

This case study is part of our AI Annotation work — see how the same approach scales for other teams in our portfolio.

Challenge

What we walked into.

The product team's NLP models needed to handle eight Indian languages, including the code-mixed Hinglish and Tanglish utterances common among their rural user base. Off-the-shelf intent data didn't reflect how those users actually spoke, and low accuracy in those locales was dragging down the assistant's overall performance.

What we did

The work, step by step.

Defined an intent-and-entity schema with the product team and recruited native annotators in each language

Sourced and balanced 240k utterances across formal, informal, and code-mixed registers

Calibrated annotators on a held-out gold set per language and tracked inter-annotator agreement throughout

Delivered weekly batches with per-language QA reports tied back to model-error analysis
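As a rough illustration of the calibration and agreement-tracking steps above, the sketch below scores two annotators against a gold set and computes Cohen's kappa between them. The case study does not say which agreement statistic was used; Cohen's kappa is one common choice and is assumed here. The intent labels, annotator lists, and function names are all hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Fraction of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def gold_accuracy(labels, gold):
    """Share of gold-set items an annotator labelled correctly."""
    if len(labels) != len(gold) or not gold:
        raise ValueError("need two equal-length, non-empty label lists")
    return sum(l == g for l, g in zip(labels, gold)) / len(gold)


# Hypothetical gold set and annotations for one language.
gold = ["check_score", "ask_fee", "check_score", "enroll", "ask_fee", "enroll"]
ann_a = ["check_score", "ask_fee", "check_score", "enroll", "enroll", "enroll"]
ann_b = ["check_score", "ask_fee", "ask_fee", "enroll", "ask_fee", "enroll"]

if __name__ == "__main__":
    print(f"annotator A vs gold: {gold_accuracy(ann_a, gold):.2f}")
    print(f"annotator B vs gold: {gold_accuracy(ann_b, gold):.2f}")
    print(f"kappa A vs B: {cohen_kappa(ann_a, ann_b):.2f}")
```

Kappa corrects raw agreement for chance, which matters when a few intents dominate a batch. Computing it per language and per weekly batch would be one way to spot drift early.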

Results

What it shipped.

Outcomes measured against the brief we agreed up front, not vanity metrics.

Utterances: 240k
Languages: 8
Model F1 lift: +9.2%
