NLP intent dataset across 8 Indian languages
An ed-tech company serving rural India
A balanced intent-and-entity dataset across 8 Indian languages, including the code-mixed utterances common among real users, that delivered a 9.2% lift in model F1 in production.
Challenge
What we walked into.
The product team's NLP model needed to handle eight Indian languages, including the code-mixed Hinglish and Tanglish utterances common among their rural users. Off-the-shelf intent data didn't reflect how those users actually spoke, and low accuracy in those locales was dragging down the assistant's overall performance.
What we did
The work, step by step.
Defined an intent-and-entity schema with the product team and recruited native annotators in each language
Sourced and balanced 240k utterances across formal, informal, and code-mixed registers
Calibrated annotators on a held-out gold set per language and tracked inter-annotator agreement throughout
Delivered weekly batches with per-language QA reports tied back to model-error analysis
Results
What it shipped.
Outcomes measured against the brief we agreed up front, not vanity metrics.
- Utterances: 240k
- Languages: 8
- Model F1 lift: +9.2%