Increasing Robustness and Equity in NLP for Various English Dialects
Author(s): Eera Bhatt
Originally published on Towards AI.
Natural language processing (NLP) is a popular subfield of machine learning that enables computers to interpret and use human language to perform useful tasks. To do this, we train models on text and speech data so they can learn patterns in the language and make predictions.
But let's be honest for a second. There are still times when I don't understand what people are saying in conversation, even when we're both speaking English. Since the United States is full of immigrant families, it's very common to run into this dialect barrier.
So when "English" means something a bit different to each of us, how can we expect machines to understand it?
Solution. Recently, a group of researchers from Stanford University and Georgia Tech released Multi-VALUE, a set of resources meant to improve equity in NLP across English dialects. Thanks to their work, NLP-based tools like translators and transcription services can be made more accurate for a diverse range of speakers.
Machines face the dialect barrier just like humans do. For instance, the researchers noted performance drops of up to 19% in a question-answering system for Singapore English. Drops like these hit low-resource languages the hardest.
Low-resource languages aren't nearly as well represented online as languages like English and Spanish. Because the technology has seen so little of them, it's hard for their speakers to be well supported by it.
Multi-VALUE. As a solution to this issue, the researchers developed Multi-VALUE. They drew on a catalog built from decades of linguistic research that documents grammatical features of many English dialects. Specifically, they looked at the grammatical role of each word in a sentence so they could rearrange it to match different English variants.
For example, the Standard American English (SAE) sentence "Sofia was praised by her teacher" becomes "Sofia give her teacher praise" in Colloquial Singapore English (CSE). Notice how the meaning seems to change completely, and how easy it would be to misunderstand either sentence.
Instead of focusing only on the vocabulary in the data, the researchers focused on each sentence's grammatical structure, breaking each one down into smaller pieces. For example, "he doesn't have a cat" contains the negation "n't", which attaches to the auxiliary verb "does."
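To make that concrete, here is a toy sketch of this kind of grammatical analysis. It is not the actual Multi-VALUE code; it simply uses spaCy (my own choice of parser, not something the paper prescribes) to print the grammatical role of each word, which is the kind of structure a dialect rewrite rule would key on.

```python
# Toy sketch, not the actual Multi-VALUE implementation: use a dependency
# parser (spaCy assumed here) to expose the grammatical role of each word,
# the kind of structure that grammar-based rewrite rules operate on.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline

def show_roles(sentence: str) -> None:
    """Print each token with its dependency label and the word it attaches to."""
    for token in nlp(sentence):
        print(f"{token.text:>8}  {token.dep_:<6}  head={token.head.text}")

show_roles("He doesn't have a cat.")
# The negation token ("n't") surfaces with a "neg" dependency label attached
# to the verb phrase, making it easy for a rule to find and rewrite.
```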
By analyzing grammar this meticulously, the researchers built a linguistic framework that performs almost equally well across several different dialects of English.
Because of their work, people who speak low-resource languages can interact with NLP-based devices without a huge communication gap. This makes the technology much fairer to people who don't use Standard American English.
Limitations. At the same time, dialects don't stay the same over time, and the researchers thoughtfully acknowledge this. Many specific features of an English variant can change, even over just 5 to 10 years!
I know what you're thinking. And you're right, this shouldn't be news to us. Today, we know these dialect changes are real because of words like "highkey" and "goat" (greatest of all time, not the animal I used to visit at the farm).
So if the various dialects of English are always changing, how do we go about training NLP models?
To address this limitation, we could train a new model on new data every single time a dialect changes. But doing so is inefficient and wastes an unnecessary amount of time and compute.
Adapters. Instead, Multi-VALUE lets people train one single model and augment it as time goes by. Rather than retraining the entire model for each dialect change, developers can train adapters: small modules that plug into the model and can be swapped in to handle a given dialect.
Adapters are commonly used for parameter-efficient and modular transfer learning. Parameter-efficient fine-tuning (PEFT) means we adjust only a small number of parameters in a pre-trained model while keeping the rest of it frozen. This keeps fine-tuning practical even for large language models (LLMs).
Modular transfer learning means we already have a pre-trained model, so we reuse it as the starting point for a slightly different machine learning task.
Since adapters sit at the intersection of these two, they're known for being easy and efficient to use. For Multi-VALUE in particular, the dialect adapters are called Task-Agnostic Dialect Adapters (TADAs).
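As a rough sketch of what parameter-efficient fine-tuning with adapters can look like in practice, here is how one might add and train a single adapter with the AdapterHub adapters library from [3]. The model checkpoint and adapter name below are illustrative placeholders, not anything used in the TADA work itself.

```python
# Minimal PEFT sketch with the AdapterHub "adapters" library [3].
# The checkpoint and adapter name are illustrative placeholders.
from adapters import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("bert-base-uncased")

# Add a small bottleneck adapter; the pre-trained backbone stays untouched.
model.add_adapter("sae_task_adapter")

# Freeze the backbone and mark only the adapter's weights as trainable,
# so fine-tuning updates a tiny fraction of the model's parameters.
model.train_adapter("sae_task_adapter")
```

Training then proceeds as usual, but only the adapter weights change, which is what makes the approach so cheap to repeat when a dialect shifts.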
Robustness with TADAs. At test time, the TADA modules are stacked with task-specific adapters trained on Standard American English, which improves dialect performance without any further training of the full model. This helps the NLP model generalize much better across various English dialects.
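For the stacking step in particular, a hedged sketch in the spirit of the TADA paper [1] might look like the following, again using the adapters library's composition API with placeholder adapter names.

```python
# Sketch of stacking a task-agnostic dialect adapter with an SAE task adapter,
# in the spirit of TADA [1]. Adapter names are placeholders.
from adapters import AutoAdapterModel
from adapters.composition import Stack

model = AutoAdapterModel.from_pretrained("bert-base-uncased")
model.add_adapter("tada_dialect")  # dialect adapter, trained once, task-agnostic
model.add_adapter("sae_task")      # task adapter, trained only on SAE data

# Activate both at once: inputs flow through the dialect adapter and then the
# SAE task adapter, so the task adapter never has to be retrained per dialect.
model.active_adapters = Stack("tada_dialect", "sae_task")
```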
It's great to see the authors establish more equity in NLP for those who speak low-resource languages. Let's see where this research goes in the future!
Further Reading:
[1] Held, W., Ziems, C. and Yang, D. (2023) "TADA: Task-Agnostic Dialect Adapters for English". arXiv.
[2] Kannan, P. (no date) "Addressing Equity in Natural Language Processing of English Dialects". Stanford HAI. Available at: https://hai.stanford.edu/news/addressing-equity-natural-language-processing-english-dialects
[3] Poth, C. et al. (2023) "Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning". Available at: https://aclanthology.org/2023.emnlp-demo.13.pdf
Published via Towards AI