
What’s in the Controversial Article That Forced Timnit Gebru Out of Google?

Author(s): Zoheb Abai

Opinion

On the Dangers of Stochastic Parrots — Summarized

First Authors: Emily M. Bender (Left), Timnit Gebru (Right)

In short: the authors raise global awareness of recent NLP trends and urge researchers, developers, and practitioners working with language technology to take a holistic and responsible approach.

Where Does the Issue Lie?

The most notable NLP trend is the ever-increasing size (measured by the number of parameters and the size of the training data) of language models (LMs) such as BERT and its variants, T-NLG, and GPT-2/3. LMs are trained on string prediction tasks: predicting the likelihood of a token (character, word, or string) given either its preceding context or, in bidirectional and masked LMs, its surrounding context. Such systems are trained without supervision, later fine-tuned for specific tasks, and, when deployed, take text as input and commonly output scores or string predictions. Increasing the number of parameters or enlarging the architecture did not yield noticeable gains for LSTMs; Transformers, however, have benefited from it continuously. This trend toward ever larger LMs can be expected to continue as long as size correlates with increased performance. Even models such as DistilBERT and ALBERT, which are reduced forms of BERT obtained through techniques such as knowledge distillation and quantization, still rely on large quantities of data and significant computing resources.
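
To make the string prediction task concrete, here is a minimal sketch (not from the paper) of a masked LM scoring candidate tokens for a blanked-out position. It assumes the Hugging Face transformers library and the public bert-base-uncased checkpoint; the sentence is a toy example.

```python
# A masked LM scoring candidate tokens for a blanked-out position, illustrating the
# string prediction task described above. Assumes the Hugging Face `transformers`
# library and the public bert-base-uncased checkpoint; the sentence is a toy example.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and rank the vocabulary by predicted probability there.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
probs = logits[0, mask_pos].softmax(dim=-1)
top = probs.topk(5)
print(tokenizer.convert_ids_to_tokens(top.indices[0].tolist()))  # five most likely fillers
```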

Table 1: Overview of recent large language models

What Are the Issues?

Environmental Costs

Source: Energy and Policy Considerations for Deep Learning in NLP

Financial Costs

Risks Associated with Large Training Data

Risks Due to Misdirected Research Effort

(specifically around the application of LMs for tasks intended to test for Natural Language Understanding)

Risks and Harms of Deploying LMs at Scale

Human language usage occurs between individuals who share common ground and are mutually aware of that sharing (and its extent), who have communicative intents that they use language to convey, and who model each other's mental states as they communicate. As such, human communication relies on the interpretation of implicit meaning conveyed between individuals. The fact that human-human communication is a jointly constructed activity is most clearly true in co-situated spoken or signed communication. Still, we use the same facilities for producing language that is intended for audiences not co-present with us (readers, listeners, watchers at a distance in time or space) and in interpreting such language when we encounter it. It must follow that even when we don't know the person who generated the language we are interpreting, we build a partial model of who they are and what common ground we think they share with us, and we use this in interpreting their words.

Recommended Paths Ahead!

We should consider our research time and effort a valuable resource, to be spent as far as possible on research projects built toward a technological ecosystem whose benefits are at least evenly distributed. Each of the approaches mentioned below takes time and is most valuable when applied early in the development process, as part of a conceptual investigation of values and harms rather than a post hoc discovery of risks.

  1. Considering Environmental and Financial Impacts: We should weigh the financial and environmental costs of model development upfront, before deciding on a course of investigation. The resources needed to train and tune state-of-the-art models stand to increase economic inequities unless researchers incorporate energy and compute efficiency into their model evaluations (a minimal emissions-tracking sketch follows this list).
  2. Doing careful data curation and documentation: Significant time should be spent on assembling datasets suited to the tasks at hand rather than ingesting massive amounts of data from convenient or easily scraped Internet sources. Simply turning to massive dataset size as a strategy for being inclusive of diverse viewpoints is doomed to failure. As part of careful data collection practices, researchers should adopt frameworks such as Data Statements for Natural Language Processing, Datasheets for Datasets, and Model Cards for Model Reporting to describe the uses for which their models are suited and to benchmark evaluations under a variety of conditions. This involves providing thorough documentation of the data used in model building, including the motivations underlying data selection and collection. This documentation should reflect researchers' goals, values, and motivations in assembling the data and creating a given model (an illustrative model card skeleton also follows this list).
  3. Engaging with stakeholders early in the design process: Design work should begin by identifying potential users and stakeholders, particularly those who stand to be negatively impacted by model errors or misuse. Even when there is no way to guarantee that all use cases can be explored, an exploration of stakeholders for likely use cases can still be informative about potential risks.
  4. Exploring multiple possible paths towards long-term goals: We also advocate for a re-alignment of research goals: Where much effort has been allocated to making models (and their training data) bigger and to achieving ever higher scores on leaderboards often featuring artificial tasks, we believe there is more to be gained by focusing on understanding how machines are achieving the tasks in question and how they will form part of socio-technical systems. To that end, LM development may benefit from guided evaluation exercises such as pre-mortems.
  5. Keeping alert to dual-use scenarios: For researchers working with LMs, value-sensitive design can help throughout the development process to identify whose values are expressed and supported through the technology and, subsequently, how a lack of support might result in harm.
  6. Allocating research effort to mitigate harm: Finally, we should consider use cases of large LMs that specifically serve marginalized populations, and ask questions such as: Could LMs be built in such a way that synthetic text generated with them would be watermarked and thus detectable? Are there policy approaches that could effectively regulate their use?
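
As a concrete illustration of point 1, here is a minimal sketch (not from the paper) of how energy and carbon accounting can be folded into a training run. It assumes the open-source codecarbon package; the model construction and training-loop helpers are hypothetical placeholders.

```python
# Sketch: wrap a (placeholder) training loop with an emissions tracker so that energy
# use and estimated CO2 are reported alongside task metrics. Assumes the open-source
# codecarbon package; `build_model` and `train_one_epoch` are hypothetical helpers.
from codecarbon import EmissionsTracker

model = build_model()                     # hypothetical model construction
tracker = EmissionsTracker(project_name="lm-finetuning")
tracker.start()
try:
    for epoch in range(3):
        train_one_epoch(model)            # hypothetical training step
finally:
    emissions_kg = tracker.stop()         # estimated kg of CO2-equivalent for the run

print(f"Estimated emissions: {emissions_kg:.3f} kg CO2eq")
```

And for point 2, here is a sketch of the kind of structured documentation the cited Model Cards framework calls for, expressed as a plain Python dictionary. The field names mirror the sections proposed in Model Cards for Model Reporting; all values are illustrative, not taken from any real model.

```python
# Illustrative model card skeleton; fields mirror the sections proposed in
# "Model Cards for Model Reporting" (Mitchell et al., 2019). Values are placeholders.
model_card = {
    "model_details": {"name": "example-lm", "version": "0.1", "type": "masked LM"},
    "intended_use": "Fill-in-the-blank suggestions for English product reviews.",
    "out_of_scope_use": "Medical, legal, or other high-stakes text generation.",
    "training_data": "Curated, documented subset of licensed review text (not a web crawl).",
    "evaluation_data": "Held-out reviews stratified by dialect and topic.",
    "metrics": ["perplexity", "per-group error rates"],
    "ethical_considerations": "Known gaps in dialect coverage; see data statement.",
    "caveats_and_recommendations": "Re-evaluate before deployment in new domains.",
}
```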

We hope these considerations encourage NLP researchers to direct resources and effort into techniques for approaching NLP tasks effectively without being endlessly data-hungry. Beyond that, we call on the field to recognize that applications aiming to believably mimic humans bring a risk of extreme harm. Work on synthetic human behavior is a bright line in ethical AI development, where downstream effects need to be understood and modeled in order to block foreseeable harm to society and to different social groups.

Please find the complete article here and refer to its references for details not covered above. If this post saved you time, don't forget to show your appreciation.


What’s in the Controversial Article That Forced Timnit Gebru Out of Google? was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
