On Stochastic Parrots: Paper Review
Author(s): Ayo Akinkugbe
Originally published on Towards AI.
Introduction
A stochastic parrot is a metaphor often used to describe Artificial Intelligence, specifically language models. Parrots are known to mimic human speech: they learn to reproduce our words and even hold conversations with us, but do they understand what they are saying? The same question can be asked of language models.
Whether we think this metaphor is accurate or not isn't the point. The authors of the paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" highlight the risks large language models pose to humanity's safety as they grow bigger, and propose mitigation strategies that AI researchers and practitioners can incorporate into the development of such models.
As described in the paper, language models are unsupervised systems that predict the likelihood of a token (a character, word, or string) given either its preceding context or its surrounding context. However, unlike smaller language models, large language models have more parameters and require larger training datasets. These properties pose a different set of risks in their development and implementation.
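To make "predict the likelihood of a token given a preceding context" concrete, here is a minimal bigram sketch in Python. It is a toy illustration only, not the paper's formulation and nothing like how a real LLM is trained; the corpus and function names are invented for the example.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for real training data (illustrative only).
corpus = "the parrot repeats the phrase and the parrot repeats the sound".split()

# Count bigrams: how often each token follows a given preceding token.
following = defaultdict(Counter)
for prev_tok, next_tok in zip(corpus, corpus[1:]):
    following[prev_tok][next_tok] += 1

def next_token_probs(context):
    """Probability of each candidate token given the preceding token."""
    counts = following[context]
    total = sum(counts.values())
    return {tok: count / total for tok, count in counts.items()}

print(next_token_probs("the"))  # {'parrot': 0.5, 'phrase': 0.25, 'sound': 0.25}
```

A large language model does the same kind of next-token prediction, but with billions of parameters and vastly larger training corpora rather than a lookup table of counts.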
Risks Posed by Large Language Models
The risks posed by language model development can be delineated into four categories: environmental costs, financial costs, risks from training data, and the opportunity cost of misdirected research efforts.
Environmental Costs
Large language models require significant computational resources for training, resulting in substantial energy consumption and carbon emissions. This environmental cost raises concerns about sustainability and contributes to the carbon footprint of AI technologies. For example, the average human is responsible for an estimated 5t CO2e per year. However, a Transformer model with neural architecture search during its training procedure was estimated to emit 284t of CO2. Another case in point: training a single BERT base model (without hyperparameter tuning) on GPUs was estimated to require as much energy as a trans-American flight.
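As a rough back-of-the-envelope check on the two figures quoted above (5t CO2e per person per year versus 284t CO2 for the Transformer trained with neural architecture search), that single training procedure corresponds to roughly 57 person-years of average emissions:

```python
# Figures quoted above: 5t CO2e per person per year (average human),
# 284t CO2 for a Transformer trained with neural architecture search.
avg_human_tco2e_per_year = 5
transformer_nas_tco2 = 284

person_years = transformer_nas_tco2 / avg_human_tco2e_per_year
print(f"~{person_years:.0f} person-years of average emissions")  # ~57
```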
The paper was published in 2021 and doesn't account for the latest state-of-the-art LLMs like GPT-4 and Gemini. The salient point about the environmental costs is that they are paid by marginalized communities who do not benefit financially or socially from the technology being developed. The lowest-income countries in the world produce one-tenth of emissions but are the most heavily impacted by climate change. The environmental costs of large language models play out as a domino effect:
- LLM model training causes high emissions.
- Carbon emissions cause climate change.
- The effects of climate change fall mostly on low-income countries, weighing most heavily on communities that do not benefit directly from these technologies.
Some examples highlighted in the paper include monsoon flooding in India, driven by climate-change-related shifts in rainfall patterns, that affected more than 8 million people, and fires in Australia that killed or displaced nearly three billion animals and at least 400 people.
Financial Costs
One of the core ingredients of large language model development is compute, and AI compute is expensive. These financial costs erect barriers to entry, limiting who can contribute to AI research. The paper also highlights how this kind of barrier can entrench existing systems of power and favor already dominant groups. For language technology in particular, the barrier limits who can contribute and therefore which languages benefit most from these technologies.
Training Data Risks
Large datasets are not synonymous with diverse datasets: the training datasets used to build large language models are not necessarily representative of how different people view the world, and data size does not guarantee data diversity. According to the paper, the internet, where most training data comes from, is not equally accessible to everyone. At the time of writing, 67% of Reddit users in the United States (Reddit was a source for GPT-2's training data) were men, and 64% were between the ages of 18 and 29; only 8.8–15% of Wikipedians were women or girls. This disparity in whose text LLMs learn from can encode bias, causing models to absorb the dominant worldview in their training data and amplify biases that already exist in the real world.
Opportunity Costs of Misdirected Research Efforts
The authors pose an important question: if the goal of language technology is language understanding, is research actually focused on tracking this effort? The resources diverted to measuring "how well" models perform on existing benchmarks might be better used for more effective implementation and deployment, including proper planning of the end-to-end lifecycle of model development.
Risk Mitigations
The highlight of the paper isn't only that it calls out risks but that it also proposes actionable strategies researchers and practitioners in the field could consider. Some of these strategies are paraphrased as nuggets below:
- Move Slow, Don't Break Things: A mindset of careful planning before building AI systems trained on large datasets goes a long way in shaping how LLMs are developed and deployed.
- Plan, Plan, Plan: Plan carefully across all dimensions before building AI systems trained on datasets. This creates room for Value Sensitive Design in the development of such models, which considers the people who might be affected by their development and deployment.
- Adopt Human-Centered Design: Adopt research and development techniques that center the people who stand to be adversely affected by the resulting technology. Incorporate Value Sensitive Design, an approach to designing technology that accounts for human values in a principled and comprehensive manner throughout the design process.
- Leverage Scenario Planning: Make time in the research process for considering environmental impacts, curating and documenting data carefully, engaging with stakeholders early in the design process, exploring multiple possible paths toward long-term goals, staying alert to dual-use scenarios, and allocating research effort to harm mitigation in such cases.
- Document Training Data: Documenting the data used in model training reflects intention and research goals and allows for careful consideration of what goes into language models as training data (a hypothetical documentation record is sketched after this list).
- Realign Goals for Research: Instead of focusing on higher scores on leaderboards, researchers and practitioners can focus on understanding how AI systems are achieving tasks and how they fit into socio-technical systems.
- Run Experiments in Carbon-Friendly Regions: For example, Google collates a list that tracks which compute regions have low carbon emissions.
- Consistently Report Energy and Carbon Metrics (the basic arithmetic behind such an estimate is also sketched after this list).
- Consider Energy-Performance Trade-Offs Before Deploying Energy-Hungry Models.
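To make the training-data documentation recommendation more tangible, here is a minimal, hypothetical documentation record in the spirit of documentation frameworks such as data statements or datasheets. The class, field names, and values below are illustrative assumptions, not a schema from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical documentation record; field names are illustrative,
# not a standard schema from the paper.
@dataclass
class TrainingDataRecord:
    name: str
    source: str                      # where the text was collected
    collection_period: str           # when it was gathered
    languages: list[str]             # language varieties covered
    known_demographic_skews: list[str] = field(default_factory=list)
    filtering_steps: list[str] = field(default_factory=list)
    intended_use: str = ""

record = TrainingDataRecord(
    name="example-web-corpus",
    source="outbound links from a social news site",
    collection_period="2019-2020",
    languages=["en"],
    known_demographic_skews=["contributors skew male, aged 18-29, US-based"],
    filtering_steps=["deduplication", "blocklist-based content filtering"],
    intended_use="research on next-token prediction only",
)
print(record)
```

Filling in fields like these before training forces the kind of careful consideration of data provenance and skew the paper calls for.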
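And for the energy and carbon reporting bullets, the basic arithmetic is simple: energy is average power draw times training time, and emissions are energy times the grid's carbon intensity, which varies by compute region. The numbers below are illustrative assumptions, not measurements or real regional figures.

```python
def training_footprint(avg_power_kw: float, hours: float,
                       grid_kg_co2e_per_kwh: float) -> dict:
    """Estimate energy use and emissions for a single training run."""
    energy_kwh = avg_power_kw * hours
    emissions_kg = energy_kwh * grid_kg_co2e_per_kwh
    return {"energy_kwh": energy_kwh, "emissions_kg_co2e": emissions_kg}

# Illustrative run: 8 GPUs drawing ~0.3 kW each for 72 hours, compared
# across two regions with different (made-up) grid carbon intensities.
run = dict(avg_power_kw=8 * 0.3, hours=72)
print("higher-carbon region:", training_footprint(**run, grid_kg_co2e_per_kwh=0.6))
print("lower-carbon region: ", training_footprint(**run, grid_kg_co2e_per_kwh=0.1))
```

The same run emits several times less in the lower-carbon region, which is the intuition behind choosing carbon-friendly regions and weighing energy against performance before deployment.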
Conclusion
Though the paper was written in 2021, AI safety is still a pertinent conversation today. As an observer, researcher, or practitioner in the AI space, what are your thoughts on the current state of AI safety and risks? Do you believe any of these mitigation strategies hold weight in helping?
If interested, you can read the paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" here.
Published via Towards AI