On Stochastic Parrots: Paper Review

Last Updated on July 13, 2024 by Editorial Team

Author(s): Ayo Akinkugbe

Originally published on Towards AI.

Photo by David Clode on Unsplash

Introduction

A stochastic parrot is a metaphor often used to describe artificial intelligence, specifically language models. Parrots are known to mimic human speech: they learn to produce it and even hold conversations with people, but do they understand what they are saying? The same question can be asked of AI, and of language models in particular.

Whether we think this metaphor is accurate or not isn’t the point. The authors of the paper “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” highlight the risks large language models pose to humanity’s safety as they grow bigger and propose mitigation strategies AI researchers and practitioners can incorporate into the development of such models.

As described in the paper, language models are unsupervised systems that predict the likelihood of a token (a token is a character, word, or string) given either a preceding context or surrounding context. However, unlike smaller language models, large language models have more parameters and require larger training datasets. These properties pose a different set of risks in their development and implementation.
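
To make the prediction task concrete, here is a minimal, purely illustrative Python sketch of a bigram model that estimates the likelihood of the next token given the preceding one. The toy corpus and function names are invented for illustration; real large language models perform the same prediction task with subword tokens, far longer contexts, and billions of parameters.

    from collections import Counter, defaultdict

    # Toy corpus; real models train on billions of tokens.
    corpus = "the parrot mimics the human and the human talks to the parrot".split()

    # Count how often each token follows each preceding token.
    bigram_counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        bigram_counts[prev][nxt] += 1

    def next_token_probs(prev):
        """Estimate P(next token | preceding token) from raw counts."""
        counts = bigram_counts[prev]
        total = sum(counts.values())
        return {tok: c / total for tok, c in counts.items()}

    print(next_token_probs("the"))  # {'parrot': 0.5, 'human': 0.5}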

Risks Posed by Large Language Models

The risks posed by language model development can be delineated into four categories: Environmental Costs, Financial Costs, Training Data Risks, and Opportunity Costs of Misdirected Research Efforts.

Environmental Costs

Large language models require significant computational resources for training, resulting in substantial energy consumption and carbon emissions. This environmental cost raises sustainability concerns and adds to the carbon footprint of AI technologies. For example, the average human is responsible for an estimated 5 t of CO2e per year, yet training a Transformer model with neural architecture search was estimated to emit 284 t of CO2. Another case in point: training a single BERT base model (without hyperparameter tuning) on GPUs was estimated to require as much energy as a trans-American flight.
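
For a rough sense of scale, the two figures quoted above can be compared directly. The arithmetic below is our own back-of-the-envelope illustration using those quoted estimates, not a calculation taken from the paper itself.

    # Figures quoted above (Strubell et al. estimates cited in the paper).
    avg_human_tco2e_per_year = 5   # average person's annual footprint, tonnes CO2e
    nas_transformer_tco2e = 284    # Transformer trained with neural architecture search

    ratio = nas_transformer_tco2e / avg_human_tco2e_per_year
    print(f"One such training run ~= {ratio:.0f} people's annual emissions")  # ~57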

The paper was published in 2021 and does not account for the latest state-of-the-art LLMs like GPT-4 and Gemini. The salient point about the environmental costs is that they are largely paid by marginalized communities that benefit neither financially nor socially from the technology being developed. The lowest-income countries in the world produce one-tenth of emissions but are the most heavily impacted by climate change. The environmental costs of large language models play out as a domino effect:

  • LLM training causes high carbon emissions.
  • Carbon emissions cause climate change.
  • The effects of climate change fall mostly on low-income countries, weighing most heavily on communities that do not benefit directly from these technologies.

Some examples highlighted in the paper include monsoon flooding in India, caused by climate-change-driven shifts in rainfall patterns, which affected more than 8 million people, and the fires in Australia, which killed or displaced nearly three billion animals and at least 400 people.

Financial Costs

One of the core ingredients of large language model development is compute, and AI compute is expensive. These financial costs erect barriers to entry, limiting who can contribute to AI research. The paper also highlights how this kind of barrier can further empower already existing systems of power and the majority. For language technology in particular, the barrier limits who can contribute, and therefore which languages benefit most from these technologies.

Training Data Risks

Large datasets are not synonymous with diverse datasets. The datasets used to train large language models are not necessarily representative of how different people view the world; data diversity and data size are not necessarily correlated. According to the paper, the internet, where most training data comes from, is not equally accessible to everyone. At the time the paper was written, 67% of Reddit users in the United States (Reddit was a source of training data for GPT-2) were men and 64% were between the ages of 18 and 29, while only 8.8–15% of Wikipedians were women or girls. This disparity in what LLMs learn can encode bias: the models absorb the dominant worldview in their training data and amplify biases that already exist in the real world.

Opportunity Costs of Misdirected Research Efforts

The authors pose an important question: if the goal of language technology is language understanding, is research effort actually directed toward that goal? The resources diverted to measuring “how well” models perform on existing benchmarks might be better spent on more effective implementation and deployment, including proper planning of the end-to-end lifecycle of model development.

Risk Mitigations

The highlight of the paper isn’t only in calling out risks but also in proposing actionable strategies that researchers and practitioners in the field can consider. Some of these strategies are paraphrased and delineated as nuggets below:

  • Move Slow, Don’t Break Things: A mindset of careful deliberation before building AI systems trained on large datasets goes a long way toward determining how responsibly LLMs are developed and deployed.
  • Plan, Plan, Plan: Plan carefully in all dimensions before building AI systems trained on datasets. This makes room for Value Sensitive Design, which considers the people who might be affected by the development and deployment of such models.
  • Adopt Human-Centered Design: Adopt research and development techniques that center the people who stand to be adversely affected by the resulting technology. Incorporate Value Sensitive Design, an approach to designing technology that accounts for human values in a principled and comprehensive manner throughout the design process.
  • Leverage Scenario Planning: Make time in the research process to consider environmental impacts, curate and document data carefully, engage with stakeholders early in the design process, explore multiple possible paths toward long-term goals, stay alert to dual-use scenarios, and allocate research effort to harm mitigation in such cases.
  • Document Training Data: Documentation of data used in model training reflects intention and research goals, allowing for careful consideration of what goes into language models as training data.
  • Realign Goals for Research: Instead of focusing on higher scores on leaderboards, researchers and practitioners can focus on understanding how AI systems are achieving tasks and how they fit into socio-technical systems.
  • Run Experiments in Carbon-Friendly Regions: For example, Google collates a list that tracks which compute regions have low carbon emissions.
  • Consistently Report Energy and Carbon Metrics (one way to instrument a training run is sketched after this list).
  • Consider Energy-Performance Trade-Offs Before Deploying Energy-Hungry Models.
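
As one concrete way to act on the last two points, the sketch below instruments a training run with the open-source codecarbon package and prints an emissions estimate afterward. The paper does not prescribe any particular tool, and train_model() here is only a placeholder for an actual training loop.

    # Hedged sketch: tracking and reporting training emissions with codecarbon.
    # codecarbon is one possible tool; the paper does not mandate a specific library.
    from codecarbon import EmissionsTracker

    def train_model():
        # Placeholder for the real training loop.
        pass

    tracker = EmissionsTracker(project_name="stochastic-parrots-demo")
    tracker.start()
    try:
        train_model()
    finally:
        emissions_kg = tracker.stop()  # estimated kilograms of CO2-equivalent

    print(f"Estimated emissions for this run: {emissions_kg:.4f} kg CO2eq")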

Conclusion

Though the paper was written in 2021, AI safety is still a pertinent conversation today. As an observer, researcher, or practitioner in the AI space, what are your thoughts on the current state of AI safety and risks? Do you believe any of these mitigation strategies hold weight in helping?

If interested, you can read the paper “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” here.


Published via Towards AI
