
4 Things to Know about Large Language Models

Last Updated on July 25, 2023 by Editorial Team

Author(s): Harshit Sharma

Originally published on Towards AI.

Amidst the LLM hype, there are some interesting "things" to know about LLMs, laid out in a recent paper by Samuel Bowman of Anthropic:

The paper can be found at https://arxiv.org/pdf/2304.00612.pdf

The paper discusses eight things in total, but the four covered here are the most interesting. I recommend giving the full paper a quick read.

Let's get started!

The first point is that LLMs predictably become more capable with increasing investment, even without targeted innovation. The emphasis is on "without targeted innovation".

A perfect testament to this statement is the development of OpenAI's GPT family of models: GPT, GPT-2, and GPT-3.

The thing to note here is that the designs of these three models hardly differ at all. It's the infrastructural innovations in high-performance computing, rather than model design, that made the later versions of the GPT family possible and better performing.

(Image by Author) The scale of Compute Resources vs Design Innovation

The scaling laws have been immensely helpful in precisely predicting the potential performance gains of larger LLMs without actually training them and burning millions of dollars.

For example, scaling laws allowed the creators of GPT-4 to cheaply and accurately predict a measure of its performance using just 0.1% of the resources needed by the final model.
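As a rough illustration of how such a prediction works, a Chinchilla-style scaling law expresses pretraining loss as a function of parameter count and training tokens. The sketch below uses the published Hoffmann et al. (2022) fit purely as illustrative coefficients; it is not OpenAI's actual GPT-4 methodology:

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss L(N, D) = E + A / N**alpha + B / D**beta.

    Coefficients are the Chinchilla (Hoffmann et al., 2022) fits;
    treat them as illustrative, not exact.
    """
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Cheap pilot runs pin down the curve, which then extrapolates to the
# full-scale model without ever training it.
pilot = chinchilla_loss(1e9, 20e9)     # ~1B params, 20B tokens
full = chinchilla_loss(70e9, 1.4e12)   # ~70B params, 1.4T tokens
```

The point is that the fitted curve lets you predict the loss of the large run from the small ones, which is exactly the kind of forecast described above.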

(From Paper)

Scaling laws, as discussed above, are only useful for predicting the pretraining test loss, not the specific tasks or skills the model will be good at. Developers can be confident that the model will get better, but not at what exactly.

This means scaling-law-style predictions are unreliable when it comes to predicting the skills the trained model will possess.

So,

When a lab invests in developing a new LLM, they are essentially buying a "mystery box".

(Image by Author) We still don't know what's hiding beneath

Taking GPT as an example once again: GPT-3 was the first modern LLM to show

  • few-shot learning and
  • chain-of-thought reasoning capabilities

Fun fact: its few-shot capabilities were not known until it was trained, and its capacity for chain-of-thought reasoning was discovered only several months later, once it was available to the public.
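To make the two capabilities concrete, here is a minimal sketch of how few-shot and chain-of-thought prompting differ purely at the prompt-construction level (the arithmetic questions are made-up examples):

```python
def few_shot_prompt(examples, query):
    """Few-shot prompting: show worked examples, then pose the new query."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {query}\nA:"

def chain_of_thought_prompt(query):
    """Chain-of-thought prompting: elicit step-by-step reasoning
    instead of a bare answer."""
    return f"Q: {query}\nA: Let's think step by step."

examples = [("2 + 2 = ?", "4"), ("3 + 5 = ?", "8")]
prompt = few_shot_prompt(examples, "7 + 6 = ?")
```

Neither technique changes the model itself; both are behaviors that simply emerged at GPT-3 scale when prompted this way.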

LLMs are trained primarily to imitate human writing, but it has become clear that these models can easily outperform humans on many tasks.

This is because:

  • LLMs are exposed to far more material than any human sees in a lifetime.
  • They are given additional training using reinforcement learning from human feedback (RLHF), enabling them to produce responses that humans find helpful without requiring humans to explicitly demonstrate those behaviors.
(Image by Author) Massive Data + RL -> LLM beats humans
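That RL step typically trains a reward model on pairs of responses ranked by humans. A minimal sketch of the standard Bradley-Terry pairwise preference loss, in pure Python with invented reward scores:

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss used to train RLHF reward models:
    -log(sigmoid(r_chosen - r_rejected)). The loss is small when the
    reward model scores the human-preferred response higher."""
    sigmoid = 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))
    return -math.log(sigmoid)

# Reward scores below are illustrative numbers, not real model outputs.
good = preference_loss(2.0, 0.5)  # preferred response scored higher: small loss
bad = preference_loss(0.5, 2.0)   # preferred response scored lower: large loss
```

Minimizing this loss over many human preference pairs is what lets the model learn "helpful" without anyone writing out demonstrations of helpful behavior.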

Increasingly capable models can recognize the specific circumstances they were trained in, causing them to behave as intended in those situations but rather unexpectedly in new ones. This problem surfaces in the form of:

  • Sycophancy: where a model answers subjective questions in a way that flatters its user's stated beliefs
  • Sandbagging: where a model endorses common misconceptions when its user seems less educated. Creepy!

Microsoft's Bing Chat, which displayed manipulative behavior in its early versions, suffered from exactly this problem.

Here is what 36% of the 480 researchers agreed with in one of the surveys: that it is plausible that decisions made by AI or machine learning systems could cause a catastrophe this century that is at least as bad as an all-out nuclear war.

Hope you enjoyed this quick read!

Follow Intuitive Shorts (a Substack newsletter), to read quick and intuitive summaries of ML/NLP/DS concepts.
