Build Your Own RLHF LLM — Forget Human Labelers!

Author(s): Tim Cvetko

Originally published on Towards AI.

You know, that thing OpenAI used to turn GPT-3.5 into ChatGPT? You can do the same without asking strangers to rank statements.

I would never have guessed that the next big revolution in AI would happen on the text front. As an early adopter of the BERT models in 2018, I wasn't exactly convinced that computers could interpret human language with the same granularity and contextual awareness as people do. Since then, three major breakthroughs have shaped the Textual Revolution:

1. Self-attention: the ability to learn the context of each word within a sentence.
2. Large Transformer Models (GPTs): the ability to learn from massive corpora of data and build conversational awareness.
3. Reinforcement Learning from Human Feedback (RLHF): the ability to enhance LLM performance with human preferences. However, this method is not easily replicable because it depends heavily on human labelers.

Forget Human Labelers!

Image by Author

How GPT-3.5 used RLHF to reinforce the LLM to make it ChatGPT
Complete Code Walkthrough: Train Your Own RLHF Model
Complete Code Walkthrough: How to make the LLM Reinforce Itself Without Human Labelers, i.e. Self-Play LLMs

Reinforcement learning from human feedback (RLHF) uses human preference labels as the reward signal for evaluating and improving the LLM's outputs. Here’s how people act as judges:

1. Suppose we have a post from Reddit: “The cat is flying through the air”.
2. Two summaries are selected for evaluation.
3. A human judge decides which is a better summary…

Read the full blog for free on Medium.
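To make the judging step concrete, here is a minimal sketch of how a single "summary A is better than summary B" label could be turned into a training signal. It is not the code from the full article. It assumes a Bradley-Terry style pairwise loss, which is a common choice for reward models, and a deliberately tiny bag-of-words reward so it runs without any ML library. The two summaries, `toy_reward`, `preference_loss`, and `train_reward_model` are all hypothetical names and data made up for illustration.

```python
import math
from collections import Counter

# Hypothetical labeled pair for the Reddit post above: (preferred, rejected).
PAIRS = [("a cat flies through the air", "a dog sits on the ground")]


def toy_reward(summary: str, weights: dict) -> float:
    """Toy reward model (an assumption): sum of learned per-word weights."""
    return sum(weights.get(word, 0.0) for word in summary.lower().split())


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    margin = reward_chosen - reward_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def train_reward_model(pairs, epochs: int = 50, lr: float = 0.5) -> dict:
    """Fit word weights by gradient descent on the pairwise loss."""
    weights = {}
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = toy_reward(chosen, weights) - toy_reward(rejected, weights)
            # -d(loss)/d(margin) = sigmoid(-margin): small once the pair is ranked well.
            step = 1.0 / (1.0 + math.exp(margin))
            # d(margin)/d(weight[word]) = count in chosen - count in rejected.
            diff = Counter(chosen.lower().split())
            diff.subtract(Counter(rejected.lower().split()))
            for word, delta in diff.items():
                weights[word] = weights.get(word, 0.0) + lr * step * delta
    return weights
```

Calling `train_reward_model(PAIRS)` returns weights under which the preferred summary scores higher than the rejected one. In a real RLHF setup the bag-of-words scorer would be replaced by a neural reward model, and its scores would then drive the reinforcement learning step on the LLM.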


Published via Towards AI
