DIAYN: Diversity Is All You Need

Last Updated on July 20, 2023 by Editorial Team

Author(s): Sherwin Chen

Originally published on Towards AI.

An Unsupervised Information-Based Method to Learn Diverse Skills

Different skills learned by DIAYN without any extrinsic reward signal. Source: https://sites.google.com/view/diayn

Introduction

We discuss an information-based reinforcement learning method that explores the environment by learning diverse skills without the supervision of extrinsic rewards. In a nutshell, the method, DIAYN (Diversity Is All You Need), defines the diversity of skills through an information-theoretic objective and optimizes it with a maximum entropy reinforcement learning (MaxEnt RL) algorithm (e.g., SAC). Despite its simplicity, the method has been shown to learn diverse skills, such as walking and jumping, on a variety of simulated robotic tasks. Moreover, it is able to solve a number of RL benchmark tasks without receiving any task reward. For more experimental results, please refer to the authors' project website.

Preliminaries

We define a Markov Decision Process (MDP) by the tuple (S, A, P, r, γ), where S is the set of states, A is the set of actions, P: S × A → S is the transition function, r is the reward function, and γ is the discount factor. An RL algorithm aims to maximize the expected discounted sum of rewards, defined as

$$J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]$$
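As a small illustration of the discounted sum for one finite sampled trajectory, here is a minimal sketch. The function name `discounted_return` and the example rewards are assumptions made for illustration; they do not come from the paper.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t] for one finite trajectory."""
    total = 0.0
    # Accumulate backwards so each step costs one multiply and one add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Three unit rewards with gamma = 0.5 give 1 + 0.5 + 0.25.
example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

Averaging this quantity over many trajectories sampled from the policy gives a Monte Carlo estimate of the expectation.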

In DIAYN, we do not consider the reward signal from the environment. Instead, we define task-independent rewards based on information theory as we will see soon.

Diversity Is All You Need

In DIAYN, we define a skill as a latent-conditioned policy that alters the state of the environment in a consistent way. Mathematically, a skill is denoted by the conditional policy p(a|s, z), where z is a latent variable sampled from some distribution p(z). The method is mainly built upon three ideas (for those not familiar with the concept of mutual information, I recommend referring to the second section of this post to gain some intuition first):

For skills to be useful, we want the skill to dictate the states that the agent visits. Different skills should visit different states, and hence be distinguishable. To achieve this, we maximize the mutual information I(S; Z) between states S and skills Z.
We want to use states, not actions, to distinguish skills, because actions that do not affect the environment are not visible to an outside observer. This is done by minimizing the mutual information I(A; Z|S) between actions A and skills Z given the state S.
We encourage exploration and incentivize the skills to be as diverse as possible by learning skills that act as randomly as possible. As done in maximum entropy reinforcement learning, this is achieved by maximizing the policy entropy H(A|S).

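If we put together all three objectives, we will get

$$\begin{aligned}
\mathcal{F}(\theta) &= I(S; Z) + H(A \mid S) - I(A; Z \mid S) \\
&= \big(H(Z) - H(Z \mid S)\big) + H(A \mid S) - \big(H(A \mid S) - H(A \mid S, Z)\big) \\
&= H(Z) - H(Z \mid S) + H(A \mid S, Z)
\end{aligned}$$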

We now develop some intuition for each term. The first term, H(Z), encourages the prior distribution p(z) to have high entropy. For a fixed set of skills, we set p(z) to be a discrete uniform distribution, which guarantees that it has maximum entropy. The second term, −H(Z|S), is maximized by minimizing the conditional entropy H(Z|S), which means it should be easy to infer the skill from the current state. The third term, H(A|S, Z), indicates that each skill should act as randomly as possible.
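To make the role of H(Z) − H(Z|S) concrete, the following sketch computes I(S; Z) for two hand-made joint distributions over two states and two skills. The tables `separated` and `overlapping` and the helper names are illustrative assumptions, not data from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(S; Z) = H(Z) - H(Z|S) for a table joint[s][z] = p(s, z)."""
    p_s = [sum(row) for row in joint]
    p_z = [sum(col) for col in zip(*joint)]
    h_z = entropy(p_z)
    # H(Z|S) = sum_s p(s) * H(Z | S = s)
    h_z_given_s = sum(
        ps * entropy([p / ps for p in row])
        for ps, row in zip(p_s, joint)
        if ps > 0
    )
    return h_z - h_z_given_s


# Each skill visits its own state: the state reveals the skill.
separated = [[0.5, 0.0], [0.0, 0.5]]
# Both skills visit both states equally: the state reveals nothing.
overlapping = [[0.25, 0.25], [0.25, 0.25]]

mi_separated = mutual_information(separated)      # equals log 2 = H(Z)
mi_overlapping = mutual_information(overlapping)  # equals 0
```

When the skills occupy disjoint states, H(Z|S) vanishes and the mutual information reaches its maximum H(Z); when they overlap completely, it drops to zero.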

We can easily maximize the third term with a MaxEnt RL method (e.g., SAC, which the authors use with temperature 0.1 in their experiments). As for the first two terms, the authors propose incorporating them into a pseudo-reward:

$$r_z(s, a) = \log q_\phi(z \mid s) - \log p(z)$$

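where a learned discriminator q_ϕ(z|s) is used to approximate p(z|s), which is valid since

$$\begin{aligned}
\mathcal{F}(\theta) &= H(A \mid S, Z) - H(Z \mid S) + H(Z) \\
&= H(A \mid S, Z) + \mathbb{E}_{z \sim p(z),\, s \sim \pi(z)}\big[\log p(z \mid s)\big] - \mathbb{E}_{z \sim p(z)}\big[\log p(z)\big] \\
&\ge H(A \mid S, Z) + \mathbb{E}_{z \sim p(z),\, s \sim \pi(z)}\big[\log q_\phi(z \mid s) - \log p(z)\big]
\end{aligned}$$

The inequality holds because the KL divergence between p(z|s) and q_ϕ(z|s) is non-negative, so maximizing the pseudo-reward maximizes a lower bound on the objective.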

Note that the constant log p(z) in the reward function encourages the agent to stay alive: the reward is non-negative whenever q_ϕ(z|s) ≥ p(z), which should hold once the agent has successfully learned the skill p(a|s, z). Removing log p(z), on the other hand, leaves only log q_ϕ(z|s) ≤ 0, so the rewards are never positive, which tempts the agent to end the episode as quickly as possible.
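The sign behaviour described above can be checked numerically. The sketch below assumes a uniform prior over `num_skills` skills, as in the fixed-skill setting discussed earlier; the function name `diayn_reward` and the probabilities are made up for illustration.

```python
import math


def diayn_reward(q_z_given_s, num_skills):
    """Pseudo-reward log q(z|s) - log p(z) with uniform p(z) = 1 / num_skills."""
    log_p_z = -math.log(num_skills)
    return math.log(q_z_given_s) - log_p_z


# Discriminator is confident about the skill: reward is positive.
confident = diayn_reward(0.9, num_skills=10)
# Discriminator is no better than the prior: reward is (numerically) zero.
at_prior = diayn_reward(0.1, num_skills=10)
# Without the log p(z) term the same confident prediction is penalized.
without_prior_term = math.log(0.9)
```

With ten skills, any prediction better than chance (q above 0.1) earns a positive reward, whereas log q alone is never positive.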

Algorithm

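Now that we have defined the unsupervised MDP and specified the reinforcement learning method, the whole algorithm follows directly. At the start of each episode, a skill z is sampled from p(z) and held fixed. At each step, the policy samples an action conditioned on the current state and z, the environment transitions to a new state, and the discriminator scores that new state to produce the pseudo-reward log q_ϕ(z|s) − log p(z). The policy is then updated with SAC to maximize this reward, and the discriminator is updated with SGD to better predict z from the visited state.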
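The loop can be sketched on a toy problem. Everything in this example is an assumption made for illustration: the environment is a five-state line, the discriminator is a Laplace-smoothed count table rather than a neural network, and the policy update is a one-step policy-gradient step standing in for SAC. It shows the order of operations only and is not the authors' implementation.

```python
import math
import random


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_diayn_toy(num_skills=2, num_states=5, episodes=200, horizon=10,
                    lr=0.1, seed=0):
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 moves left, action 1 moves right
    # theta[z][s][a]: tabular policy preferences (stand-in for the SAC actor).
    theta = [[[0.0, 0.0] for _ in range(num_states)] for _ in range(num_skills)]
    # counts[s][z]: visitation counts, a count-based stand-in for q_phi(z|s).
    counts = [[0] * num_skills for _ in range(num_states)]
    log_p_z = -math.log(num_skills)  # uniform prior over skills

    def q(z, s):
        # Laplace-smoothed estimate of q(z|s).
        return (counts[s][z] + 1) / (sum(counts[s]) + num_skills)

    for _ in range(episodes):
        z = rng.randrange(num_skills)  # sample a skill z ~ p(z) per episode
        s = num_states // 2            # fixed initial state
        for _ in range(horizon):
            pi = softmax(theta[z][s])
            a = 0 if rng.random() < pi[0] else 1           # a ~ pi(a|s, z)
            s_next = min(max(s + moves[a], 0), num_states - 1)
            r = math.log(q(z, s_next)) - log_p_z           # pseudo-reward
            # Policy update: one-step policy gradient (NOT SAC).
            for b in range(2):
                grad = (1.0 if b == a else 0.0) - pi[b]
                theta[z][s][b] += lr * r * grad
            counts[s_next][z] += 1                         # discriminator update
            s = s_next
    return theta, q


theta, q = train_diayn_toy()
```

The returned `q` can be queried as `q(z, s)` to inspect which states the count-based discriminator associates with each skill after training.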

Incorporating DIAYN into Hierarchical Reinforcement Learning

Networks learned by DIAYN can be used to initialize a task-specific agent, which gives it a good starting point for exploration. Another interesting application of DIAYN is to use the learned skills as the low-level policies of a Hierarchical Reinforcement Learning (HRL) algorithm. To do so, we further learn a meta-controller that chooses which skill to execute for the next k steps. The meta-controller has the same observation space as the skills and aims to maximize the task reward.
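The control flow of such a hierarchy can be sketched as a rollout in which the meta-controller re-selects a skill every k steps. The callables `env_step`, `skill_policy`, and `meta_policy` below are hypothetical stand-ins on a one-dimensional toy task; the sketch covers only the rollout, not how the meta-controller is trained.

```python
def run_hierarchical_episode(env_step, skill_policy, meta_policy, s0, k, horizon):
    """Roll out a meta-controller that picks a new skill every k steps."""
    s, total_reward, chosen = s0, 0.0, []
    z = None
    for t in range(horizon):
        if t % k == 0:
            z = meta_policy(s)   # meta-controller sees the same observation
            chosen.append(z)
        a = skill_policy(s, z)   # frozen low-level skill picks the action
        s, r = env_step(s, a)
        total_reward += r        # task reward is what the meta-controller maximizes
    return total_reward, chosen


# Toy stand-ins: skill 1 moves right, skill 0 moves left, reward for reaching state 3.
def toy_env_step(s, a):
    s_next = s + a
    return s_next, (1.0 if s_next == 3 else 0.0)


def toy_skill_policy(s, z):
    return 1 if z == 1 else -1


def toy_meta_policy(s):
    return 1 if s < 3 else 0


total, skills_used = run_hierarchical_episode(
    toy_env_step, toy_skill_policy, toy_meta_policy, s0=0, k=2, horizon=6
)
```

Because the skill is only re-selected every k steps, the meta-controller makes far fewer decisions per episode than a flat policy acting at every step.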

The authors experiment with the HRL algorithm on two challenging simulated robotics environments. On the cheetah hurdle task, the agent is rewarded for bounding up and over hurdles, while in the ant navigation task, the agent must walk to a set of 5 waypoints in a specific order, receiving only a sparse reward upon reaching each waypoint. The results reported in the paper show DIAYN outperforming some state-of-the-art RL methods.

It is worth noting that plain DIAYN struggles on the ant navigation task, as do the other methods. This can be remedied by incorporating some prior knowledge into the discriminator. Specifically, instead of the raw state s, the discriminator takes as input f(s), the agent's center of mass, while the HRL method is left unchanged. The 'DIAYN+prior' results show that this simple modification to the discriminator significantly improves performance.

References

Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine. Diversity is All You Need: Learning Skills without a Reward Function. Presented at ICLR 2019.
