NEAT with Hindsight Experience Replay

Last Updated on August 23, 2023 by Editorial Team

Author(s): Kevin Guo

Originally published on Towards AI.

After implementing NEAT in Unity a few weeks ago, I tried to think of ways to improve its performance.

I remembered reading about a novel technique used in reinforcement learning algorithms known as Hindsight Experience Replay (HER). With this technique, the states an agent reached during failed attempts are recorded and learned from as if they had been the intended goals.

At first, I didn’t think much of it. Because of the speciation method NEAT employs, a single population can search a domain in many different directions at once. This means that experiences collected by one species searching one area of the domain would not be applicable to genomes outside that species. Then I had a thought: what if each species maintained its own list of experiences?

I immediately set out to test my ideas.

Testing Environment

The environment is set up like this:

The agents spawn at the bottom left and must reach the star at the top left. They can only jump high enough to land on the bottom two platforms.

The agent looks like this.

For input, the agent receives the following normalized values:

The difference between the position of the goal and the position of the agent

The position of the agent

The y velocity of the agent

Whether or not the agent is on the ground and can jump

The result of 16 raycasts in a circle around the agent, which return the distance to the nearest platform they hit.
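For readers who want to reproduce the setup, here is a minimal Python sketch of how such an input vector could be assembled. The normalization scheme (dividing by an assumed world size, maximum speed, and maximum ray length), the `AgentState` type, and the `raycast` callback are hypothetical stand-ins for whatever the Unity implementation actually uses.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec2 = Tuple[float, float]
# Hypothetical raycast signature: (origin, unit direction, max length) -> hit distance,
# returning the max length when nothing is hit.
Raycast = Callable[[Vec2, Vec2, float], float]

NUM_RAYS = 16  # the article uses 16 rays in a circle around the agent


@dataclass
class AgentState:
    x: float
    y: float
    vy: float
    grounded: bool


def build_inputs(agent: AgentState, goal: Vec2, world_size: Vec2,
                 max_speed: float, ray_length: float, raycast: Raycast) -> List[float]:
    """Assemble a normalized input vector with the fields listed in the article."""
    width, height = world_size
    inputs = [
        (goal[0] - agent.x) / width,    # goal offset, x
        (goal[1] - agent.y) / height,   # goal offset, y
        agent.x / width,                # agent position, x
        agent.y / height,               # agent position, y
        max(-1.0, min(1.0, agent.vy / max_speed)),  # clamped y velocity
        1.0 if agent.grounded else 0.0,  # on the ground and able to jump
    ]
    for i in range(NUM_RAYS):
        angle = 2.0 * math.pi * i / NUM_RAYS
        direction = (math.cos(angle), math.sin(angle))
        # Normalize each hit distance into [0, 1] by the maximum ray length.
        inputs.append(raycast((agent.x, agent.y), direction, ray_length) / ray_length)
    return inputs
```

With 2 + 2 + 1 + 1 + 16 values, the network in this sketch has 22 inputs.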

For output, the agent gives:

One output for whether to move left (<0.5) or right (>0.5)

One output for whether or not to jump (>0.5)
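The two outputs can be decoded with simple thresholds. This sketch assumes the network produces values in [0, 1] (for example, through a sigmoid) and returns -1 for left and +1 for right; the article does not say how a value of exactly 0.5 is handled, so treating it as left is an assumption.

```python
from typing import Sequence, Tuple


def decode_outputs(outputs: Sequence[float]) -> Tuple[int, bool]:
    """Turn the two network outputs into (horizontal direction, jump)."""
    move_out, jump_out = outputs
    # Below 0.5 means left, above 0.5 means right; exactly 0.5 is treated as left (assumption).
    direction = 1 if move_out > 0.5 else -1
    jump = jump_out > 0.5
    return direction, jump
```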

Configuration

25% of each generation was created through asexual reproduction, and offspring created through crossover had a 35% chance of mutation. The best two genomes in each species were carried over unchanged, and there was a 5% chance of an interspecies crossover occurring. The initial generation contained 2000 genomes to ensure that the population started with enough members capable of climbing the first platform, but subsequent generations had only 500. A population is considered to have failed if none of its agents reaches the star within 500 generations.
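The reproduction settings above can be collected into a small configuration object. The following Python sketch plans how one species’ share of the next generation might be produced. The field names, applying the 25% asexual fraction after the elites are copied, and always mutating asexual offspring are illustrative assumptions, not details from the article; the sketch only decides which method produces each child and leaves parent selection out.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NeatConfig:
    # Values from the article; the field names are hypothetical.
    initial_population: int = 2000
    population: int = 500
    asexual_fraction: float = 0.25
    crossover_mutation_prob: float = 0.35
    elites_per_species: int = 2
    interspecies_prob: float = 0.05
    max_generations: int = 500


def population_size(generation: int, cfg: NeatConfig) -> int:
    """Generation 0 is oversized so enough genomes can climb the first platform."""
    return cfg.initial_population if generation == 0 else cfg.population


def has_failed(generations_run: int, solved: bool, cfg: NeatConfig) -> bool:
    """A run fails if no agent has reached the star within the generation budget."""
    return not solved and generations_run >= cfg.max_generations


def plan_species_offspring(n_children: int, cfg: NeatConfig,
                           rng: random.Random) -> List[Tuple[str, bool]]:
    """Return one (method, mutate) pair per child in a species' share of the next generation."""
    n_elite = min(cfg.elites_per_species, n_children)
    plan: List[Tuple[str, bool]] = [("elite", False)] * n_elite  # copied unchanged
    remaining = n_children - n_elite
    # Assumption: asexual offspring are clones that are always mutated.
    n_asexual = round(cfg.asexual_fraction * remaining)
    plan += [("asexual", True)] * n_asexual
    for _ in range(remaining - n_asexual):
        method = "interspecies" if rng.random() < cfg.interspecies_prob else "crossover"
        plan.append((method, rng.random() < cfg.crossover_mutation_prob))
    return plan
```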

Hindsight Experience Replay took a bit of tweaking to work with NEAT, but I ultimately settled on a strategy that looked like this:

At the end of each generation, sample goals (experiences) from each genome. A sample is the last platform the genome was on, which ensures that goals aren’t floating in mid-air, and it is kept only if it meets the following criteria:

The sample is a few jumps above the starting point, so that no samples are taken at the bottom.
The sample is some distance away from every other goal that has been sampled for the species of the given genome.

Each goal is added to the goal list of its genome’s species, so that every species maintains a separate list of goals.
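The sampling step can be sketched in Python as follows. It assumes each genome reports its species ID and the position of the last platform it stood on, and it stands in for "a few jumps above the starting point" and "some distance away" with two hypothetical thresholds, `min_height` and `min_separation`, whose values the article does not give.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Goal:
    position: Point
    attempts: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        # Newly sampled goals have no attempts yet and count as a 0% success rate.
        return self.successes / self.attempts if self.attempts else 0.0


def sample_goals(endpoints: List[Tuple[int, Optional[Point]]],
                 goals_by_species: Dict[int, List[Goal]],
                 min_height: float, min_separation: float) -> None:
    """Add qualifying end-of-generation platforms to each species' own goal list.

    endpoints holds (species_id, last_platform_position) for every genome;
    the position is None if the genome never stood on a platform.
    """
    for species_id, platform in endpoints:
        if platform is None or platform[1] < min_height:
            continue  # too close to the start: no samples at the bottom
        goals = goals_by_species.setdefault(species_id, [])
        # Keep the sample only if it is far enough from every goal this species already has.
        if all(math.dist(platform, g.position) >= min_separation for g in goals):
            goals.append(Goal(platform))
```

Because the distance check only looks at the genome’s own species, two species can hold goals at the same spot, which matches the idea of a separate goal list per species.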

When a new generation is created, each species is assigned two goals from its goal list, chosen so that the goals with the highest success rates are the least likely to be assigned. Newly sampled goals are treated as having a success rate of 0%. Each generation is then evaluated three times: once for the real goal and once for each of the two sampled goals.
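One way to implement the assignment rule is weighted sampling without replacement, with weights that shrink as a goal’s success rate grows. The linear weighting and the `floor` value below are assumptions; the article only says that the most successful goals should be the least likely to be assigned.

```python
import random
from typing import List, Sequence


def assign_goals(success_rates: Sequence[float], k: int, rng: random.Random,
                 floor: float = 0.05) -> List[int]:
    """Pick up to k distinct goal indices, favoring goals with low success rates.

    Each goal's weight is (1 - success_rate) + floor, so a goal every agent already
    reaches is unlikely, but never impossible, to be assigned.
    """
    candidates = list(range(len(success_rates)))
    chosen: List[int] = []
    for _ in range(min(k, len(candidates))):
        weights = [1.0 - success_rates[i] + floor for i in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)  # sample without replacement
    return chosen
```

Calling it with `k=2` for each species, plus one evaluation on the real goal, gives the three evaluations per generation described above.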

The fitness is computed from the following two values:

(starting_distance-closest_distance)/starting_distance

to reward the agent for getting close to the goal

(starting_distance-ending_distance)/ending_distance

to reward the agent for staying close to the goal.
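The two terms translate directly into code. This Python sketch follows the formulas exactly as written, including dividing the second term by `ending_distance`, which makes that term grow quickly when an agent finishes very close to its goal. The `eps` guard and summing the two terms into one raw score are assumptions, since the article does not say how the values are combined.

```python
from typing import Tuple


def fitness_terms(starting_distance: float, closest_distance: float,
                  ending_distance: float, eps: float = 1e-6) -> Tuple[float, float]:
    """Return the two fitness components, following the article's formulas as written."""
    # Reward for getting close at any point during the episode.
    approach = (starting_distance - closest_distance) / starting_distance
    # Reward for finishing close; eps guards against division by zero (assumption).
    stay = (starting_distance - ending_distance) / max(ending_distance, eps)
    return approach, stay


def raw_fitness(starting_distance: float, closest_distance: float,
                ending_distance: float) -> float:
    # Summing the two terms is an assumption; the article does not say how they are combined.
    return sum(fitness_terms(starting_distance, closest_distance, ending_distance))
```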

If an agent stays on the bottom for too long, it is removed from the simulation and receives a penalty. This is mainly to speed up evaluation.

For NEAT with HER, the reward is cubed to amplify the differences between good and bad solutions. Because cubing preserves the sign, these values can still be negative if the agent moves away from the goal.
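The shaping used in the HER variant is equally short. In this sketch the penalty for agents removed from the bottom is a hypothetical constant, and applying it after cubing is an assumption; the article gives neither the penalty size nor the order.

```python
def shaped_reward(raw: float, removed_early: bool = False,
                  early_removal_penalty: float = 0.5) -> float:
    """Cube the raw reward, then apply a penalty if the agent was culled for idling at the bottom."""
    shaped = raw ** 3  # cubing keeps the sign, so moving away still yields a negative value
    if removed_early:
        shaped -= early_removal_penalty  # hypothetical penalty size
    return shaped
```

For example, raw rewards of 0.9 and 0.5 differ by a factor of 1.8, while their cubes, about 0.729 and 0.125, differ by a factor of roughly 5.8, which is the amplification the cubing is meant to provide.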

Each species is given two starting samples (represented by the blue diamonds) to ensure they could be evaluated three times, but in ***hindsight*** this was unnecessary (sorry for the terrible joke). Source: Image by the Author

Results

Standard NEAT with only a distance reward did poorly on this problem, failing six out of ten times. NEAT is extremely reliant on a string of coincidental mutations early on to find the correct path and solve the problem. When those mutations don’t happen, it gets stuck in a local optimum where the agents climb the platforms on the left but cannot get any further.

Standard NEAT is unable to get any higher

On the other hand, NEAT with HER performs reasonably well. It solves the problem with a median of 149 generations (Stddev: 90) and doesn’t fail once, although its worst attempt took 415 generations. Interestingly, NEAT with HER unfolded like a novelty search: as goals that agents had already learned to reach were assigned less often because of the assignment method, agents were forced to explore new and harder-to-reach areas of the environment. Note that the target goal seems to have been reached largely by chance, although more training might teach the agents to reach any point in the environment without memorizing a sequence of steps.

Sampled goals promote exploration and decrease the chance of premature convergence.

However, this accomplishment should not be overstated. With a reward function based on the agents’ y value, standard NEAT solves the problem with a median of 94 generations (Stddev: 36) and less evaluation time, although it fails once.

Conclusions

NEAT with HER can modestly promote generalization and success on difficult problems without hand-tuning a fitness function. However, this comes at the cost of increased evaluation time and slower convergence compared to a well-designed fitness function. Interestingly, it might be possible to train agents this way without any explicit goal at all. Each species would still need at least one sampled goal in order to train, but beyond that, the sampling and assignment strategy promotes exploration of the domain with or without an explicit goal.
