What if Charles Darwin Built a Neural Network?

Last Updated on January 16, 2023 by Editorial Team

Author(s): Alexander Kovalenko

Originally published on Towards AI.

An alternative way to “train” a neural network with evolution

Photo by Misael Moreno on Unsplash

At the moment, backpropagation is essentially the only way to train a neural network. It computes the gradient of the loss function with respect to the weights of the network and then updates the weights by propagating the error backwards through the network layers using the chain rule.
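For contrast with the evolutionary approach below, here is a minimal sketch of one backpropagation step in PyTorch; the toy model, dummy data, and the 0.1 learning rate are illustrative choices, not part of the original post.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)              # toy model, for illustration only
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 10)                # dummy inputs
y = torch.randint(0, 2, (4,))         # dummy labels

loss = criterion(model(x), y)
loss.backward()                       # chain rule: gradients of the loss w.r.t. every weight
with torch.no_grad():
    for p in model.parameters():
        p -= 0.1 * p.grad             # gradient-descent update
```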

Recently, the community has been trying to go beyond backpropagation, as maestro Geoffrey Hinton himself, who popularized the algorithm in the '80s, is now “deeply suspicious” of it. His view now is “throw it all away and start again”. There are many reasons behind his suspicion. The main one is that our brain does not backpropagate anything, as there is no labeled data; we label things ourselves, which is a completely different story. Many brilliant works have been published at NeurIPS featuring diverse algorithms, some of which are rather sophisticated. However, we are here not for some hardcore algorithm that looks like an ancient spell to summon the devil, but for something simpler. Much simpler. Instead of training our model, we are going to evolve it, just like mother nature did.

Now, it is time for Charles Darwin to come into the picture. Darwin defined evolution as “descent with modification”. Sounds almost like “gradient descent”, right? 😅 Natural selection, a key mechanism of evolution, means that species evolve over time, give rise to new species, and share a common ancestor. In simple words, the best-adapted fraction of the population survives and produces slightly altered (i.e., mutated) offspring. Then this process repeats over and over for a long time… How long? In the case of planet Earth, it has taken 4.543 billion years. This concept has been adopted in computer science, where it is known as a genetic, or evolutionary, algorithm.

Given a large enough population that starts at random, evolution can arrive at surprisingly good results. Likewise, the infinite monkey theorem states that a monkey, given an infinite amount of time, will almost surely type some meaningful text, say, William Shakespeare's Hamlet.

“Let us suppose that a million monkeys have been trained to type at random, and that these typing monkeys work hard for ten hours a day with a million typewriters of various types. After a year, results of their work would contain exact copies of the books of all kinds and languages held in the richest libraries of the world”

Émile Borel, 1913, Journal de Physique

Photo by Syed Ahmad on Unsplash

How do we extrapolate this idea to neural network training? Just imagine that we start with a bunch of random nets. They can vary in architecture, weights, or any other parameters. Then we check how well they do the job, choose a small chunk that performs best, and make some small modifications (called “mutation”) to their parameters. Since we are talking about alternatives to backpropagation, the goal here will be to find the best weights and biases for the neural network without training it in the “traditional” way.

No backpropagation, no gradient descent – no problems!

First, let's make our lives easier by creating a neural net using the PyTorch library. For simplicity, we'll take a feed-forward neural network with a single hidden layer and evolve it for a simple task: handwritten digit recognition, which we all know as the MNIST dataset:
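Since the original snippet is not reproduced here, a minimal sketch of such a network might look like the following (the hidden-layer size of 64 is an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """Feed-forward net with a single hidden layer for 28x28 MNIST images."""
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, hidden_size)   # hidden layer
        self.fc2 = nn.Linear(hidden_size, 10)        # output layer (10 digits)

    def forward(self, x):
        x = x.view(x.size(0), -1)                    # flatten the images
        x = torch.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)

model = Net()
```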

Now, to make the model work, we have to initialize its weights and biases, so let's create a function that takes a model and a set of trainable parameters and puts them all together:
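A hypothetical version of such a helper (the name set_params is made up for illustration), which copies a tuple of tensors into the Net sketched above, in the order described further below:

```python
def set_params(model, params):
    """Load a candidate solution (a tuple of four tensors) into the model."""
    w_hidden, w_out, b_hidden, b_out = params
    with torch.no_grad():
        model.fc1.weight.copy_(w_hidden)
        model.fc2.weight.copy_(w_out)
        model.fc1.bias.copy_(b_hidden)
        model.fc2.bias.copy_(b_out)
    return model
```

Copying in place means every candidate is evaluated with the same Net object, so memory use stays constant no matter how large the population grows.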

Then we need the so-called fitness function, which, according to good old Wikipedia, is:

a particular type of objective function that is used to summarise, as a single figure of merit, how close a given design solution is to achieving the set aims.

To start, let's use the standard PyTorch test function without mini-batches, as we don't need them anymore:
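A sketch of what that evaluation might look like, assuming the MNIST test split is loaded via torchvision and scored in a single pass (it reuses the hypothetical set_params helper from above):

```python
from torchvision import datasets

# Load the MNIST test split once; scale pixel values to [0, 1].
test_set = datasets.MNIST(".", train=False, download=True)
x_test = test_set.data.float().unsqueeze(1) / 255.0   # shape: (10000, 1, 28, 28)
y_test = test_set.targets

def fitness(model, params):
    """Return the test-set accuracy of the model with the given parameters."""
    set_params(model, params)
    model.eval()
    with torch.no_grad():
        preds = model(x_test).argmax(dim=1)
    return (preds == y_test).float().mean().item()
```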

This test function will serve as our fitness function, which is what we have to maximize. Since it returns the accuracy, maximizing it sounds like a plan!

To evaluate the model, however, we need some input: a population of random neural network parameters. So first we create the population by generating many pseudorandom, normally distributed tensors of the right shapes, with a small normalization tweak to avoid very large parameters: we simply multiply the weights and biases by 0.1.

Let's do it as a list of Python tuples:
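Continuing the sketch, with shapes matching the hypothetical Net above, and the population size of 100 and the 0.1 scaling taken from the text:

```python
def random_params(hidden_size: int = 64):
    """One random candidate: normally distributed tensors scaled down by 0.1."""
    return (torch.randn(hidden_size, 28 * 28) * 0.1,   # hidden-layer weights
            torch.randn(10, hidden_size) * 0.1,        # output-layer weights
            torch.randn(hidden_size) * 0.1,            # hidden-layer biases
            torch.randn(10) * 0.1)                     # output-layer biases

population = [random_params() for _ in range(100)]
```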

Here, every tuple contains the following:

  • weights of the hidden layer
  • weights of the output layer
  • biases of the hidden layer
  • biases of the output layer

Now, once we have 100 different solutions, we can feed them to the model, evaluate them, and choose a bunch (let's say 10) of the best ones. Then we slightly alter these solutions, the mutation mentioned above, and create a new generation: simply N new solutions randomly picked from the top of the previous generation and slightly mutated.

However, we make a couple of small changes to the original version of the algorithm. First, since we start with a small population of 100, we let the population grow by some fixed number with every generation. This is just a trick for faster convergence and better results over time.

The second trick is that, once in a while (say, in 20% of all cases), we allow random initialization from scratch. Thus, even though we pick the top 10 solutions to create a new generation, there is still a chance for an Albert Einstein to be born! If such Einsteins appear, they can be picked and further mutated to create a new and better generation. The overall code cycle looks as follows:
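A sketch of that cycle under the assumptions above; the 200 generations follow the text, while the mutation noise scale (0.02) and the per-generation growth of 5 candidates are illustrative choices rather than the author's exact settings:

```python
import random

def mutate(params, scale: float = 0.02):
    """Return a slightly perturbed copy of a candidate solution."""
    return tuple(p + torch.randn_like(p) * scale for p in params)

pop_size, growth, generations = 100, 5, 200

for gen in range(generations):
    # Score every candidate and keep the fittest 10 as parents.
    scored = sorted(population, key=lambda p: fitness(model, p), reverse=True)
    parents = scored[:10]

    # Let the population grow a little every generation.
    pop_size += growth

    # New generation: mostly mutated parents, with a 20% chance of a fresh random candidate.
    population = [
        random_params() if random.random() < 0.2
        else mutate(random.choice(parents))
        for _ in range(pop_size)
    ]

    print(f"generation {gen}: best accuracy {fitness(model, parents[0]):.3f}")
```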

To sum up, our slightly improved evolutionary algorithm does the following:

  • randomly generates 100 sets of parameters for a given neural network architecture
  • picks the best 10 of them
  • randomly and slightly mutates the top 10 to create a new population, slightly larger than the previous one, with a 20% chance of completely random initialization
  • repeats until it takes over the world!

And… voilà! Our DarwinNet is ready! When it runs for 200 generations, it reaches over 25% accuracy on the MNIST test set. Not the best performance, but a random guess would give 10% accuracy, so it is clearly doing something, as you can see in the figure below. The complete code can be found here. As mentioned above, adding a 20% chance of random initialization helps to improve performance drastically, which corresponds to the characteristic “steps” on the plot.

This post demonstrates how a genetic algorithm can be used to “train” a model on one of the benchmark datasets. Bear in mind that the code was written between office sword fights and coffee; there are no big ambitions here. Nevertheless, there are dozens of ways to attain higher performance: different weight initialization, changing the mutation rate, the number of parents for each generation, the number of generations, etc. Moreover, we can use our prior understanding of neural networks and inject expert knowledge, e.g., pick the initial weights of the model using Kaiming initialization. This probably won't help, but you get the point, right? 😃

Of course, the idea of training neural networks with an evolutionary algorithm is not new. There are at least two papers from the last century, by D. J. Montana & L. Davis and by S. J. Marshall & R. F. Harrison, sharing this idea and showing the genetic algorithm's superiority over backprop. However, whatever happened, happened. Nowadays, everything around backpropagation is fine-tuned to perform at its best, so its performance dominates genetic algorithms and other non-backprop rivals, e.g., Hebbian learning or perturbation learning.

Another issue with the present implementation: it takes time… Well, maybe if it ran for over 4 billion years, the result would be better. 🙃


