Supermasks: A Simple Introduction and Implementation in PyTorch
The general understanding of neural networks is that they must be trained: computation is spent adjusting the weights until the network can perform a certain task on a given dataset. However, it turns out this is not entirely true. Given a randomly initialized dense neural network:

There exist sub-networks that, when trained, can match the performance of the original network after training. Even more surprisingly, there exist sub-networks that, without any training at all, perform far better than the randomly initialized network on a certain task. For example, on the MNIST dataset, such an untrained sub-network can achieve up to 86% accuracy.
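
To make the idea of a sub-network concrete, here is a minimal sketch of one dense layer whose random weights are frozen and only filtered by a binary mask. The names `weight`, `mask`, and `masked_linear` are assumptions for illustration, and the mask here is drawn at random; a real supermask would be chosen so that the surviving weights perform well on the task.

```python
import torch

torch.manual_seed(0)

# Frozen, randomly initialized weights of one dense layer (never trained).
weight = torch.randn(4, 8)

# Hypothetical binary mask: 1 keeps a weight, 0 removes it.
# It is random here purely for illustration.
mask = (torch.rand_like(weight) < 0.5).float()


def masked_linear(x, weight, mask):
    """Forward pass through the sub-network selected by `mask`."""
    return x @ (weight * mask).t()


x = torch.randn(2, 8)              # batch of 2 inputs with 8 features
out = masked_linear(x, weight, mask)
print(out.shape)                   # one output row per input
```

The weights themselves never change. Choosing a different sub-network only means choosing a different `mask`.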

