Training the Same Neural Network with Different Optimizers
Author(s): Gradient Thoughts

Originally published on Towards AI.

Optimizers are often discussed through a simplistic, surface-level lens: adaptive methods like Adam are said to converge faster, while SGD is believed to generalize better …
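To make the comparison concrete, here is a minimal sketch (assuming PyTorch; the synthetic data, tiny network, and hyperparameters are illustrative placeholders, not the article's actual setup) of training identically initialized copies of the same model with SGD and with Adam and comparing their final training losses.

```python
# Minimal sketch (assumed PyTorch; synthetic data and hyperparameters are illustrative only):
# train copies of the same small network with different optimizers and compare losses.
import torch
import torch.nn as nn

def make_model():
    # Identical architecture and initialization for every run (fixed seed).
    torch.manual_seed(0)
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# Synthetic binary-classification data stands in for a real dataset.
torch.manual_seed(0)
X = torch.randn(512, 20)
y = (X[:, 0] > 0).long()

def train(optimizer_name, epochs=50):
    model = make_model()
    if optimizer_name == "sgd":
        opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    else:
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

for name in ["sgd", "adam"]:
    print(f"{name}: final training loss = {train(name):.4f}")
```

Because both runs start from the same initialization and see the same data, any difference in the loss curves comes from the optimizer alone, which is the premise this article builds on.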