The Role of Signal-to-Noise in Loss Convergence
Author(s): Austin DeWolfe
Originally published on Towards AI.

Source: Image by author

Consider the typical NLP training curve during pre-training: that nice, smooth line of healthy training. Why does it behave like that? And why does the image above not behave like that? …