Understanding Attention In Transformers
Author(s): Shashank Bhushan

Originally published on Towards AI.

Introduction

Transformers are everywhere in machine learning nowadays. What started as a novel architecture for sequence-to-sequence language tasks such as translation and question answering can now be found in virtually all ML domains, from …
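The excerpt breaks off before the article's own exposition, but a minimal sketch of the mechanism the title refers to may help orient the reader. The following is an illustrative scaled dot-product attention in NumPy, assuming the standard formulation; the function names and shapes are this sketch's own, not the article's code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating, for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q, K: (seq_len, d_k); V: (seq_len, d_v).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted average of the values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)
```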
Gradient-Based Learning In Numpy
Author(s): Shashank Bhushan

Originally published on Towards AI.

Photo by Erol Ahmed on Unsplash

What is Gradient-Based Learning?

Mathematically, training a neural network can be framed as an optimization problem in which we try to either maximize or minimize some function f(x). …
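To make that framing concrete, here is a minimal gradient-descent sketch in NumPy, minimizing the simple quadratic f(x) = (x - 3)^2. The function, learning rate, and step count are illustrative choices for this sketch, not values taken from the article.

```python
import numpy as np

def f(x):
    # Objective to minimize: a quadratic with its minimum at x = 3.
    return (x - 3.0) ** 2

def grad_f(x):
    # Analytic gradient of f: f'(x) = 2 * (x - 3).
    return 2.0 * (x - 3.0)

x = np.array(0.0)  # starting point
lr = 0.1           # learning rate (step size)
for step in range(100):
    x = x - lr * grad_f(x)  # step against the gradient to decrease f

print(x, f(x))  # x converges toward the minimizer 3.0
```

The loop is the whole idea of gradient-based learning in miniature: evaluate the gradient at the current parameters, then take a small step in the direction that decreases the objective.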