Jargons in Deep Learning Explained
Last Updated on June 20, 2022 by Editorial Team
Author(s): Subhash Das
Originally published on Towards AI the World’s Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses.
12 key concept definitions related to Artificial Neural Networks and Deep Learning
A Few Things to Note
Deep learning has racked up an impressive collection of accomplishments in the past several years. In light of this, it's important to keep a few things in mind, at least in my opinion:
- Deep learning is not a panacea: it is not an easy one-size-fits-all solution to every problem out there
- It is not the fabled master algorithm: deep learning will not displace all other machine learning algorithms and data science techniques, or, at the very least, it has not yet been shown to
- Tempered expectations are necessary: while great strides have recently been made on many types of problems, notably classification in computer vision and natural language processing, as well as reinforcement learning and other areas, contemporary deep learning does not scale to very complex problems such as "solve world peace"
- Deep learning and artificial intelligence are not synonymous
- Deep learning can provide an awful lot to data science in the form of additional processes and tools to help solve problems, and when observed in that light, deep learning is a very valuable addition to the data science landscape
Let's get started with deep learning-related terminology definitions:
1. Deep Learning
- Deep learning is the process of applying deep neural network technologies to solve problems. Deep neural networks are neural networks with hidden layers between the input and output layers; definitions vary, but "deep" usually implies more than one hidden layer
- Like data mining, deep learning refers to a process; it employs deep neural network architectures, which are a particular type of machine learning algorithm.
2. Artificial Neural Networks
- Artificial neural networks (ANNs) are the machine learning architecture by which deep learning is carried out; they were originally inspired by the biological brain, particularly the neuron. ANNs alone (the non-deep variety) have been around for a very long time and have historically been able to solve certain types of problems.
- However, neural network architectures that include layers of hidden neurons (beyond simply the input and output layers), and more recently architectures with many such layers, add a level of complexity that enables deep learning and provides a more powerful set of problem-solving tools.
- ANNs vary considerably in their architectures, so there is no single definitive neural network definition. The two generally cited characteristics of all ANNs are that they possess sets of adaptive weights and that they can approximate non-linear functions of their inputs.
3. Perceptron
- A perceptron is a simple linear binary classifier. Perceptrons take inputs and associated weights (representing relative input importance), combine them into a weighted sum (usually plus a bias term), and pass that sum through a threshold (step) function to produce an output, which is then used for classification.
- Perceptrons have been around a long time: early implementations date back to the 1950s, and they were among the first ANN implementations.
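To make the weighted-sum-and-threshold idea concrete, here is a minimal Python sketch of a perceptron trained with the classic perceptron learning rule. The article does not specify a training procedure, so the rule, the hypothetical function names `predict` and `train_perceptron`, the learning rate, and the AND-gate toy data are all illustrative assumptions.

```python
# Minimal perceptron sketch: weighted sum plus bias, then a step function.
# Function names, learning rate, and the AND-gate data are illustrative assumptions.

def predict(weights, bias, inputs):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Classic perceptron rule: nudge the weights toward misclassified examples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in zip(samples, labels):
            error = target - predict(weights, bias, inputs)  # -1, 0 or +1
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so a single perceptron can learn it.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])
```

Because AND is linearly separable, the rule converges and the printed predictions match the labels. With all weights starting at zero, the learning rate only rescales the weights; it does not change which predictions are made.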
4. Multilayer Perceptron
- A multilayer perceptron (MLP) stacks several layers of perceptron-like units, with each layer fully connected to the next, forming a simple feedforward neural network.
- Unlike a single perceptron with its hard threshold, an MLP typically uses nonlinear, differentiable activation functions in its hidden layers, which let it learn problems that are not linearly separable and make it trainable with backpropagation.
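A single perceptron cannot compute XOR, because no straight line separates XOR's positive and negative examples. The sketch below is a forward pass only, using hand-picked rather than learned weights (a well-known textbook construction, used here purely for illustration), to show how one hidden layer with a nonlinear activation (ReLU in this case) solves it. `relu`, `dense`, and `xor_mlp` are hypothetical helper names.

```python
# Two-layer MLP forward pass with hand-picked weights that computes XOR.
# The weights are chosen by hand for illustration; no training happens here.

def relu(z):
    """Rectified linear unit: a simple nonlinear activation."""
    return max(0.0, z)


def identity(z):
    """Linear activation for the output unit."""
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: every unit sees every input."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def xor_mlp(x1, x2):
    hidden = dense([x1, x2], weights=[[1, 1], [1, 1]], biases=[0, -1], activation=relu)
    (output,) = dense(hidden, weights=[[1, -2]], biases=[0], activation=identity)
    return output


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_mlp(a, b))
```

The hidden layer maps (0, 1) and (1, 0) to the same point, after which a linear output unit can separate the classes. Without the ReLU, the two layers would collapse into a single linear map and XOR would again be out of reach.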
5. Feedforward Neural Network
- Feedforward neural networks are the simplest form of neural network architecture, in which connections are non-cyclical.
- In a feedforward network, the original kind of artificial neural network, information advances in a single direction from the input nodes, through any hidden layers, to the output nodes; no cycles are present.
- Feedforward networks differ from later recurrent network architectures, in which connections form directed cycles.
6. Recurrent Neural Network
- In contrast to the feedforward neural networks above, the connections of recurrent neural networks form directed cycles.
- These feedback connections let the network keep an internal state that carries information from one time step to the next. This in turn allows sequence processing and, of note, provides the capabilities needed for tasks such as speech and handwriting recognition.
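As an illustration of the directed cycle, here is a minimal scalar recurrent cell in Python, assuming an Elman-style update h_t = tanh(w_x * x_t + w_h * h_{t-1} + b). The weight values and the names `rnn_step` and `run_rnn` are hypothetical, and nothing here is trained.

```python
import math

# Minimal scalar recurrent cell (Elman-style). The weight values are
# arbitrary illustrative assumptions, not trained parameters.
W_X = 0.8   # weight on the current input
W_H = 0.5   # weight on the previous hidden state (the recurrent connection)
B = 0.0     # bias


def rnn_step(x_t, h_prev):
    """Combine the current input with the previous state to get the new state."""
    return math.tanh(W_X * x_t + W_H * h_prev + B)


def run_rnn(sequence, h0=0.0):
    """Feed a sequence through the cell, returning every hidden state."""
    states = []
    h = h0
    for x_t in sequence:
        h = rnn_step(x_t, h)  # the new state loops back in at the next step
        states.append(h)
    return states


# The same inputs in a different order leave a different final state,
# because the hidden state carries information forward in time.
print(run_rnn([1.0, 0.0, 0.0])[-1])
print(run_rnn([0.0, 0.0, 1.0])[-1])
```

A feedforward network given these inputs one at a time would treat each step independently; here the earlier input still influences the state two steps later, though it fades quickly with these particular weights.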
7. Activation Function
- In neural networks, the activation function takes a neuron's combined weighted inputs and produces that neuron's output; the choice of activation shapes the decision boundaries the network can form.
- Activation functions range from identity (linear) to sigmoid (logistic, or soft step) to hyperbolic tangent (tanh) and beyond. To employ backpropagation (see below), the network must use activation functions that are differentiable (at least almost everywhere).
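The functions named above, together with the derivatives that backpropagation needs, can be sketched in plain Python. The finite-difference check at the end is an extra assumption, added only to show that each analytic derivative is correct.

```python
import math

# Common activation functions and the derivatives backpropagation relies on.

def identity(z):
    return z

def identity_grad(z):
    return 1.0

def sigmoid(z):
    """Logistic 'soft step': squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # largest value is 0.25, reached at z == 0

def tanh(z):
    """Hyperbolic tangent: squashes any input into (-1, 1)."""
    return math.tanh(z)

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2

def numeric_grad(f, z, eps=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)

# Each analytic derivative should agree with its numerical estimate.
for f, df in [(identity, identity_grad), (sigmoid, sigmoid_grad), (tanh, tanh_grad)]:
    for z in (-2.0, 0.0, 1.5):
        assert abs(df(z) - numeric_grad(f, z)) < 1e-6, (f.__name__, z)

print(sigmoid(0.0), sigmoid_grad(0.0), tanh(0.0), tanh_grad(0.0))
```

The printed values are 0.5, 0.25, 0.0 and 1.0; the small maximum of the sigmoid's derivative matters again when gradients are multiplied across many layers.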
8. Backpropagation
- Backprop is, in essence, the gradient computation behind gradient descent on individual errors. You compare the predictions of the neural network with the desired output and then compute the gradient of the error with respect to the network's weights, applying the chain rule layer by layer from the output back toward the input. The negative of this gradient gives you a direction in weight space in which the error would become smaller.
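A hand-worked example may help. The sketch below assumes a tiny network with one input, one sigmoid hidden unit, and one linear output, scored with squared error; all numbers are arbitrary. It computes the gradients with the chain rule, from the output back toward the input, and checks them against finite differences.

```python
import math

# Backpropagation by hand for a tiny 1-1-1 network:
#   h = sigmoid(w1 * x),  y = w2 * h,  loss = 0.5 * (y - target) ** 2
# All values below are illustrative assumptions.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x):
    h = sigmoid(w1 * x)
    return h, w2 * h

def loss(w1, w2, x, target):
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2

def backprop(w1, w2, x, target):
    """Apply the chain rule from the output back toward the input."""
    h, y = forward(w1, w2, x)
    d_y = y - target              # dLoss/dy
    d_w2 = d_y * h                # dLoss/dw2
    d_h = d_y * w2                # dLoss/dh, passed backward to the hidden unit
    d_w1 = d_h * h * (1 - h) * x  # sigmoid'(z) = h * (1 - h), and dz/dw1 = x
    return d_w1, d_w2

w1, w2, x, target = 0.5, -1.0, 2.0, 1.0
g1, g2 = backprop(w1, w2, x, target)

# Finite-difference check: each pair should agree to several decimal places.
eps = 1e-6
n1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
n2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
print(g1, n1)
print(g2, n2)
```

Stepping each weight a little against its gradient (for example `w1 - 0.1 * g1`) lowers the loss, which is the direction-finding role described above.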
9. Cost Function
- The cost function measures the difference between the network's actual outputs and the expected (training) outputs. A cost of zero would signify that the actual outputs match the expected outputs exactly, meaning the network has been trained as well as possible on that data; this would clearly be ideal.
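The article does not name a particular cost function. Mean squared error is one common choice for regression-style outputs (cross-entropy is a frequent alternative for classification); a minimal sketch, with `mse` as a hypothetical helper name:

```python
# Mean squared error (MSE), one common choice of cost function.

def mse(predicted, expected):
    """Average of the squared differences between actual and expected outputs."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(predicted)


print(mse([0.0, 1.0, 1.0], [0.0, 1.0, 1.0]))  # perfect match -> 0.0
print(mse([0.5, 1.0, 0.0], [0.0, 1.0, 1.0]))  # (0.25 + 0 + 1) / 3
```

Squaring keeps every term non-negative, so the cost reaches zero only when every output matches its target.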
10. Gradient Descent
- Gradient descent is an optimization algorithm for finding local minima of functions. While it does not guarantee a global minimum, gradient descent is especially useful for functions that are difficult to minimize analytically (that is, by setting the derivatives to zero and solving for a precise solution).
- In the context of neural networks, stochastic gradient descent (gradient descent that uses the gradient computed on one example, or a small random batch, at a time) is used to make informed adjustments to your network's parameters with the goal of minimizing the cost function, thus bringing your network's actual outputs closer and closer, iteratively, to the expected outputs during the course of training. This iterative minimization employs calculus, namely differentiation.
- After each training step, the network's weights are updated according to the gradient of the cost function with respect to the current weights, so that the next training step's results may be a little closer to correct (as measured by a smaller cost). Backpropagation (backward propagation of errors) is the method used to compute these gradients and dole the updates out to the network.
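The loop below is a minimal sketch of stochastic gradient descent, assuming a one-feature linear model y = w * x + b, a per-example cost of 0.5 * error**2, and noise-free toy data generated from w = 2, b = 1. The hypothetical `sgd_fit` helper, the learning rate, the epoch count, and the seed are illustrative choices, not recommendations.

```python
import random

# Stochastic gradient descent fitting y = w * x + b to noise-free toy data.
# Data, learning rate, epoch count, and seed are illustrative assumptions.

def sgd_fit(xs, ys, lr=0.05, epochs=200, seed=0):
    rng = random.Random(seed)          # seeded so runs are repeatable
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)           # "stochastic": visit examples in random order
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradients of the per-example cost 0.5 * error ** 2
            grad_w = error * xs[i]
            grad_b = error
            # Step against the gradient to reduce the cost
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x + 1.0 for x in xs]       # ground truth: w = 2, b = 1
w, b = sgd_fit(xs, ys)
print(round(w, 3), round(b, 3))
```

After training, `w` and `b` should land very close to 2 and 1. In a real network the same update rule is applied to every weight, with backpropagation supplying the gradients.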
11. Vanishing Gradient Problem
- Backpropagation uses the chain rule to compute gradients, so the gradient for a layer toward the "front" (input) of an n-layer neural network is a product of up to n per-layer factors. When those factors are small, as with sigmoid derivatives, which never exceed 0.25, the product shrinks before it is ever used as an update.
- This means the gradient can decrease exponentially with n, which is a problem for deeper networks: the front layers receive tiny updates and take increasingly long to train effectively.
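The exponential shrinkage can be shown with a few lines of arithmetic. The sketch assumes every layer has weight 1 and pre-activation 0, which is the best case for the sigmoid derivative (exactly 0.25); real networks have other weights and activations, and large weights can instead make gradients explode. `gradient_scale` is a hypothetical helper name.

```python
import math

# Vanishing gradient illustration: backpropagating through n sigmoid layers
# multiplies n local derivatives together. Unit weights and zero
# pre-activations are simplifying assumptions (the best case for sigmoid).

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)      # never larger than 0.25


def gradient_scale(n_layers, z=0.0, weight=1.0):
    """Product of the per-layer factors weight * sigmoid'(z) over n layers."""
    scale = 1.0
    for _ in range(n_layers):
        scale *= weight * sigmoid_grad(z)
    return scale


for n in (1, 5, 10, 20):
    print(n, gradient_scale(n))
```

The printed factors are 0.25**n: 0.25 for one layer, 0.0009765625 for five, and about 9.1e-13 for twenty, so the front layers of a deep sigmoid network see almost no gradient signal.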
12. Long Short-Term Memory Network
- A Long Short-Term Memory network (LSTM) is a type of recurrent neural network designed for learning from and acting on time-related data in which there may be long or unknown gaps between relevant events.
- Its gated cell architecture allows information to persist across many time steps, giving the network a "memory." Breakthroughs in handwriting recognition and automatic speech recognition have benefited from LSTM networks.
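To show where the "memory" lives, here is a minimal scalar sketch of one LSTM step using the widely used forget/input/output gate formulation without peephole connections; the article itself gives no equations. The parameter values in the hypothetical `PARAMS` table are chosen by hand so the forget gate stays almost fully open.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical scalar parameters: (weight on input x, weight on previous h, bias).
PARAMS = {
    "forget": (0.0, 0.0, 4.0),   # large bias: forget gate ~0.98, so memory is kept
    "input":  (6.0, 0.0, -3.0),  # opens only when the input is large
    "cand":   (1.0, 0.0, 0.0),   # candidate value written to the cell
    "output": (0.0, 0.0, 0.0),   # output gate fixed at 0.5
}

def gate(name, x, h_prev, squash):
    wx, wh, b = PARAMS[name]
    return squash(wx * x + wh * h_prev + b)

def lstm_step(x, h_prev, c_prev):
    f = gate("forget", x, h_prev, sigmoid)   # how much old memory to keep
    i = gate("input", x, h_prev, sigmoid)    # how much new information to write
    g = gate("cand", x, h_prev, math.tanh)   # candidate new information
    o = gate("output", x, h_prev, sigmoid)   # how much of the memory to expose
    c = f * c_prev + i * g                   # updated cell state (the "memory")
    h = o * math.tanh(c)                     # hidden state / output
    return h, c

h, c = 0.0, 0.0
for x in [1.0] + [0.0] * 9:     # one signal followed by nine empty steps
    h, c = lstm_step(x, h, c)
print(round(c, 3))
```

A single input written at the first step is still largely present in the cell state `c` nine steps later, because the forget gate multiplies it by roughly 0.98 per step instead of overwriting it.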
Thanks for reading the article
If you want to read more articles like this, follow me
Jargons in Deep Learning Explained was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
Published via Towards AI