A Tour of Conditional Random Field

Last Updated on June 7, 2020 by Editorial Team

Author(s): Kapil Jayesh Pathak

In this article, we’ll take a closer look at the Conditional Random Field (CRF). A Conditional Random Field is a probabilistic graphical model with a wide range of applications, such as gene prediction and recognizing parts of an image. It has also been used extensively in natural language processing (NLP) for neural sequence labeling tasks such as named entity recognition and part-of-speech tagging. A CRF is used when information about neighboring labels is essential for predicting the label of an individual sequence item.

A graphical model is a probabilistic model in which a graph expresses the conditional dependence structure between random variables. There are two main types of graphical models: Bayesian networks and Markov Random Fields. Bayesian networks are directed acyclic graphs, whereas Markov Random Fields are undirected graphs and may be cyclic. Conditional Random Fields fall into the latter category.


From here, we will dive deep into the mathematics of Conditional Random Field. For that, we need to understand the notion of conditional dependence first. If a variable y is conditionally dependent on variable x, then given input x, we can identify its class by the following expression:
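A common way to write this decision rule, assuming the model provides the conditional distribution P(y | x) over classes (the symbol \hat{y} for the predicted class is introduced here for illustration):

```latex
\hat{y} = \arg\max_{y} P(y \mid x)
```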

Assuming that this is a problem of structured prediction, i.e., we need to identify the sequence of labels for sequential inputs, we can leverage the notion of conditional independence among different elements xi by the following expression.
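One way to write this factorization, as a sketch under the assumption that each label y_i depends only on its own input x_i and that the per-position probability is a softmax over activation scores (the subscript notation ϕ(x_i)_c for the score of label c is an assumption made here):

```latex
P(y \mid x) = \prod_{i=1}^{k} P(y_i \mid x_i)
            = \prod_{i=1}^{k} \frac{\exp\big(\phi(x_i)_{y_i}\big)}{Z(x_i)},
\qquad
Z(x_i) = \sum_{c} \exp\big(\phi(x_i)_{c}\big)
```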

Here ϕ(·) is an activation function, k is the sequence length, and Z(x_i) is the partition function for position i. After that, P(y|x) can be written as follows:
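Under an assumed softmax form, with ϕ(x_i)_{y_i} denoting the score assigned to label y_i at position i (this subscript notation is an assumption), the product of per-position terms collapses into a single exponential:

```latex
P(y \mid x) = \frac{\exp\Big(\sum_{i=1}^{k} \phi(x_i)_{y_i}\Big)}{\prod_{i=1}^{k} Z(x_i)}
```

Because both the numerator and the denominator factor over positions, the most probable sequence under this model is found by picking the best label at each position separately.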

The above expression gives P(y|x) under greedy decoding, where each label is chosen independently of the others. In a Conditional Random Field, we also need information about neighboring labels. This information is incorporated into the expression for P(y|x) through a transition table V. In another variant of the CRF, a context window over the inputs x_i is used alongside the label information. For example, with a context window of 3, the expression for P(y|x) is given as follows:
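As a sketch of what such an expression can look like, assume a linear-chain CRF in which V_{y_i, y_{i+1}} is the transition score between adjacent labels, and ϕ^{(-1)}, ϕ^{(0)}, ϕ^{(+1)} are separate score functions for the previous, current, and next input (these superscripted names are assumptions for illustration):

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\Big(
    \sum_{i=2}^{k} \phi^{(-1)}(x_{i-1})_{y_i}
  + \sum_{i=1}^{k} \phi^{(0)}(x_i)_{y_i}
  + \sum_{i=1}^{k-1} \phi^{(+1)}(x_{i+1})_{y_i}
  + \sum_{i=1}^{k-1} V_{y_i,\, y_{i+1}}
\Big)
```

In this form Z(x) normalizes over all label sequences at once rather than over each position separately.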

Let’s use the following notation for the unary log terms and the pairwise transition terms, respectively.
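One possible choice of notation, assumed here for illustration: a_u collects every score that involves a single label, and a_p is the transition score between two adjacent labels. With a context window of 3 and hypothetical per-offset score functions ϕ^{(-1)}, ϕ^{(0)}, ϕ^{(+1)} for the previous, current, and next input:

```latex
a_u(y_i) = \phi^{(-1)}(x_{i-1})_{y_i} + \phi^{(0)}(x_i)_{y_i} + \phi^{(+1)}(x_{i+1})_{y_i},
\qquad
a_p(y_i, y_{i+1}) = V_{y_i,\, y_{i+1}}
```

Terms that would reference an input before the first or after the last position are dropped at the sequence boundaries.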

The expression of P(y|x) can be written as follows:
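With a_u(y_i) standing for the unary log term at position i and a_p(y_i, y_{i+1}) for the pairwise transition term (symbol names assumed here for illustration), a compact form is:

```latex
P(y \mid x) = \frac{\exp\Big(\sum_{i=1}^{k} a_u(y_i) + \sum_{i=1}^{k-1} a_p(y_i, y_{i+1})\Big)}{Z(x)}
```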

Inference

As we know, Z(X) is the partition function. If we calculate it naively, we obtain the following expression:
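The naive sum ranges over every possible label sequence. A minimal Python sketch of that enumeration is shown below; the function name `brute_force_partition` and the list-of-lists layout for the unary and pairwise log terms are assumptions made for illustration, not part of the original article.

```python
import itertools
import math


def brute_force_partition(unary, pairwise):
    """Sum exp(score) over every possible label sequence.

    unary[i][c]    : unary log term for label c at position i (K x C)
    pairwise[a][b] : transition log term from label a to label b (C x C)
    """
    K = len(unary)      # sequence length
    C = len(pairwise)   # number of labels
    total = 0.0
    # itertools.product enumerates all C**K label sequences
    for labels in itertools.product(range(C), repeat=K):
        score = sum(unary[i][labels[i]] for i in range(K))
        score += sum(pairwise[labels[i]][labels[i + 1]] for i in range(K - 1))
        total += math.exp(score)
    return total


# Toy example: K = 3 positions, C = 2 labels, all log terms zero,
# so each of the C**K = 8 sequences contributes exp(0) = 1.
unary = [[0.0, 0.0] for _ in range(3)]
pairwise = [[0.0, 0.0], [0.0, 0.0]]
z_naive = brute_force_partition(unary, pairwise)
```

The loop body runs once per label sequence, which is where the exponential cost comes from.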

In the above expression, we perform K nested sums, one per position, each ranging over all C labels, every time we calculate the partition function. The computational complexity of this calculation is of the order O(C^K), which does not scale. To reduce the complexity, we need to arrange Z(X) in a slightly different way.
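Writing K for the sequence length and y'_1, ..., y'_K for the summation labels, and using the assumed names a_u for the unary log term and a_p for the pairwise transition term, one such arrangement pushes each sum as far inward as possible:

```latex
Z(X) = \sum_{y'_K} \exp\big(a_u(y'_K)\big)
       \sum_{y'_{K-1}} \exp\big(a_u(y'_{K-1}) + a_p(y'_{K-1}, y'_K)\big)
       \cdots
       \sum_{y'_1} \exp\big(a_u(y'_1) + a_p(y'_1, y'_2)\big)
```

The innermost sum over y'_1 depends only on y'_2, so it can be computed once for each value of y'_2 and reused.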

The advantage of this rearrangement becomes visible in Algorithm 1. We introduce a class of vector-valued functions α_i(·), which is initialized by performing the innermost sum over all possible labels y'_1 at the first position. Rearranging the terms this way lets us use dynamic programming to reduce the complexity of the task.
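The forward recursion can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the article's exact Algorithm 1: the function name `forward_partition`, the list-of-lists inputs, and the toy numbers are all hypothetical.

```python
import math


def forward_partition(unary, pairwise):
    """Compute Z(X) with a forward (alpha) recursion.

    unary[i][c]    : unary log term for label c at position i (K x C)
    pairwise[a][b] : transition log term from label a to label b (C x C)
    """
    K = len(unary)
    C = len(pairwise)
    if K == 1:
        return sum(math.exp(unary[0][c]) for c in range(C))

    # Initialization: sum over the labels at the first position,
    # giving one value per label `nxt` at the second position.
    alpha = [
        sum(math.exp(unary[0][cur] + pairwise[cur][nxt]) for cur in range(C))
        for nxt in range(C)
    ]
    # Recursion: absorb one more position at a time (C * C work per step).
    for i in range(1, K - 1):
        alpha = [
            sum(math.exp(unary[i][cur] + pairwise[cur][nxt]) * alpha[cur]
                for cur in range(C))
            for nxt in range(C)
        ]
    # Termination: add the unary term of the last position and sum it out.
    return sum(math.exp(unary[K - 1][c]) * alpha[c] for c in range(C))


# Toy example with K = 3 positions and C = 2 labels (made-up numbers).
unary = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.0]]
pairwise = [[0.2, -0.1], [0.0, 0.4]]
z_forward = forward_partition(unary, pairwise)
```

This sketch works directly with exponentials for readability. For long sequences those products can overflow or underflow, so a practical version would typically run the same recursion in log space with a log-sum-exp.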

The computational complexity of the above algorithm is O(KC²), where K is the sequence length and C is the number of labels for each position.

Alternatively, the sum for Z(X) can be rearranged so that the innermost sum is performed over the Kth label of the sequence, and the computation proceeds from the Kth label back to the first label.
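Using the assumed names a_u for the unary log term and a_p for the pairwise transition term, with y'_1, ..., y'_K as the summation labels, this arrangement can be written as:

```latex
Z(X) = \sum_{y'_1} \exp\big(a_u(y'_1)\big)
       \sum_{y'_2} \exp\big(a_u(y'_2) + a_p(y'_1, y'_2)\big)
       \cdots
       \sum_{y'_K} \exp\big(a_u(y'_K) + a_p(y'_{K-1}, y'_K)\big)
```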

This time, we introduce another vector-valued function β(·). We initialize the β function by summing over all possible labels at the Kth position and iterate down to the labels at the 2nd position.
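A matching Python sketch of the backward recursion is given below. As before, the function name `backward_partition`, the input layout, and the toy numbers are assumptions for illustration, not the article's exact Algorithm 2.

```python
import math


def backward_partition(unary, pairwise):
    """Compute Z(X) with a backward (beta) recursion.

    unary[i][c]    : unary log term for label c at position i (K x C)
    pairwise[a][b] : transition log term from label a to label b (C x C)
    """
    K = len(unary)
    C = len(pairwise)
    if K == 1:
        return sum(math.exp(unary[0][c]) for c in range(C))

    # Initialization: sum over the labels at the Kth position,
    # giving one value per label `prev` at position K-1.
    beta = [
        sum(math.exp(unary[K - 1][cur] + pairwise[prev][cur]) for cur in range(C))
        for prev in range(C)
    ]
    # Recursion: walk from position K-1 down to position 2
    # (0-based indices K-2 down to 1).
    for i in range(K - 2, 0, -1):
        beta = [
            sum(math.exp(unary[i][cur] + pairwise[prev][cur]) * beta[cur]
                for cur in range(C))
            for prev in range(C)
        ]
    # Termination: add the unary term of the first position and sum it out.
    return sum(math.exp(unary[0][c]) * beta[c] for c in range(C))


# Toy example with K = 3 positions and C = 2 labels (made-up numbers).
unary = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.0]]
pairwise = [[0.2, -0.1], [0.0, 0.4]]
z_backward = backward_partition(unary, pairwise)
```

Both directions sum the same terms in a different order, so the backward result should agree with a forward computation on the same inputs up to floating-point error.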

Computing the partition function with Algorithm 1 (or 2) is referred to as the forward-backward algorithm for CRFs. Computing the α function is called the forward pass, while computing the β function is called the backward pass. Together, these passes are also called belief propagation.

In this article, we first got an overview of graphical models. Then we discussed what a Conditional Random Field is, its applications, and a variant that uses a context window over the inputs. For inference, we saw the need for the forward-backward algorithm and its time complexity. Apart from these topics, we should also look into the loss function of the CRF and how backpropagation is performed during training. These topics can be covered in another discussion.

References and Further Reading:

Youtube lecture series of Hugo Larochelle
http://www.cs.cmu.edu/~10715-f18/lectures/lecture2-crf.pdf
http://www.davidsbatista.net/blog/2017/11/13/Conditional_Random_Fields/
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data (Paper)

A Tour of Conditional Random Field was originally published in Towards AI — Multidisciplinary Science Journal on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
