Some Insights into Recent Development and Novel Approaches to Short Text Classification.

Last Updated on June 14, 2023 by Editorial Team

Author(s): Abdul Basit

Originally published on Towards AI.

Short text classification is one of the most challenging areas in natural language processing. Compared with standard text classification, it poses unique challenges: a full-length document supplies plenty of context, whereas a short text offers very little, which makes its class much harder to predict. Researchers continue to propose new ideas and approaches for classifying short text.

The aim of this report is to give insight into some recent approaches as well as a few earlier ones. It is a survey of methods related to short text classification. One well-known survey (Ge Song, et al. May 2014) analyzes the features and difficulties of short text classification and summarizes the existing methods. That survey is almost a decade old, and recent surveys of short text classification are lacking. This report is a step toward filling some of the gaps. It covers selected papers from 2014 to 2019.

(Bouaziz, et al. 2014) took a first step toward developing new algorithms for better accuracy. Before that, much of the focus was on preprocessing the text. The authors proposed a method in which the short text is enriched in two ways: first as a set of words taken separately, and second as one entity. Once the text is enriched, a semantic random forest is applied, which reduces random feature selection in favor of semantic-driven selection. Conventionally, a random forest draws on all features of the corpus when building trees, but in the proposed method only semantic features are selected. The algorithm computes the similarities between all the topics and the text. By applying this method, the authors constructed a tree whose nodes belong to the same topic.
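The idea of steering feature selection by topic similarity can be illustrated with a small sketch. It is not the authors' algorithm: the Jaccard overlap used as the similarity measure, the toy topic vocabularies, and the helper names `topic_similarity` and `semantic_feature_subset` are all assumptions made for illustration.

```python
def topic_similarity(text_tokens, topic_words):
    """Score each topic by Jaccard overlap with the text (an assumed measure)."""
    tokens = set(text_tokens)
    scores = {}
    for topic, words in topic_words.items():
        union = tokens | words
        scores[topic] = len(tokens & words) / len(union) if union else 0.0
    return scores


def semantic_feature_subset(text_tokens, topic_words, top_topics=1):
    """Return the features (words) of the topics most similar to the text.

    A tree built for this text would then choose its split features from
    this subset instead of sampling uniformly from the whole vocabulary.
    """
    scores = topic_similarity(text_tokens, topic_words)
    best = sorted(scores, key=scores.get, reverse=True)[:top_topics]
    features = set()
    for topic in best:
        features |= topic_words[topic]
    return sorted(features)


# Hypothetical topic vocabularies, e.g. as produced by a topic model.
TOPICS = {
    "sports": {"goal", "match", "team"},
    "finance": {"bank", "market", "stock"},
}

if __name__ == "__main__":
    print(semantic_feature_subset(["team", "wins", "match"], TOPICS))
```

In a full pipeline, the returned subset would replace the uniform random draw of candidate features at each tree node; how strongly to restrict the draw is a design choice this sketch does not settle.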

(Lee and Dernoncourt 2016) proposed a model for short text classification. The model has two parts. The first part creates a vector representation for each short text using either an RNN or a CNN architecture. The second part classifies the vector representation with a softmax classifier.
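The two-part structure can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a mean of word embeddings stands in for the RNN or CNN encoder, and the embeddings, weights, and function names (`encode`, `classify`) are hypothetical toy values rather than anything taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode(tokens, embeddings, dim):
    """Part 1: turn a short text into one vector.

    Assumption: averaging word embeddings replaces the RNN/CNN encoder.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def classify(vector, weights, biases):
    """Part 2: linear layer followed by softmax, one weight row per class."""
    scores = [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical toy parameters: two dimensions, two classes.
EMBEDDINGS = {"good": [1.0, 0.0], "bad": [0.0, 1.0]}
WEIGHTS = [[2.0, 0.0], [0.0, 2.0]]  # row 0: positive class, row 1: negative
BIASES = [0.0, 0.0]

if __name__ == "__main__":
    text_vector = encode(["good", "movie"], EMBEDDINGS, dim=2)
    print(classify(text_vector, WEIGHTS, BIASES))
```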

(Wang, et al. 2017) propose a novel method that combines explicit and implicit representations of short text by using a large taxonomy knowledge base and coalescing words with their relevant concepts. The network is called the Knowledge Powered Convolutional Neural Network (KPCNN). It combines two subnetworks. The first subnetwork extracts features from both words and relevant concepts. The second subnetwork is a character-level convolutional neural network that captures fine-grained semantic information. Finally, the outputs of the two subnetworks are concatenated and passed to the fully connected output layer.
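A rough sketch of the "two subnetworks, then concatenate" pattern is given here. It assumes a single convolution with ReLU and max-over-time pooling per kernel, which is a common CNN text encoder but not necessarily the exact layer stack of the paper; the function names and the toy inputs are hypothetical.

```python
def conv_max_pool(sequence, kernel):
    """Slide one kernel over a sequence of vectors, apply ReLU, max-pool.

    `sequence` is a list of embedding vectors; `kernel` is a list of
    weight vectors whose length is the kernel width.
    """
    width = len(kernel)
    responses = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        activation = sum(
            w * x
            for kernel_vec, input_vec in zip(kernel, window)
            for w, x in zip(kernel_vec, input_vec)
        )
        responses.append(max(0.0, activation))  # ReLU
    return max(responses, default=0.0)  # max-over-time pooling


def combined_features(word_concept_seq, char_seq, word_kernels, char_kernels):
    """Run both subnetworks and concatenate their pooled features."""
    word_features = [conv_max_pool(word_concept_seq, k) for k in word_kernels]
    char_features = [conv_max_pool(char_seq, k) for k in char_kernels]
    return word_features + char_features  # input to the output layer


# Hypothetical toy inputs: three word/concept vectors and three char vectors.
WORD_CONCEPT_SEQ = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
CHAR_SEQ = [[1.0], [-1.0], [2.0]]
WORD_KERNELS = [[[1.0, 0.0], [0.0, 1.0]]]  # one kernel of width 2
CHAR_KERNELS = [[[1.0]], [[-1.0]]]         # two kernels of width 1

if __name__ == "__main__":
    print(combined_features(WORD_CONCEPT_SEQ, CHAR_SEQ, WORD_KERNELS, CHAR_KERNELS))
```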

To address data sparsity in short text classification, (Zeng, et al. 2018) propose a novel approach that uses a topic memory to encode latent topic representations indicative of class labels. The proposed method, called Topic Memory Networks, takes advantage of corpus-level topic representations by employing a topic memory mechanism to enhance short text classification. The model architecture consists of three main components: a text encoder, a topic memory network, and a classifier. The text encoder encodes the short text into a continuous vector representation; (Zeng, et al. 2018) use a CNN as the text encoder. The topic memory network encodes latent topic representations via memory networks. Its novel mechanism allows joint inference of latent topics and alleviates data sparsity. It consists of two parts: a topic encoder and a topic memory. Lastly, the classifier takes the encoded text and latent topics as inputs and predicts the class label of the short input text.

(Xu and Cai 2019) propose a neural network model that incorporates context-relevant knowledge into a CNN for short text classification. The model consists of two parts. The lower sub-network is responsible for extracting the concept feature, and the upper sub-network extracts the context feature. The lower sub-network consists of two layers. The first is a convolutional layer that extracts the concept feature from the input text. The second is an attention layer that obtains the context-relevant concepts by attending to the most relevant parts of the input text. The attention mechanism assigns weights to different parts of the input text based on their relevance to the task. The upper sub-network combines word embeddings and context-relevant concept embeddings into what the authors call CCWE and feeds them to a CNN. The network can therefore capture both word-level and context-level information, which helps the model better capture the meaning of the short text.
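To make the combination of word embeddings and context-relevant concept embeddings concrete, here is a minimal sketch. It assumes dot-product attention between a context vector and each candidate concept, followed by simple concatenation; the scoring function, the names `attend` and `build_ccwe`, and the toy vectors are illustrative assumptions, not the paper's exact formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(context, concept_vectors):
    """Weight candidate concepts by relevance to the context (dot product)."""
    scores = [sum(c * x for c, x in zip(context, vec)) for vec in concept_vectors]
    weights = softmax(scores)
    dim = len(concept_vectors[0])
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, concept_vectors))
        for i in range(dim)
    ]
    return pooled, weights


def build_ccwe(word_vectors, concepts_per_word, context):
    """Concatenate each word embedding with its context-relevant concept."""
    rows = []
    for word_vec, concept_vecs in zip(word_vectors, concepts_per_word):
        pooled, _ = attend(context, concept_vecs)
        rows.append(word_vec + pooled)
    return rows  # one row per word, ready to feed to a CNN


# Hypothetical example: the word "apple" with concepts [fruit, company],
# read in a context vector that points toward the "company" direction.
WORD_VECTORS = [[0.5]]
CONCEPTS_PER_WORD = [[[1.0, 0.0], [0.0, 1.0]]]
CONTEXT = [0.0, 3.0]

if __name__ == "__main__":
    print(build_ccwe(WORD_VECTORS, CONCEPTS_PER_WORD, CONTEXT))
```

With this context, most of the attention weight falls on the second concept, so the concept part of the row leans toward "company" rather than "fruit".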

(Chen et al. 2019) propose a novel approach to short text classification that uses external knowledge sources to enhance the semantic representation. Their model, Short Text Classification with Knowledge powered Attention (STCKA), consists of three main layers. The first is the word embedding layer, which maps each word of the given text to a vector representation. The second layer, the knowledge-enhanced attention layer, is the most important: here (Chen et al. 2019) introduce two attention mechanisms, Concept towards Short Text (C-ST) and Concept towards Concept Set (C-CS). The final layer is the classification layer. It takes the output of the knowledge-enhanced attention layer as input and predicts the class label of the given text.

A new approach to semi-supervised short text classification, based on a heterogeneous graph neural network, was developed by (Linmei et al. 2019). The authors tackle the problem of achieving strong performance on short text when labeled data is scarce, exploiting the limited labeled data together with large amounts of unlabeled data through information propagation along the graph. The method contains two steps. First, to alleviate the sparsity of short texts, the authors present a flexible HIN (Heterogeneous Information Network) framework for modeling the short texts. This framework can include any extra information and capture the semantic relations between the short texts and the added information. A HIN consists of multiple types of nodes and edges, which represent the different types of entities and relations in the network.
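A heterogeneous information network is, at its core, a graph whose nodes carry a type. The sketch below shows one minimal way to store such a graph and to group a node's neighbors by type. The class name `HIN`, its methods, and the node types used in the example ("text", "entity", "topic") are assumptions chosen for illustration; the framework described above allows any kind of extra information.

```python
from collections import defaultdict


class HIN:
    """A tiny undirected graph whose nodes each have a type."""

    def __init__(self):
        self.node_type = {}
        self.neighbors = defaultdict(set)

    def add_node(self, node, node_type):
        self.node_type[node] = node_type

    def add_edge(self, a, b):
        for node in (a, b):
            if node not in self.node_type:
                raise KeyError(f"unknown node: {node!r}")
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def neighbors_by_type(self, node):
        """Group a node's neighbors by their type, sorted for determinism."""
        grouped = defaultdict(list)
        for neighbor in sorted(self.neighbors[node]):
            grouped[self.node_type[neighbor]].append(neighbor)
        return dict(grouped)


def build_example():
    """Two short texts linked to a shared entity and one topic (toy data)."""
    hin = HIN()
    hin.add_node("t1", "text")
    hin.add_node("t2", "text")
    hin.add_node("apple", "entity")
    hin.add_node("tech", "topic")
    hin.add_edge("t1", "apple")
    hin.add_edge("t1", "tech")
    hin.add_edge("t2", "apple")
    return hin


if __name__ == "__main__":
    print(build_example().neighbors_by_type("t1"))
```

Because both texts connect to the same entity node, information can flow from a labeled text to an unlabeled one through that shared neighbor, which is the intuition behind propagation along the graph.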

Second, (Linmei et al. 2019) developed a novel model, HGAT (Heterogeneous Graph Attention Network), to embed the HIN for short text classification based on a new dual-level attention mechanism. The model accounts for the diversity of information types by using node-level and type-level attention. Node-level attention captures the importance of individual neighboring nodes, and type-level attention captures the importance of each type of node. The HGAT model also has several other layers, and the output layer uses the softmax activation to predict the class probabilities for each input text.
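The dual-level idea can be sketched as two nested softmax weightings: one over node types and one over the nodes within each type. This is a simplified illustration, not the HGAT layer itself: it assumes dot-product scores with no learned parameters, and the function name `dual_level_attention` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def dual_level_attention(target, neighbors_by_type):
    """Aggregate neighbor embeddings with type-level and node-level weights.

    `neighbors_by_type` maps a type name to a non-empty list of vectors.
    Returns the aggregated vector and the type-level weights.
    """
    types = sorted(neighbors_by_type)
    # Type level: score each type by its mean neighbor embedding.
    type_scores = [dot(target, mean_vector(neighbors_by_type[t])) for t in types]
    type_weights = softmax(type_scores)

    output = [0.0] * len(target)
    for type_name, type_weight in zip(types, type_weights):
        vectors = neighbors_by_type[type_name]
        # Node level: score each neighbor of this type against the target.
        node_weights = softmax([dot(target, v) for v in vectors])
        for node_weight, vec in zip(node_weights, vectors):
            for i, value in enumerate(vec):
                output[i] += type_weight * node_weight * value
    return output, dict(zip(types, type_weights))


# Hypothetical toy embeddings for one text node and its typed neighbors.
TARGET = [1.0, 0.0]
NEIGHBORS = {
    "entity": [[1.0, 0.0], [0.0, 1.0]],
    "topic": [[0.0, 0.0]],
}

if __name__ == "__main__":
    print(dual_level_attention(TARGET, NEIGHBORS))
```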

This report has discussed a range of research, each piece addressing the challenges of short text classification through its own approach. Much work has also been done since 2019; summarizing those approaches is left for future work.


Bouaziz, Ameni, Christel Dartigues-Pallez, Precioso, Patrick Lloret, and Pereira. 2014. “Short Text Classification Using Semantic Random Forest.” 288–289.

Chen, Jindong, Yizhou Hu, Jingping Liu, Yanghua Xiao, and Haiyun Jiang. 2019. “Deep Short Text Classification with Knowledge Powered Attention.” The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).

Ge Song, Yunming Ye, Xiaolin Du, Xiaohui Huang, and Shifu Bie. May 2014. “Short Text Classification: A Survey.” Journal of Multimedia, vol. 9, no. 5, 635–636.

Lee, Ji Young, and Franck Dernoncourt. 2016. “Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks.” NAACL. arXiv.

Linmei, Hu, Tianchi Yang, Chuan Shi, Houye Ji, and Xiaoli Li. 2019. “Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification.” Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics. 4821–4830.

Wang, Jin, Zhongyuan Wang, Dawei Zhang, and Jun Yan. 2017. “Combining Knowledge with Deep Convolutional Neural Networks for Short Text Classification.” Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17).

Xu, Jingyun, and Yi Cai. 2019. “Incorporating Context-Relevant Knowledge into Convolutional Neural Networks for Short Text Classification.” The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).

Zeng, Jichuan, Jing Li, Yan Song, Cuiyun Gao, Michael R. Lyu, and Irwin King. 2018. “Topic Memory Networks for Short Text Classification.” Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics. 3120–3131.


Published via Towards AI
