Summarizing News by Abstractive Approach
Author(s): Edward Ma
In NLP, there are two approaches to text summarization. The first, the extractive approach, is the simpler one: it extracts keywords or sentences from an article. It has known limitations, and its performance has proved to be not very good, because the output suffers from irrelevance and redundancy. The second, the abstractive approach, generates new sentences based on a given article. It needs more advanced techniques but achieves better results.
Abstractive summarization has been applied mainly to text. Abstractive methods build an internal semantic representation of the original content, and then use this representation to create a summary that is closer to what a human might express. Abstraction may transform the extracted content by paraphrasing sections of the source document, to condense a text more strongly than extraction. Such transformation, however, is computationally much more challenging than extraction, involving both natural language processing and often a deep understanding of the domain of the original text in cases where the original document relates to a special field of knowledge.
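To make the extractive idea concrete, here is a minimal sketch of a frequency-based sentence extractor. It is only an illustration of the general technique, not the method used later in this article: the stop-word list, the scoring rule (average word frequency per sentence), and the function names are all assumptions chosen for brevity.

```python
import re
from collections import Counter

# A tiny hand-picked stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
              "it", "that", "for", "on", "with", "as", "by", "this"}


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        # Average document-level frequency of the words in this sentence.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    article = ("Cats sleep a lot. Cats and dogs are popular pets. "
               "The weather was cold yesterday.")
    print(extractive_summary(article, num_sentences=1))
```

Because the output can only reuse sentences that already exist in the article, a sketch like this shows where the redundancy and irrelevance problems come from: it cannot merge or rephrase anything.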
We have lots of use cases for text summarization in our daily lives. One valid use is news summarization. A detailed news article may run to several paragraphs and over 1000 words, and it takes several minutes to read in full. It is hard for people to digest a huge amount of local and international news covering many topics such as finance, sports, etc. News summarization therefore helps people get a high-level understanding quickly. Instead of spending 5 minutes reading news that may not be relevant to us, we may spend just 30 seconds getting the rough idea from the news summary.
Another use is finding relevant research papers. The abstract section helps us get a rough idea of what problem the practitioners want to solve and what the solution is. Otherwise, we may need to read through 10 pages to decide whether the paper is relevant.
Can we summarize news through a machine learning model?
Thanks to new technology in NLP, summarizing news is definitely possible. We can leverage state-of-the-art NLP architectures such as sequence-to-sequence models and transformers. You also need a dataset that includes both abstracts and detailed news articles. Finally, you need a powerful machine to train a news summarization model.
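As a rough illustration of what such a dataset looks like before training, the sketch below turns raw records into (article, summary) pairs. The field names `article` and `summary`, the `NewsExample` class, and the 400-word truncation limit are hypothetical choices for this example, not details of the dataset actually used for the model described in this article.

```python
from dataclasses import dataclass


@dataclass
class NewsExample:
    article: str  # model input: the detailed news text
    summary: str  # training target: the human-written abstract


def build_examples(records, max_article_words=400):
    """Turn raw dict records into training pairs.

    Records missing either field are skipped, and long articles are
    truncated to max_article_words (an assumed limit for this sketch).
    """
    examples = []
    for record in records:
        article = record.get("article", "").strip()
        summary = record.get("summary", "").strip()
        if not article or not summary:
            continue  # a pair is useless without both sides
        words = article.split()
        examples.append(NewsExample(" ".join(words[:max_article_words]), summary))
    return examples


if __name__ == "__main__":
    raw = [
        {"article": "A long report about working hours ...", "summary": "Hours rose."},
        {"article": "An article without an abstract."},
    ]
    for example in build_examples(raw):
        print(example)
```

A sequence-to-sequence model would then be trained to map each `article` to its `summary`; the tokenization and training loop are left out here because they depend on the framework you pick.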
Another way is to use an API to get the summarized news. You only need to provide the content of the news article, and you get the summary back without writing any machine learning code.
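Calling such an API usually amounts to one HTTP request. The sketch below shows the general shape using only the Python standard library. Everything specific in it is a placeholder: the URL `https://example.com/summarize`, the `text` request field, and the `summary` response field are assumptions for illustration, since the real endpoint and its schema are not published here.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the real URL of the service you use.
API_URL = "https://example.com/summarize"


def build_summary_request(news_text, url=API_URL):
    """Build a JSON POST request that carries the news content."""
    payload = json.dumps({"text": news_text}).encode("utf-8")  # assumed field name
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_via_api(news_text, url=API_URL, timeout=30):
    """Send the request and return the summary from the JSON response."""
    request = build_summary_request(news_text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["summary"]  # assumed response field name
```

With a real endpoint in place, `summarize_via_api(article_text)` would return the summary string; no model code runs on the caller's side.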
Working hours during the pandemic
Here is the summary my API generated for this news article:
in the UK, Austria, Canada and the United States has seen a rise in working hours since the pandemic hit Europe last week. Home-working employees are now more likely to put in more hours than before, according to new research.
Latino-owned businesses growth
Here is another summary my API generated for this news article:
the challenges facing Latinos to secure capital from national banks, according to a new study. Latino-owned businesses are growing faster than the national average across several industries, growing 34 percent over the last 10 years compared to just 1 percent for all other small businesses.
I trained a deep learning model for news summarization and set up a web server to provide this service. Drop me an email or message if you want to try this API service.
Like to learn?
- Summarize document by combining extractive and abstractive steps
- Explanation of extractive way of summarization
Published via Towards AI