Spotting Controversy with NLP
Last Updated on July 20, 2023 by Editorial Team
Author(s): Jo Stichbury
Originally published on Towards AI.
Applying BERT to analyze ESG topics in financial services
In this article, I'll introduce you to a hot topic in financial services and describe how a leading data provider is using data science and NLP to streamline how it finds insights in unstructured data.
Environmental, social, and governance (ESG) metrics measure the sustainability and societal impact of an investment in a company or business. Before committing to a company, investors want to know if there are any potential controversies brewing, or if the company shows particular leadership in an area of ESG, such as diversity in the workforce.
Refinitiv is a global provider of financial market data and infrastructure, and this article describes how its Labs team is exploring the use of NLP to give its clients a competitive edge in global financial markets. Currently, Refinitiv analysts search for news stories about a specific company using a set of ESG-related keywords, and if there's a positive match, the story is subject to further scrutiny. For example, a keyword scan would flag a potential governance controversy in the following snippet, prompting an analyst to read the story and determine whether it indicates an ESG controversy:
CHICAGO (Reuters) - The agricultural unit of German chemicals company Bayer AG will halt future U.S. sales of an insecticide that can be used on more than 200 crops after losing a fight with the U.S. Environment Protection Agency, the company said on Friday.
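To make the keyword-scan step concrete, here is a minimal Python sketch of how such a filter might work. The keyword list, the function names `keyword_hits` and `needs_analyst_review`, and the whole-word matching rule are all hypothetical; the article does not describe Refinitiv's actual keyword set or matching logic.

```python
import re

# Hypothetical keyword list for illustration only; the article does not
# publish Refinitiv's real ESG keyword set.
ESG_KEYWORDS = ["controversy", "lawsuit", "fine", "halt", "protection agency", "recall"]


def keyword_hits(text: str, keywords: list[str]) -> list[str]:
    """Return the keywords found in `text` as whole words or phrases, ignoring case."""
    hits = []
    for kw in keywords:
        # \b anchors stop "fine" from matching inside words such as "refined".
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(kw)
    return hits


def needs_analyst_review(text: str, keywords: list[str] = ESG_KEYWORDS) -> bool:
    """Flag a story for an analyst to read if any keyword matches."""
    return bool(keyword_hits(text, keywords))


snippet = (
    "The agricultural unit of German chemicals company Bayer AG will halt future "
    "U.S. sales of an insecticide that can be used on more than 200 crops after "
    "losing a fight with the U.S. Environment Protection Agency, the company said on Friday."
)
print(keyword_hits(snippet, ESG_KEYWORDS))  # ['halt', 'protection agency']
print(needs_analyst_review("Shares rose after quarterly earnings beat forecasts."))  # False
```

A scan like this is cheap but blunt: a word such as "halt" matches many stories that are not controversies, which is why every hit still needs an analyst to read the story.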
This can take an analyst considerable time. As Tim Nugent, Senior Research Scientist at Refinitiv Labs, explains: "the problem we need to solve is that it's time-consuming to search and read news articles". One option to increase throughput and coverage is to hire more analysts to cover more stories, but why not optimize the process with AI to build a more efficient workflow?
Introducing BERT
Using machine learning and natural language processing (NLP), Tim Nugent's team has trained a model to review a news stream and triage news stories for potential ESG controversies.
When Refinitiv analysts review an article manually, they look for controversies in 20 ESG topics defined in-house, many of which align with the UN Sustainable Development Goals. For a specific company, the analysts examine each ESG topic in turn and decide whether the article suggests a controversy for that topic. In essence, they perform document classification, which can be re-framed as a supervised machine learning task. An algorithm can be trained to make the same decision and output a probability score for each of the ESG controversy topics. Where the probability exceeds a confidence threshold, the article proceeds directly through the ESG pipeline, while low-confidence predictions are sent to human analysts for further review.
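The triage step can be sketched as follows, assuming the classifier emits one logit per topic that is turned into an independent probability with a sigmoid (a common multi-label setup; the article does not say how Refinitiv computes its scores). The topic names, the 0.9 threshold, and the rule that a prediction counts as confident when the model is sure either way, i.e. max(p, 1 - p) meets the threshold, are illustrative assumptions.

```python
import math

# Hypothetical topic labels: the article says Refinitiv defines 20 ESG topics
# in-house but does not list them.
TOPICS = [f"topic_{i:02d}" for i in range(20)]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow in exp(-x) for very negative x
    return z / (1.0 + z)


def topic_probabilities(logits: list[float]) -> dict[str, float]:
    """Map one classifier logit per topic to an independent probability (multi-label)."""
    if len(logits) != len(TOPICS):
        raise ValueError(f"expected {len(TOPICS)} logits, got {len(logits)}")
    return {topic: sigmoid(z) for topic, z in zip(TOPICS, logits)}


def route(probs: dict[str, float], threshold: float = 0.9) -> dict[str, str]:
    """Per topic, accept a confident prediction automatically or send it to an analyst.

    Assumption: a prediction is confident when max(p, 1 - p) >= threshold,
    i.e. the model is sure either that there is or that there is not a controversy.
    """
    decisions = {}
    for topic, p in probs.items():
        if max(p, 1.0 - p) >= threshold:
            decisions[topic] = "controversy" if p >= 0.5 else "no_controversy"
        else:
            decisions[topic] = "analyst_review"
    return decisions


# Illustrative logits: confident on topic_00, unsure on topic_01, confident "no" elsewhere.
logits = [4.0, 0.2] + [-5.0] * 18
decisions = route(topic_probabilities(logits))
print(decisions["topic_00"], decisions["topic_01"], decisions["topic_02"])
# controversy analyst_review no_controversy
```

Raising the threshold sends more stories to analysts but lets fewer errors through automatically; where to set it is a trade-off between throughput and risk.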
The Refinitiv Labs team uses Google's open-source NLP model, BERT, which has demonstrated state-of-the-art performance in a range of classification tasks. BERT is pre-trained on 3.3 billion words of general-domain text, namely English Wikipedia and the BookCorpus dataset, so it has a good general understanding of the English language. The team further trained BERT on a business- and finance-specific corpus: the Reuters News Archive, a further 715 million words from about 2 million articles. This extra training gives the model a better understanding of the domain-specific terminology of business and financial news and improves its prediction confidence downstream. Once this step was complete, they "fine-tuned" the domain-specific model on the ESG controversy classification task.
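If the further training on Reuters News reused BERT's original masked-language-modelling objective (the article does not specify the procedure), each training sequence would be corrupted roughly as sketched below: about 15% of the non-special tokens are chosen as prediction targets, and of those, 80% are replaced with [MASK], 10% with a random token and 10% are left unchanged. The token IDs assume the public bert-base-uncased vocabulary, and `mask_tokens` is a hypothetical helper name.

```python
import random

# Special-token IDs and vocabulary size of the public bert-base-uncased vocabulary.
CLS_ID, SEP_ID, MASK_ID = 101, 102, 103
VOCAB_SIZE = 30522
# Label for positions that do not contribute to the loss
# (matches the default ignore_index of PyTorch's CrossEntropyLoss).
IGNORE = -100


def mask_tokens(token_ids, special_ids, rng, mask_prob=0.15):
    """BERT-style masking for masked-language-model training.

    Each non-special token is chosen as a prediction target with probability
    `mask_prob`; a chosen token is replaced by [MASK] 80% of the time, by a random
    vocabulary token 10% of the time, and left unchanged 10% of the time.
    Returns (corrupted input IDs, labels).
    """
    inputs = list(token_ids)
    labels = [IGNORE] * len(inputs)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model is trained to recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)
        # otherwise keep the original token in place
    return inputs, labels


# Toy sequence: [CLS] + 200 arbitrary word-piece IDs + [SEP].
rng = random.Random(0)
ids = [CLS_ID] + list(range(2000, 2200)) + [SEP_ID]
inputs, labels = mask_tokens(ids, {CLS_ID, SEP_ID}, rng)
n_targets = sum(label != IGNORE for label in labels)
print(f"{n_targets} of {len(ids) - 2} tokens selected for prediction")
```

This version samples each token independently, a common simplification in open-source implementations; the original BERT pre-processing script instead picks a fixed number of positions per sequence.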
"The field is highly adversarial and giving customers an edge can be profoundly impactful," says Tim Nugent. BERT is a state-of-the-art model for language processing, but further pre-training it on data from Reuters News has made it smarter still. BERT-RNA, as Nugent calls the adapted model, shows higher confidence than generic BERT (82% vs 78%) because it has been adapted to the nuances of financially focused language. While a gain of four percentage points may not appear significant on the surface, it has the potential to translate into a huge competitive advantage.
High-quality data is crucial for supervised machine learning tasks. The ESG controversy model was trained on approximately 30,000 "positive" articles that Refinitiv analysts had already annotated, alongside a corresponding set of negative examples, and this existing body of annotations was crucial. Further work will focus on training the model with additional sources of ESG data that are typically less structured than traditional market and index data, such as a company's self-reported data.
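One way to turn the analysts' annotations into training data is sketched below, assuming each positive article comes with the set of topics an analyst flagged and that "a corresponding set of negative examples" means an equal number of articles with no controversy. The `Example` class and the `multi_hot` and `build_dataset` functions are hypothetical names for illustration.

```python
import random
from dataclasses import dataclass

NUM_TOPICS = 20  # number of in-house ESG topics mentioned in the article


@dataclass
class Example:
    text: str
    labels: list[int]  # multi-hot: 1 where the article is controversial for that topic


def multi_hot(topic_indices: set[int], num_topics: int = NUM_TOPICS) -> list[int]:
    """Encode a set of topic indices as a 0/1 vector of length num_topics."""
    vec = [0] * num_topics
    for idx in topic_indices:
        if not 0 <= idx < num_topics:
            raise ValueError(f"topic index out of range: {idx}")
        vec[idx] = 1
    return vec


def build_dataset(positives, negative_pool, seed=0):
    """Combine analyst-annotated positives with an equal number of sampled negatives.

    positives: list of (text, set of controversial topic indices)
    negative_pool: list of texts judged to contain no ESG controversy
    """
    rng = random.Random(seed)
    n_neg = min(len(positives), len(negative_pool))  # assumed 1:1 ratio
    negatives = rng.sample(negative_pool, n_neg)
    data = [Example(text, multi_hot(topics)) for text, topics in positives]
    data += [Example(text, multi_hot(set())) for text in negatives]
    rng.shuffle(data)  # avoid all positives preceding all negatives
    return data


# Hypothetical toy data; real inputs would be full news articles.
positives = [("story a", {0, 3}), ("story b", {7})]
negative_pool = ["neg 1", "neg 2", "neg 3", "neg 4"]
dataset = build_dataset(positives, negative_pool)
print(len(dataset), sum(1 for ex in dataset if not any(ex.labels)))  # 4 2
```

Multi-hot labels let one article be controversial for several topics at once, which matches the per-topic decisions the analysts make.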
In conclusion
The Refinitiv Labs team has used machine learning and NLP to positive effect, allowing the company's ESG analysts to be more productive and efficient. The BERT-RNA model lets the domain-specific knowledge captured by the model work alongside human expertise. Analysts can now focus on what they know best: offering the company's client base insightful information about ESG controversies surrounding the companies that interest them.
A version of this article appeared on Refinitiv Perspectives in March 2020.