Spotting Controversy with NLP

Last Updated on July 20, 2023 by Editorial Team

Author(s): Jo Stichbury

Originally published on Towards AI.

Applying BERT to analyze ESG topics in financial services

In this article, I’ll introduce you to a hot topic in financial services and describe how a leading data provider is using data science and NLP to streamline how they find insights in unstructured data.

Spotting Controversy with NLP — Tango with Pollock by Pitel on DeviantArt (CC BY-NC-SA 3.0)

Environmental, social, and governance (ESG) metrics measure the sustainability and societal impact of an investment in a company or business. Before committing to a company, investors want to know if there are any potential controversies brewing, or if the company shows particular leadership in an area of ESG, such as diversity in the workforce.

Refinitiv is a global provider of financial market data and infrastructure, and this article describes how their Labs team is exploring the use of NLP to give their clients a competitive edge in global financial markets. Currently, Refinitiv analysts search for news stories about a specific company using a set of ESG-related keywords, and if there’s a positive match, the story is subject to further scrutiny. For example, a keyword scan would identify a potential governance controversy in the following snippet, prompting an analyst to read the story and determine whether it indicates an ESG controversy:

CHICAGO (Reuters) — The agricultural unit of German chemicals company Bayer AG will halt future U.S. sales of an insecticide that can be used on more than 200 crops after losing a fight with the U.S. Environment Protection Agency, the company said on Friday.

This can take the analyst considerable time. As Tim Nugent, Senior Research Scientist at Refinitiv Labs, explains, “the problem we need to solve is that it’s time-consuming to search and read news articles”. One option to increase throughput and coverage is to hire more analysts, but why not optimize the process with AI to build a more efficient workflow?
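The keyword scan described above can be sketched in a few lines of Python. This is only an illustration: the keyword list and the function name are hypothetical, since the article does not give Refinitiv's in-house keywords or tooling.

```python
import re

# Hypothetical keywords for illustration; the real in-house list is not public.
ESG_KEYWORDS = ["insecticide", "oil spill", "bribery", "lawsuit"]

def keyword_scan(story, keywords=ESG_KEYWORDS):
    """Return the keywords that appear as whole words or phrases in the story."""
    lowered = story.lower()
    return [
        kw for kw in keywords
        if re.search(rf"\b{re.escape(kw)}\b", lowered)
    ]

snippet = (
    "The agricultural unit of German chemicals company Bayer AG will halt "
    "future U.S. sales of an insecticide that can be used on more than 200 crops"
)

matches = keyword_scan(snippet)
if matches:
    # A positive match only flags the story; an analyst still has to read it.
    print("Flag for analyst review:", matches)
```

A scan like this is cheap but blunt: it cannot tell a genuine controversy from an incidental mention, which is why every match still needs a human reader.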

Alphabet Letters by Sarah on Flickr (CC BY-NC 2.0)

Introducing BERT

Using machine learning and natural language processing (NLP), Tim Nugent’s team has trained a model to review a news stream and triage news stories for potential ESG controversies.

When the Refinitiv analysts review an article manually, they look for controversies in 20 ESG topics defined in-house, many of which align with the UN Sustainable Development Goals. For a specific company, the analysts examine each of the ESG topics and decide whether the article suggests controversy for that topic. In essence, they perform document classification, which can be re-framed as a supervised machine learning task. An algorithm can be trained to make the same decision and output a probability score for each of the ESG controversy topics. Where the probability sits above a confidence threshold, the article proceeds directly through the ESG pipeline, while low-confidence predictions are sent to human analysts for further review.
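The threshold-based routing can be sketched as follows. The topic names, the scores, and the threshold of 0.9 are assumptions made for illustration; the article does not state the value Refinitiv uses or how the 20 topics are named.

```python
def triage(topic_scores, threshold=0.9):
    """Split per-topic probabilities into automatic and human-review groups.

    topic_scores maps a topic name to the model's probability score.
    The threshold value is an assumed example, not a figure from the article.
    """
    auto_pipeline = {}
    analyst_review = {}
    for topic, probability in topic_scores.items():
        if probability >= threshold:
            auto_pipeline[topic] = probability
        else:
            analyst_review[topic] = probability
    return auto_pipeline, analyst_review

# Hypothetical model output for one news story.
scores = {"environmental_impact": 0.97, "workforce_diversity": 0.42}

auto, review = triage(scores)
print("Straight to ESG pipeline:", auto)
print("Sent to analysts:", review)
```

Raising the threshold sends more stories to analysts and fewer straight through, so the value is a trade-off between analyst workload and the risk of acting on a wrong prediction.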

The Refinitiv Labs team uses Google’s open-source NLP model, BERT, which has demonstrated state-of-the-art performance in a range of classification tasks. BERT is pre-trained on 3.3 billion words of general-domain text from English Wikipedia and the BookCorpus dataset, so it has a good general understanding of the English language. The team further trained BERT on a business- and finance-specific corpus: the Reuters News Archive, a further 715 million words from about 2 million articles. The extra training gives the model a better understanding of the domain-specific terminology of business and financial news and improves its prediction confidence downstream. Once this step was complete, they “fine-tuned” the domain-specific model for the ESG controversy classification task.
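To give a feel for what "further training" on a domain corpus involves, here is a simplified sketch of how masked-language-modelling examples, the main objective BERT is pre-trained with, can be built from a sentence. It is not the team's pipeline: the whitespace tokenization, the example sentence, and the function name are assumptions, and real BERT pre-training uses WordPiece tokens and a more elaborate replacement rule than always substituting `[MASK]`.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide roughly mask_prob of the tokens and record the originals.

    Returns the masked token list and a dict of position -> original token,
    which is what the model is trained to predict.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_mask = max(1, round(len(tokens) * mask_prob))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    labels = {}
    for i in positions:
        labels[i] = tokens[i]
        masked[i] = MASK
    return masked, labels

# Hypothetical sentence standing in for a line of financial news.
sentence = "the central bank raised interest rates to curb rising inflation"
masked, labels = mask_tokens(sentence.split())
print(masked)
print(labels)
```

Repeating this over millions of finance articles exposes the model to domain vocabulary in context before it ever sees a labelled ESG example.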

“The field is highly adversarial and giving customers an edge can be profoundly impactful,” says Tim Nugent. BERT is a state-of-the-art model for language processing, but further pre-training on Reuters News data has made it smarter still. BERT-RNA, as Nugent styles the adapted model, shows an improvement in confidence over generic BERT (82% vs 78%) because of its adaptation to the nuances of financially focussed language. While 4 percentage points may not appear significant on the surface, the gain has the potential to translate into a huge competitive advantage.
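As a quick check on the size of that gain, the two reported scores can be compared in absolute and relative terms. The variable names are illustrative; the two figures are the ones quoted above.

```python
# Scores reported in the article, in percent.
bert_rna = 82
generic_bert = 78

absolute_gain = bert_rna - generic_bert              # in percentage points
relative_gain = absolute_gain / generic_bert * 100   # as a percent of the baseline

print(f"Absolute gain: {absolute_gain} percentage points")
print(f"Relative gain: {relative_gain:.1f}%")
```

The difference is 4 percentage points, which works out to a relative improvement of roughly 5% over the generic model.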

High-quality data is crucial for supervised machine learning tasks. The ESG controversy model was trained using approximately 30,000 “positive” articles that Refinitiv analysts had already annotated, alongside a corresponding set of negative examples. Further work will focus on training the model with additional sources of ESG data, such as a company’s self-reported data, that are typically less structured than traditional market and index data.

In conclusion

The Refinitiv Labs team has used machine learning and NLP to positive effect, allowing the company’s ESG analysts to be more productive and efficient. The BERT-RNA model allows human expertise and domain-specific knowledge to work alongside each other. Analysts can now focus on what they know best: offering their company’s client base insightful information about ESG controversies surrounding the companies those clients are interested in.

A version of this article appeared on Refinitiv Perspectives in March 2020.


Published via Towards AI

Note: Article content contains the views of the contributing authors and not Towards AI.
