NLP-Aided Systematic Literature Review: Why It’s Needed and How It Works

Last Updated on March 1, 2021 by Editorial Team

Answering a specific, clearly defined research question from the massive amount of healthcare-related scholarly and clinical literature in existence can be extremely challenging. Yet that’s precisely the goal of systematic literature reviews (SLRs) in health care, which use a systematic approach to critically appraise and evaluate vast amounts of quantitative and qualitative data about a specific health-related issue.

SLRs provide an exhaustive summary — especially compared to other review types, such as rapid reviews — of all the evidence available on a specific research question, to make this evidence more readily available to key decision-makers. To have the most value SLRs must be performed extremely rigorously. The U.S. Institute of Medicine (IOM) has devised 21 standards meant to guide the development of high-quality SLRs in health care.

Because of the considerable rigor involved, SLRs are considered the highest level of evidence possible and play a vital role in health-care decision-making. They’re also a key component of the practice of evidence-based medicine (EBM), an interdisciplinary process linking research evidence with clinical expertise and patient values. The practice of EBM also includes the use of risk-benefit analysis, meta-analysis, and randomized controlled trials (RCT). One essential task when doing SLRs is the mining of large research databases to identify RCTs from the millions of other documents on hand (just 1.6 percent of 26.6M articles in PubMed in 2016, for example, were RCTs).

But the amount of time and work required to produce high-quality SLRs can be daunting. And that’s where automation using AI techniques like natural language processing (NLP) can make a huge difference.

The need for SLR automation using NLP

SLRs traditionally take a very long time to develop and require several specialized team members to devote a significant number of hours: According to this 2018 study by Bullers et. al., a mean total of 1,139 hours per project. Even the research question’s development can be time-consuming: Many experts recommend using the PICO (problem, intervention, comparison, outcome) tool to improve this process.

The University of Toronto says SLR teams should include the following capabilities and specialized skills:

Subject matter experts with clinical/methodological expertise
Two independent reviewers
An information specialist/medical librarian trained in SLR methods
A statistician (if including meta-analysis)
A tie-breaker for resolving contentious decisions

Despite the large number of team members needed, SLRs still take considerable time. Canada’s Western University estimates the required time needed to complete an SLR at six months to 1.5 years; U of T says teams of multiple subject-matter experts should plan on “at least” nine to 12 months depending on the topic. For primary study publications aspiring for inclusion in an SLR, the process is even more glacial: Most don’t get included in an SLR for an average of 2.5 to 6.5 years.

This sluggish process of SLR development has significant implications for accuracy and relevance: Twenty-three percent of all SLRs are considered out of date within two years of publication due to new evidence or findings.

But there are good reasons why they take so long. SLRs involve several different time-consuming tasks, including search strategy development, search strategy translation, documentation, and search methodology writing. The main steps of a rigorous SLR in health care include:

Formulating a specific health-care research question
Developing a protocol
Conducting a search
Selecting and assessing research studies
Extracting the relevant data and then analyzing, summarizing, and synthesizing that data (often the most time-consuming step)
Interpreting the results

Because of the immense amount of manual effort involved, SLRs are notoriously difficult to scale, even when using systematic review software to help manage the process along with teams of experts.

How does NLP solve these issues?

NLP (including text mining) is a type of AI that uses computers to understand unstructured data such as written language. NLP can read and understand this text, extracting targeted information used to automate SLR tasks — helping speed up several elements of the process, including information extraction, exponentially. One study from 2016 using a support vector machine classifier realized high accuracy and reviewers only had to read 3.7 sentences (on average) per document instead of the entire document.

Because NLP algorithms are a field of machine learning they learn as they process more and more relevant data, becoming more and more adept at their tasks as additional corpora and training data are processed.

Information extraction using NLP includes concept extraction (aka named entity recognition) and relation extraction (also known as association extraction). Jonnalagadda et. al. say these techniques “have been used to automate the extraction of genomic and clinical information from biomedical literature.” The researchers add that automating data extraction in SLRs can “substantially decrease the time taken to complete systematic reviews and thus decrease the time lag for research evidence to be translated into clinical practice.”

Critical NLP tasks in health-care SLR development

Two NLP capabilities are especially suited to the SLR process: Data extraction, which we already mentioned, and text classification.

Automated text classification is useful because it can read the content of documents and classify them based on specific predefined parameters — determining whether a particular document is an RCT, for example, saving hours of manual work. Text classification primarily involves two main tasks: a) Identifying key sentences and ignoring irrelevant passages, b) Classifying these sentences or paragraphs and tagging them based on predetermined categories or criteria
Data extraction, meanwhile, identifies pieces of text or numbers (such as the findings of a particular report, or number of subjects of a clinical trial) based on variables of interest and extracts the information from the source file.

Marshall et. al. point out that the most prominent type of text classification employed in the review process is that of abstract screening, which determines whether articles meet the review’s inclusion criteria. Machine learning algorithms can also be trained to use abstract screening to rank documents based on relevance — potentially saving reviewers dozens of hours.

NLP models used in health-care SLRs

A handful of pre-trained NLP models are especially well-suited for scientific text and use in the development of health-care SLRs:

SciBERT is a pre-trained language model based on Bidirectional Encoder Representations from Transformers (BERT), fine-tuned for medical applications with 1.14M randomly-selected Semantic Scholar papers.
BioBERT performs biomedical text mining based on a pre-trained biomedical language representation model. It’s trained and fine-tuned with many sources including English Wikipedia, BooksCorpus, PubMed Abstracts, and PMC full-text articles. Further fine-tuning of BioBERT uses biomedical named entity recognition datasets such as NCBI Disease (2014) and BC4CHEMD (2015).
ClinicalBERT is another language model based on BERT and focused on health care. It evaluates representations of clinical notes but is mostly used in the clinical domain.

Using NLP for health-care SLRs isn’t without its challenges, of course, not least being the complexity of the English (or any other) language. Some words and statements can be incredibly nuanced, while others can have multiple meanings depending on context. Some colloquial expressions have meanings utterly different from their literal equivalent. Even grammar can be wildly inconsistent to evaluate, depending on the writer and their level of familiarity with the language.

It all adds up to a dizzying amount of possible phrases, words, and combinations that any NLP algorithm must evaluate at breakneck speeds. But, CapeStart’s expert machine learning engineers, subject matter experts, and data scientists can help through a combination of data annotation, custom machine learning model development, and software development. Our healthcare-focused NLP and data annotation solutions are used by some of the world’s most innovative medical companies in a range of applications, including medical text classification, named entity recognition, text analysis, and topic modeling. CapeStart also offers pre-built models suitable for complex SLR problems.

NLP-Aided Systematic Literature Review: Why It’s Needed and How It Works was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI

Frequently Used, Contextual References

Resources

Publication