Can Traditional LSTMs Trained From Scratch Compete With Fine-Tuned BERT Models?
Last Updated on April 23, 2025 by Editorial Team
Author(s): S Aishwarya
Originally published on Towards AI.
In today’s digital era, fake news spreads faster than the truth, and the consequences can be serious. From influencing elections to spreading health misinformation, tackling fake news is more important than ever.
Fake news detection might seem like a job best suited for cutting-edge transformer models like BERT, but can traditional models like LSTM still hold their ground?
So the real question is:
Which model is better at detecting fake news: a classic LSTM trained from scratch, or a modern transformer like BERT fine-tuned on the task?
In this guide, we’ll compare two approaches for detecting fake news using deep learning:
1. LSTM trained from scratch
2. BERT fine-tuned using HuggingFace
📦 Dataset: Fake and Real News Dataset
We’ll be using the popular Fake and Real News dataset from Kaggle (dataset link: fake-and-real-news-dataset), which contains over 44,000 news articles split into two classes:
- REAL: legitimate news from verified sources
- FAKE: fabricated news with misleading content
Each entry includes:
- Title: the headline of the article
- Text: the body of the article
- Subject: the subject/category of the article
- Date: the publication date
🔧 Approach 1: LSTM Trained from Scratch
📥 Step 1: Import Libraries
Essential packages for data handling, preprocessing, and modeling.
import re
import nltk
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from nltk.corpus import stopwords
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Bidirectional, Dropout
from tensorflow.keras.callbacks import EarlyStopping
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
📂 Step 2: Load Dataset
Read the fake and real news datasets.
df_fake = pd.read_csv("Fake.csv")
df_real = pd.read_csv("True.csv")
🏷️ Step 3: Add Labels
Label fake as 0 and real as 1.
df_fake['label'] = 0
df_real['label'] = 1
🔗 Step 4: Combine & Shuffle
Merge both datasets and shuffle.
df = pd.concat([df_fake, df_real], axis=0).sample(frac=1).reset_index(drop=True)
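A quick sanity check (optional, and not part of the original pipeline) confirms the combined DataFrame looks right and that the two classes are roughly balanced:
print(df.shape)
print(df['label'].value_counts())  # the Kaggle dataset is roughly balanced between fake (0) and real (1)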
🧹 Step 5: Clean the Text
Remove HTML, punctuation, numbers, and stopwords.
def clean_text(text):
    text = str(text).lower()
    text = re.sub(r'<.*?>', '', text)
    text = re.sub(r'[^\w\s]', '', text)
    text = re.sub(r'\d+', '', text)
    text = " ".join([word for word in text.split() if word not in stop_words])
    return text
df['text'] = df['title'] + " " + df['text']
df['text'] = df['text'].apply(clean_text)
🔠 Step 6: Tokenize & Pad
Convert text to sequences and pad them.
tokenizer = Tokenizer(num_words=50000, oov_token="<OOV>")
tokenizer.fit_on_texts(df['text'])
sequences = tokenizer.texts_to_sequences(df['text'])
word_index = tokenizer.word_index
# cap the embedding vocabulary at the tokenizer's num_words limit (rarer words are mapped to <OOV>)
vocab_size = min(len(word_index) + 1, 50000)
max_length = 500
padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post')
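A quick shape check (optional) confirms every article is now a fixed-length sequence of 500 token ids:
print(padded_sequences.shape)  # (number of articles, 500)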
🧪 Step 7: Train-Test Split
Split into training and validation datasets.
X = padded_sequences
y = df['label'].values
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
🧠 Step 8: Build the LSTM Model
Create a stacked Bidirectional LSTM model.
model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=128, input_length=max_length),
    Bidirectional(LSTM(128, return_sequences=True)),
    Bidirectional(LSTM(64)),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])
⚙️ Step 9: Compile and Train
Compile the model and train it with early stopping.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
early_stop = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)
history = model.fit(X_train, y_train, epochs=5, batch_size=64, validation_data=(X_val, y_val), callbacks=[early_stop])
📈 Step 10: Visualize Accuracy
Plot training and validation accuracy.
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Val Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.title("Training vs Validation Accuracy")
plt.show()
✅ Step 11: Evaluate Model
Final validation accuracy after training.
loss, acc = model.evaluate(X_val, y_val)
print(f"\n✅ Final Validation Accuracy: {acc:.4f}")
Output:
✅ Final Validation Accuracy: 0.9125
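To see the trained LSTM in action on unseen text, here’s a minimal inference sketch; the headline below is invented purely for illustration:
sample = clean_text("Breaking: scientists discover chocolate cures all diseases")
seq = pad_sequences(tokenizer.texts_to_sequences([sample]), maxlen=max_length, padding='post')
prob = model.predict(seq)[0][0]  # sigmoid output: probability that the article is real
print("REAL" if prob >= 0.5 else "FAKE", f"({prob:.2f})")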
🤖 Approach 2: BERT Fine-Tuning with HuggingFace
Why build from scratch when you can fine-tune a powerful pre-trained model?
In this approach, we’ll use bert-base-uncased, a general-purpose language model from HuggingFace, and fine-tune it on the Fake and Real News dataset. BERT understands syntax and context, making it ideal for classification tasks like fake news detection.
What Makes BERT Powerful?
- Pre-trained on a huge corpus (Wikipedia + BookCorpus)
- Captures contextual relationships between words
- Works great with minimal preprocessing
🛠️ Step-by-Step Implementation
📦 1. Install Required Libraries
pip install transformers datasets tensorflow
📥 2. Load and Prepare the Dataset
import pandas as pd
from sklearn.model_selection import train_test_split
df_fake = pd.read_csv("Fake.csv")
df_real = pd.read_csv("True.csv")
df_fake['label'] = 0
df_real['label'] = 1
df = pd.concat([df_fake, df_real], axis=0).sample(frac=1).reset_index(drop=True)
df['text'] = df['title'] + " " + df['text']
train_texts, val_texts, train_labels, val_labels = train_test_split(
    df['text'].tolist(), df['label'].tolist(), test_size=0.2, random_state=42)
✂️ 3. Tokenization with BERT Tokenizer
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=512)
val_encodings = tokenizer(val_texts, truncation=True, padding=True, max_length=512)
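If you’re curious what the tokenizer produces, each encoding is a dict-like object holding input_ids, token_type_ids, and attention_mask; a quick peek (assuming the variables above):
print(train_encodings.keys())                 # input_ids, token_type_ids, attention_mask
print(train_encodings['input_ids'][0][:10])   # first 10 token ids of the first article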
🧾 4. Prepare TensorFlow Datasets
import tensorflow as tf
train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_labels
)).shuffle(1000).batch(16)
val_dataset = tf.data.Dataset.from_tensor_slices((
    dict(val_encodings),
    val_labels
)).batch(16)
🧠 5. Load and Fine-Tune BERT
from transformers import TFBertForSequenceClassification

model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# The model outputs raw logits, so pair a standard Keras optimizer with a loss built with from_logits=True
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)
model.fit(train_dataset, validation_data=val_dataset, epochs=3)
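After training, it’s usually worth persisting the fine-tuned weights; HuggingFace models and tokenizers can be saved and reloaded with save_pretrained / from_pretrained (the directory name here is just an example):
model.save_pretrained("bert-fakenews")
tokenizer.save_pretrained("bert-fakenews")
# later: model = TFBertForSequenceClassification.from_pretrained("bert-fakenews")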
✅ 6. Evaluate Model Performance
loss, accuracy = model.evaluate(val_dataset)
print(f"\n✅ Final Validation Accuracy: {accuracy:.4f}")
Output:
✅ Final Validation Accuracy: 0.9314
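To classify a new piece of text with the fine-tuned model, a minimal inference sketch looks like this; again, the headline is invented for illustration:
inputs = tokenizer("Government confirms new tax relief for small businesses",
                   truncation=True, padding=True, max_length=512, return_tensors="tf")
logits = model(inputs).logits
pred = int(tf.argmax(logits, axis=-1)[0])  # 0 = fake, 1 = real
print("REAL" if pred == 1 else "FAKE")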
📊 Comparative Analysis: LSTM vs BERT
Now that we’ve implemented and evaluated both models, let’s compare them side by side:
- Validation accuracy: LSTM 0.9125 vs. BERT 0.9314
- Training approach: LSTM trained entirely from scratch vs. BERT fine-tuned from the pre-trained bert-base-uncased checkpoint
- Preprocessing: LSTM requires manual cleaning, tokenization, and padding vs. BERT needs minimal preprocessing thanks to its own tokenizer
- Compute: LSTM is lighter to train vs. BERT demands considerably more compute and memory
Conclusion: BERT edges out the LSTM in accuracy thanks to its pre-training and deeper grasp of context. However, if you’re short on compute or want a simpler pipeline, the LSTM still delivers strong results.
⚖️ When to Use LSTM vs BERT
Choosing between LSTM and BERT depends on your goals and resources:
Use LSTM if:
- You’re constrained on computational resources
- You want to build models from scratch for educational purposes
- Dataset is relatively small and domain-specific
Use BERT if:
- You need state-of-the-art accuracy
- You’re working with large or noisy text data
- You want to leverage transfer learning for better generalization
💡 Final Thoughts
Both LSTM and BERT are powerful in their own right. While BERT dominates with context-awareness and pre-training, LSTMs remain relevant for faster deployment and simpler pipelines. In the battle against fake news, picking the right model is only one part of the solution; the key is using technology ethically and effectively to promote truth.
Published via Towards AI
Note: Content contains the views of the contributing authors and not Towards AI.