Can Traditional LSTMs Trained From Scratch Compete With Fine-Tuned BERT Models?
Last Updated on April 23, 2025 by Editorial Team
Author(s): S Aishwarya
Originally published on Towards AI.
In today’s digital era, fake news spreads faster than the truth, and the consequences can be serious. From influencing elections to spreading health misinformation, tackling fake news is more important than ever.
Fake news detection might seem like a job best suited for cutting-edge transformer models like BERT, but can traditional models like LSTM still hold their ground?
So the real question is:
Which model is better at detecting fake news: a classic LSTM trained from scratch, or a modern transformer like BERT fine-tuned on the task?
In this guide, we’ll compare two approaches for detecting fake news using deep learning:
1. LSTM trained from scratch
2. BERT fine-tuned using HuggingFace
📦 Dataset: Fake and Real News Dataset
We’ll be using the popular Fake and Real News dataset from Kaggle (dataset link: fake-and-real-news-dataset), which contains over 44,000 news articles split into two classes:
- REAL: legitimate news from verified sources
- FAKE: fabricated news with misleading content
Each entry includes:
- Title: the headline of the article
- Text: the body of the article
- Subject: the subject/category of the article
- Date: the publication date
🔧 Approach 1: LSTM Trained from Scratch
📥 Step 1: Import Libraries
Essential packages for data handling, preprocessing, and modeling.
import re
import nltk
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from nltk.corpus import stopwords
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Bidirectional, Dropout
from tensorflow.keras.callbacks import EarlyStopping
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
📂 Step 2: Load Dataset
Read the fake and real news datasets.
df_fake = pd.read_csv("Fake.csv")
df_real = pd.read_csv("True.csv")
🏷️ Step 3: Add Labels
Label fake as 0 and real as 1.
df_fake['label'] = 0
df_real['label'] = 1
🔗 Step 4: Combine & Shuffle
Merge both datasets and shuffle.
df = pd.concat([df_fake, df_real], axis=0).sample(frac=1).reset_index(drop=True)
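A quick sanity check (optional, and not part of the original pipeline) confirms the combined DataFrame looks right and that the two classes are roughly balanced:
print(df.shape)
print(df['label'].value_counts())  # the Kaggle dataset is roughly balanced between fake (0) and real (1)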
🧹 Step 5: Clean the Text
Remove HTML, punctuation, numbers, and stopwords.
def clean_text(text):
    text = str(text).lower()
    text = re.sub(r'<.*?>', '', text)
    text = re.sub(r'[^\w\s]', '', text)
    text = re.sub(r'\d+', '', text)
    text = " ".join([word for word in text.split() if word not in stop_words])
    return text
df['text'] = df['title'] + " " + df['text']
df['text'] = df['text'].apply(clean_text)
🔠 Step 6: Tokenize & Pad
Convert text to sequences and pad them.
tokenizer = Tokenizer(num_words=50000, oov_token="<OOV>")
tokenizer.fit_on_texts(df['text'])
sequences = tokenizer.texts_to_sequences(df['text'])
word_index = tokenizer.word_index
# cap the embedding vocabulary at the tokenizer's num_words limit (rarer words are mapped to <OOV>)
vocab_size = min(len(word_index) + 1, 50000)
max_length = 500
padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post')
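A quick shape check (optional) confirms every article is now a fixed-length sequence of 500 token ids:
print(padded_sequences.shape)  # (number of articles, 500)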
🧪 Step 7: Train-Test Split
Split into training and validation datasets.
X = padded_sequences
y = df['label'].values
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
🧠 Step 8: Build the LSTM Model
Create a stacked Bidirectional LSTM model.
model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=128, input_length=max_length),
    Bidirectional(LSTM(128, return_sequences=True)),
    Bidirectional(LSTM(64)),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])
⚙️ Step 9: Compile and Train
Compile the model and train it with early stopping.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
early_stop = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)
history = model.fit(X_train, y_train, epochs=5, batch_size=64, validation_data=(X_val, y_val), callbacks=[early_stop])
📈 Step 10: Visualize Accuracy
Plot training and validation accuracy.
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Val Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.title("Training vs Validation Accuracy")
plt.show()
✅ Step 11: Evaluate Model
Final validation accuracy after training.
loss, acc = model.evaluate(X_val, y_val)
print(f"\n✅ Final Validation Accuracy: {acc:.4f}")
Output:
✅ Final Validation Accuracy: 0.9125
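To see the trained LSTM in action on unseen text, here’s a minimal inference sketch; the headline below is invented purely for illustration:
sample = clean_text("Breaking: scientists discover chocolate cures all diseases")
seq = pad_sequences(tokenizer.texts_to_sequences([sample]), maxlen=max_length, padding='post')
prob = model.predict(seq)[0][0]  # sigmoid output: probability that the article is real
print("REAL" if prob >= 0.5 else "FAKE", f"({prob:.2f})")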
🤖 Approach 2: BERT Fine-Tuning with HuggingFace
Why build from scratch when you can fine-tune a powerful pre-trained model?
In this approach, we’ll use bert-base-uncased, a general-purpose language model from HuggingFace, and fine-tune it on the Fake and Real News dataset. BERT understands syntax and context, making it ideal for classification tasks like fake news detection.
What Makes BERT Powerful?
- Pre-trained on a huge corpus (Wikipedia + BookCorpus)
- Captures contextual relationships between words
- Works great with minimal preprocessing
🛠️ Step-by-Step Implementation
📦 1. Install Required Libraries
pip install transformers datasets tensorflow
📥 2. Load and Prepare the Dataset
import pandas as pd
from sklearn.model_selection import train_test_split
df_fake = pd.read_csv("Fake.csv")
df_real = pd.read_csv("True.csv")
df_fake['label'] = 0
df_real['label'] = 1
df = pd.concat([df_fake, df_real], axis=0).sample(frac=1).reset_index(drop=True)
df['text'] = df['title'] + " " + df['text']
train_texts, val_texts, train_labels, val_labels = train_test_split(
    df['text'].tolist(), df['label'].tolist(), test_size=0.2, random_state=42)
✂️ 3. Tokenization with BERT Tokenizer
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=512)
val_encodings = tokenizer(val_texts, truncation=True, padding=True, max_length=512)
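If you’re curious what the tokenizer produces, each encoding is a dict-like object holding input_ids, token_type_ids, and attention_mask; a quick peek (assuming the variables above):
print(train_encodings.keys())                 # input_ids, token_type_ids, attention_mask
print(train_encodings['input_ids'][0][:10])   # first 10 token ids of the first article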
🧾 4. Prepare TensorFlow Datasets
import tensorflow as tf
train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_labels
)).shuffle(1000).batch(16)
val_dataset = tf.data.Dataset.from_tensor_slices((
    dict(val_encodings),
    val_labels
)).batch(16)
🧠 5. Load and Fine-Tune BERT
from transformers import TFBertForSequenceClassification

model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# The model outputs raw logits, so pair a standard Keras optimizer with a loss built with from_logits=True
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)
model.fit(train_dataset, validation_data=val_dataset, epochs=3)
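After training, it’s usually worth persisting the fine-tuned weights; HuggingFace models and tokenizers can be saved and reloaded with save_pretrained / from_pretrained (the directory name here is just an example):
model.save_pretrained("bert-fakenews")
tokenizer.save_pretrained("bert-fakenews")
# later: model = TFBertForSequenceClassification.from_pretrained("bert-fakenews")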
✅ 6. Evaluate Model Performance
loss, accuracy = model.evaluate(val_dataset)
print(f"\n✅ Final Validation Accuracy: {accuracy:.4f}")
Output:
✅ Final Validation Accuracy: 0.9314
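To classify a new piece of text with the fine-tuned model, a minimal inference sketch looks like this; again, the headline is invented for illustration:
inputs = tokenizer("Government confirms new tax relief for small businesses",
                   truncation=True, padding=True, max_length=512, return_tensors="tf")
logits = model(inputs).logits
pred = int(tf.argmax(logits, axis=-1)[0])  # 0 = fake, 1 = real
print("REAL" if pred == 1 else "FAKE")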
📊 Comparative Analysis: LSTM vs BERT
Now that we’ve implemented and evaluated both models, let’s compare them side by side:
- Validation accuracy: LSTM 0.9125 vs. BERT 0.9314
- Training approach: LSTM trained entirely from scratch vs. BERT fine-tuned from the pre-trained bert-base-uncased checkpoint
- Preprocessing: LSTM requires manual cleaning, tokenization, and padding vs. BERT needs minimal preprocessing thanks to its own tokenizer
- Compute: LSTM is lighter to train vs. BERT demands considerably more compute and memory
Conclusion: BERT edges out the LSTM in accuracy thanks to its pre-training and deeper grasp of context. However, if you’re short on compute or want a simpler pipeline, the LSTM still delivers strong results.
⚖️ When to Use LSTM vs BERT
Choosing between LSTM and BERT depends on your goals and resources:
Use LSTM if:
- You’re constrained on computational resources
- You want to build models from scratch for educational purposes
- Dataset is relatively small and domain-specific
Use BERT if:
- You need state-of-the-art accuracy
- You’re working with large or noisy text data
- You want to leverage transfer learning for better generalization
💡 Final Thoughts
Both LSTM and BERT are powerful in their own right. While BERT dominates with context-awareness and pre-training, LSTMs remain relevant for faster deployment and simpler pipelines. In the battle against fake news, picking the right model is only one part of the solution; the key is using technology ethically and effectively to promote truth.
Published via Towards AI
Note: Content contains the views of the contributing authors and not Towards AI.