
Can Traditional LSTMs Trained From Scratch Compete With Fine-Tuned BERT Models?

Last Updated on April 23, 2025 by Editorial Team

Author(s): S Aishwarya

Originally published on Towards AI.

In today’s digital era, fake news spreads faster than the truth, and the consequences can be serious. From influencing elections to spreading health misinformation, tackling fake news is more important than ever.

Fake news detection might seem like a job best suited for cutting-edge transformer models like BERT, but can traditional models like LSTM still hold their ground?

But here’s the key question:

Which model is better at detecting fake news — a classic LSTM or a modern transformer like BERT?

In this guide, we’ll compare two approaches for detecting fake news using deep learning:

1. LSTM trained from scratch
2. BERT fine-tuned using Hugging Face

Photo by Thomas Charters on Unsplash

📦 Dataset: Fake and Real News Dataset

We’ll be using the popular Fake and Real News dataset from Kaggle (fake-and-real-news-dataset), which contains over 44,000 news articles split into two classes (a quick class-balance check is sketched after the field list below):

REAL: Legitimate news from verified sources

FAKE: Fabricated news with misleading content

Each entry includes:

  1. Title: title of news article
  2. Text: body text of news article
  3. Subject: subject of news article
  4. Date: publish date of news article
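
Before any modeling, it’s worth a quick sanity check on the class sizes and the raw fields. A minimal sketch, assuming the two CSV files (Fake.csv and True.csv) have been downloaded from Kaggle into the working directory:

import pandas as pd

# Load both CSV files and compare class sizes
df_fake = pd.read_csv("Fake.csv")
df_real = pd.read_csv("True.csv")
print(f"FAKE articles: {len(df_fake)}")
print(f"REAL articles: {len(df_real)}")

# Peek at the available fields
print(df_real[['title', 'subject', 'date']].head())

If the two classes come out roughly balanced, plain accuracy is a reasonable headline metric for the comparison below; a heavily skewed split would call for precision and recall instead.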

🔧 Approach 1: LSTM Trained from Scratch

📥 Step 1: Import Libraries

Essential packages for data handling, preprocessing, and modeling.

import pandas as pd
import numpy as np
import re
import nltk
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from nltk.corpus import stopwords
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Bidirectional, Dropout
from tensorflow.keras.callbacks import EarlyStopping

nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

📂 Step 2: Load Dataset

Read the fake and real news datasets.

df_fake = pd.read_csv("Fake.csv")
df_real = pd.read_csv("True.csv")

🏷️ Step 3: Add Labels

Label fake as 0 and real as 1.

df_fake['label'] = 0
df_real['label'] = 1

🔗 Step 4: Combine & Shuffle

Merge both datasets and shuffle.

df = pd.concat([df_fake, df_real], axis=0).sample(frac=1).reset_index(drop=True)

🧹 Step 5: Clean the Text

Remove HTML, punctuation, numbers, and stopwords.

def clean_text(text):
    text = str(text).lower()
    text = re.sub(r'<.*?>', '', text)    # strip HTML tags
    text = re.sub(r'[^\w\s]', '', text)  # strip punctuation
    text = re.sub(r'\d+', '', text)      # strip numbers
    text = " ".join([word for word in text.split() if word not in stop_words])
    return text

df['text'] = df['title'] + " " + df['text']
df['text'] = df['text'].apply(clean_text)

🔠 Step 6: Tokenize & Pad

Convert text to sequences and pad them.

tokenizer = Tokenizer(num_words=50000, oov_token="<OOV>")
tokenizer.fit_on_texts(df['text'])
sequences = tokenizer.texts_to_sequences(df['text'])
word_index = tokenizer.word_index
vocab_size = len(word_index) + 1
max_length = 500
padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post')

🧪 Step 7: Train-Test Split

Split into training and validation datasets.

X = padded_sequences
y = df['label'].values
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

🧠 Step 8: Build the LSTM Model

Create a stacked Bidirectional LSTM model.

model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=128, input_length=max_length),
    Bidirectional(LSTM(128, return_sequences=True)),
    Bidirectional(LSTM(64)),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

⚙️ Step 9: Compile and Train

Compile the model and train it with early stopping.

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
early_stop = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)
history = model.fit(X_train, y_train, epochs=5, batch_size=64, validation_data=(X_val, y_val), callbacks=[early_stop])

📈 Step 10: Visualize Accuracy

Plot training and validation accuracy.

plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Val Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.title("Training vs Validation Accuracy")
plt.show()

✅ Step 11: Evaluate Model

Final validation accuracy after training.

loss, acc = model.evaluate(X_val, y_val)
print(f"\n✅ Final Validation Accuracy: {acc:.4f}")

Output:

✅ Final Validation Accuracy: 0.9125
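
With training done, the same cleaning function, tokenizer, and padding settings can be reused to score unseen text. A minimal sketch reusing the objects defined above; the headline is invented purely for illustration:

# Hypothetical headline, used only to demonstrate the prediction flow
sample = "Scientists confirm chocolate cures all known diseases"
sample_seq = tokenizer.texts_to_sequences([clean_text(sample)])
sample_pad = pad_sequences(sample_seq, maxlen=max_length, padding='post')

# The sigmoid output is the probability of the REAL (label 1) class
prob_real = float(model.predict(sample_pad)[0][0])
print("REAL" if prob_real >= 0.5 else "FAKE", f"(p_real = {prob_real:.3f})")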

🤖 Approach 2: BERT Fine-Tuning with HuggingFace

Why build from scratch when you can fine-tune a powerful pre-trained model?

In this approach, we’ll use bert-base-uncased, a general-purpose language model from HuggingFace, and fine-tune it on the Fake and Real News dataset. BERT understands syntax and context, making it ideal for classification tasks like fake news detection.

What Makes BERT Powerful?

  • Pre-trained on a huge corpus (Wikipedia + BookCorpus)
  • Captures contextual relationships between words
  • Works great with minimal preprocessing (see the tokenizer sketch below)
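
One reason minimal preprocessing suffices is WordPiece tokenization: rare or unseen words are split into known subword pieces instead of collapsing to a single <OOV> token the way they do in the LSTM pipeline above. A quick sketch (the example sentence is arbitrary):

from transformers import BertTokenizer

bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")

# Pieces prefixed with '##' are subword continuations of the previous piece
print(bert_tok.tokenize("Misinformation spreads virally online"))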

🛠️ Step-by-Step Implementation

📦 1. Install Required Libraries

pip install transformers datasets tensorflow

📥 2. Load and Prepare the Dataset

import pandas as pd
from sklearn.model_selection import train_test_split

df_fake = pd.read_csv("Fake.csv")
df_real = pd.read_csv("True.csv")

df_fake['label'] = 0
df_real['label'] = 1

df = pd.concat([df_fake, df_real], axis=0).sample(frac=1).reset_index(drop=True)

df['text'] = df['title'] + " " + df['text']

train_texts, val_texts, train_labels, val_labels = train_test_split(
    df['text'].tolist(), df['label'].tolist(), test_size=0.2, random_state=42)

✂️ 3. Tokenization with BERT Tokenizer

from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=512)
val_encodings = tokenizer(val_texts, truncation=True, padding=True, max_length=512)

🧾 4. Prepare TensorFlow Datasets

import tensorflow as tf
train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_labels
)).shuffle(1000).batch(16)

val_dataset = tf.data.Dataset.from_tensor_slices((
    dict(val_encodings),
    val_labels
)).batch(16)

🧠 5. Load and Fine-Tune BERT

from transformers import TFBertForSequenceClassification

model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# transformers' AdamW is a PyTorch optimizer, so use Keras' Adam here;
# the model outputs raw logits, so the loss needs from_logits=True
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

model.fit(train_dataset, validation_data=val_dataset, epochs=3)

✅ 6. Evaluate Model Performance

loss, accuracy = model.evaluate(val_dataset)
print(f"\n✅ Final Validation Accuracy: {accuracy:.4f}")

Output:

✅ Final Validation Accuracy: 0.9314
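
As with the LSTM, the fine-tuned model can score new text. A minimal sketch reusing the tokenizer, model, and tf import from the steps above; the headline is invented purely for illustration:

# Hypothetical headline, used only to demonstrate the prediction flow
sample = "Government secretly replaces the moon with a hologram"
inputs = tokenizer([sample], truncation=True, padding=True, max_length=512, return_tensors="tf")

# The model returns raw logits; argmax picks the predicted class (1 = REAL)
logits = model(inputs).logits
pred = int(tf.argmax(logits, axis=-1)[0])
print("REAL" if pred == 1 else "FAKE")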

📊 Comparative Analysis: LSTM vs BERT

Now that we’ve implemented and evaluated both models, let’s compare them side by side:

Comparison table of the two approaches (image by author): 0.9125 validation accuracy for the LSTM vs. 0.9314 for BERT.

Conclusion: BERT edges out the LSTM in accuracy thanks to its pre-training and deeper grasp of context. However, if you’re short on compute or want a simpler pipeline, a from-scratch LSTM still delivers strong results.

⚖️ When to Use LSTM vs BERT

Choosing between LSTM and BERT depends on your goals and resources:

Use LSTM if:

  • You’re constrained on computational resources
  • You want to build models from scratch for educational purposes
  • Dataset is relatively small and domain-specific

Use BERT if:

  • You need state-of-the-art accuracy
  • You’re working with large or noisy text data
  • You want to leverage transfer learning for better generalization

💡 Final Thoughts

Both LSTM and BERT are powerful in their own right. While BERT dominates with context-awareness and pre-training, LSTMs still remain relevant for faster deployment and simpler pipelines. In the battle against fake news, picking the right model is just one part — the key is using technology ethically and effectively to promote truth.


Published via Towards AI



Note: Content contains the views of the contributing authors and not Towards AI.