
How “Towards AI” detects AI-Generated Articles

Last Updated on December 21, 2023 by Editorial Team

Author(s): Ahmad Mustapha

Originally published on Towards AI.

Photo by Brett Jordan on Unsplash

The last time I submitted an article to the “Towards AI” publication, I was surprised by their reply. It said:

Your blog has been flagged as containing AI-generated content. Double and triple-check any statements and facts for mistakes and hallucinations before resubmitting.

I went back and forth with them on this. They made it clear that they do accept writers using AI to help with their content, which is expected from a publication called “Towards AI”, as long as it is clear and doesn’t contain hallucinations.

So yes, my article was created using ChatGPT. Not that it was the brainchild of ChatGPT: I worked with GPT in a back-and-forth fashion, from the introduction to the supporting details to the conclusion. Still, it was flagged as AI-generated. The question is how they figured it out.

Not Plagiarism

One would think that there is some sort of plagiarism system under the hood. But as a matter of fact, you won’t find my AI-assisted article anywhere else on the internet. It was original. So how did they manage to detect that it was AI-generated? And what are hallucinations anyway?

AI Detectors

If you are familiar with how neural networks operate, you will know that you can depend on them to do things that even humans fail at. Neural networks are pattern-hungry mathematical models: given the task of differentiating between two classes of data (authentic and synthetic text in our case), they learn even the slightest variation in probability between the two.

Large language models (LLMs) like ChatGPT have been trained on a large amount of data covering many different writing styles. The distribution of the text they generate may be somewhat different from that of any individual human. They have absorbed more styles, and in that sense their language model is more capable than any single human’s (even without being able to understand or pinpoint what is being generated; see it as a purely probabilistic mathematical process).

To understand what I mean by text distribution, consider the following snippet generated by ChatGPT, together with some toy reasoning:

This ambition is widespread, and a wealth of general wisdom, as well as the lessons from businesses that faltered in their attempts to build AI from scratch, underscores the pitfalls of such an approach.

“This ambition is widespread.” What is the probability of the word “widespread” coming after the phrase “This ambition is”? For some of us, it feels slightly exotic; it has a low probability. Across human-written texts, only a few continuations of “This ambition is” use the word “widespread”. It is grammatically correct, but we tend to write things like “This ambition is common”, “is not uncommon”, or “is not unique”.
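To make this concrete, here is a minimal sketch (not part of the article’s original workflow) that uses the open GPT-2 model from the Hugging Face transformers library to compare how probable a few candidate words are after the phrase “This ambition is”. The model choice and candidate words are my own, purely for illustration.

```python
# Toy illustration: estimate next-word probabilities with GPT-2.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "This ambition is"
candidates = [" widespread", " common", " not"]  # illustrative choices

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Probability distribution over the token that follows the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

for word in candidates:
    token_id = tokenizer.encode(word)[0]  # first sub-token of the candidate
    print(f"P({word!r} | {prompt!r}) ≈ {next_token_probs[token_id].item():.5f}")
```

The numbers themselves matter less than the comparison: words that feel “exotic” to a human reader tend to sit lower in the distribution than the bland continuations we would normally write.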

If we trained another model to differentiate between human text and AI text, it would be able to figure it out. Why? Because the two come from different distributions. Different minds. One is formed by a collection of individual humans, each with a style of their own. The other is a kind of superior or meta-style that has been dealt a bad hand by being trained on all kinds of styles at once. The second is less consistent and gives itself away by mixing a spaghetti of styles within a single paragraph.
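As an illustration of what “training another model” could look like, here is a toy sketch using scikit-learn. The handful of example sentences and labels are invented for demonstration; a real detector would be trained on large labeled corpora, typically with a fine-tuned transformer rather than TF-IDF features.

```python
# Toy sketch: a classifier that separates human-written from AI-generated text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training samples, purely for illustration.
human_texts = [
    "I rewrote the intro three times and it still feels clunky.",
    "Honestly, the results surprised us, we expected the baseline to win.",
]
ai_texts = [
    "This ambition is widespread, and a wealth of general wisdom underscores it.",
    "In conclusion, leveraging synergies unlocks transformative potential.",
]

texts = human_texts + ai_texts
labels = ["human"] * len(human_texts) + ["ai"] * len(ai_texts)

# Word n-gram frequencies act as a crude proxy for stylistic regularities.
detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
detector.fit(texts, labels)

print(detector.predict(
    ["A wealth of lessons underscores the pitfalls of such an approach."]
))
```

The principle is the same at scale: the classifier does not need to understand the text, it only needs to pick up on the statistical fingerprints each distribution leaves behind.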

Eventually

Some people argue that eventually it won’t be possible to differentiate between the two [1]. However, for now, this is not the case. A research group at OpenAI, the makers of ChatGPT themselves, trained a model [2] to detect GPT-generated text. And we now have many commercial services offering detection, such as Copyleaks, Scribbr, GPTZero, and Undetectable.
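Reference [2] was accompanied by a RoBERTa-based detector fine-tuned on GPT-2 output, and a checkpoint of it is mirrored on the Hugging Face Hub. The sketch below assumes that model id is still available; treat it as an example of how such detectors are typically called, not as a claim about its accuracy on ChatGPT-era text.

```python
# Calling OpenAI's GPT-2 output detector via the transformers pipeline.
# The model id below is the commonly used Hub mirror; verify it before relying on it.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",
)

sample = (
    "This ambition is widespread, and a wealth of general wisdom, as well as the "
    "lessons from businesses that faltered, underscores the pitfalls of such an approach."
)
print(detector(sample))  # e.g. [{'label': ..., 'score': ...}]
```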

Takeaway

First, try not to submit AI-generated articles to “Towards AI”. Second, keep calm and don’t panic yet: for a while at least, the use of AI detectors will keep AI from taking over writers’ jobs.

[1] Sadasivan et al. Can AI-Generated Text Be Reliably Detected? 2023.

[2] OpenAI. GPT-2: 1.5B Release. November 2019.
