How “Towards AI” detects AI-Generated Articles

Last Updated on December 21, 2023 by Editorial Team

Author(s): Ahmad Mustapha

Originally published on Towards AI.

The last time I submitted an article to the “Towards AI” publication, I was surprised by their reply. It said:

Your blog has been flagged as containing AI-generated content. Double and triple-check any statements and facts for mistakes and hallucinations before resubmitting.

I went back and forth with them on this. They made it clear that they do accept writers using AI to help with their content, which is to be expected from a publication called “Towards AI”, as long as the content is clear and doesn’t contain hallucinations.

So yes, my article was created using ChatGPT. Not that it was the brainchild of ChatGPT: I worked with GPT in a back-and-forth fashion, from the introduction to the supporting details to the conclusion. Still, it was flagged as AI-generated. The question is how they figured it out.

Not Plagiarism

One would think that there is some sort of plagiarism system under the hood. But as a matter of fact, you won’t be able to see my co-AI article anywhere on the internet. It was original. So how did they manage to detect that it was AI-generated? And what are hallucinations anyway?

AI Detectors

If you are familiar with how neural networks operate, you will know that you can often depend on them to do what even humans can fail to do. Neural networks are pattern-hungry mathematical models that, when given the task of differentiating between two classes of data (authentic and synthetic text in our case), learn the slightest variation in probability between the two.

Large language models (LLMs) like ChatGPT have been trained on a large amount of data covering many different writing styles. The distribution of their generated text may be somewhat different from that of humans. They consume more styles, and they are more capable. Their language model is superior to human language models (even without being able to understand or pinpoint what is being generated; see it as a purely probabilistic mathematical process).

To understand what I mean by text distribution, consider the following snippet generated by ChatGPT and the following toy reasoning:

This ambition is widespread, and a wealth of general wisdom, as well as the lessons from businesses that faltered in their attempts to build AI from scratch, underscores the pitfalls of such an approach.

This ambition is widespread. What is the probability of the word “widespread” coming after the phrase “This ambition is”? Perhaps to some of us it feels exotic. It has a low probability: of all the occurrences of “This ambition is” in human-written texts, only a few are followed by the word “widespread”. It is grammatically correct, but we tend to say something more like “This ambition is common”, “is not uncommon”, or “is not unique”.
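To make the toy reasoning concrete, here is a minimal sketch of how such a probability could be estimated by simple counting. The corpus is a handful of invented sentences standing in for human-written text, and the helper name `next_word_probs` is hypothetical; real language models estimate these probabilities with neural networks over vastly more data.

```python
from collections import Counter

# Hypothetical toy corpus standing in for "human-written" text; not real data.
corpus = [
    "this ambition is common",
    "this ambition is common",
    "this ambition is not unique",
    "this ambition is not uncommon",
    "this ambition is widespread",
]

def next_word_probs(sentences, context):
    """Estimate P(next word | context) by counting what follows `context`."""
    ctx = context.lower().split()
    n = len(ctx)
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # Slide over the sentence and record the word right after each match.
        for i in range(len(words) - n):
            if words[i:i + n] == ctx:
                counts[words[i + n]] += 1
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}

probs = next_word_probs(corpus, "this ambition is")
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.2f}")
```

In this made-up corpus, “widespread” is the least likely continuation, which is the kind of signal the reasoning above relies on.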

If we train another model to differentiate between human text and AI text, it will be able to figure it out. Why? Because the two come from different distributions. Different minds. One is formed of a collection of individual humans, each with their own style. The other is formed of a superior or meta-style that has been dealt a bad hand by being trained on all kinds of styles. The second is less consistent and hallucinates by using a spaghetti of styles in one paragraph.
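As a rough illustration of training a second model to tell the two distributions apart, the sketch below fits a tiny naive Bayes classifier on word counts. This is an assumption made for illustration only: it is not the detector used by “Towards AI” or by any commercial service, the sample sentences are invented, and the function names `train` and `predict` are hypothetical. Real detectors are typically neural networks trained on large labeled datasets.

```python
import math
from collections import Counter

# Hypothetical, hand-written toy samples; a real detector needs far more data.
human_texts = [
    "this ambition is common and many teams share it",
    "lots of companies tried this and failed",
    "i think building it yourself is hard",
]
ai_texts = [
    "this ambition is widespread and underscores the pitfalls",
    "a wealth of general wisdom underscores the approach",
    "the lessons from businesses that faltered are widespread",
]

def train(samples_by_label):
    """Count word frequencies per label for a multinomial naive Bayes model."""
    word_counts = {label: Counter() for label in samples_by_label}
    for label, samples in samples_by_label.items():
        for text in samples:
            word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    doc_totals = {label: len(s) for label, s in samples_by_label.items()}
    return word_counts, vocab, doc_totals

def predict(model, text):
    """Return the label with the highest log-probability under the model."""
    word_counts, vocab, doc_totals = model
    n_docs = sum(doc_totals.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_totals[label] / n_docs)  # class prior
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train({"human": human_texts, "ai": ai_texts})
print(predict(model, "this ambition is widespread"))
print(predict(model, "many companies failed"))
```

The classifier never "understands" either sentence; it only compares how probable each word is under the two sets of counts, which is the same idea as the different-distributions argument above, at a much smaller scale.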

Eventually

Some people argue that eventually it won’t be possible to differentiate between the two [1]. However, for now, this is not the case. A research group at OpenAI, the owners of ChatGPT themselves, trained a model [2] to detect text generated by GPT-2, an earlier model in the same family. Now we have many commercial tools that provide such services, such as Copyleaks, Scribbr, GPTZero, and Undetectable.

Takeaway

First, try not to submit AI-generated articles to “Towards AI”. Second, “keep calm” and “don’t panic yet”: for a while, the use of AI detectors will keep AI from taking over writers' jobs.

[1] Sadasivan et al. Can AI-Generated Text be Reliably Detected?

[2] OpenAI. GPT-2: 1.5B release. November 2019.

Published via Towards AI
