Computational Linguistics: Detecting AI-Generated Text

Last Updated on December 30, 2023 by Editorial Team

Author(s): Matteo Consoli

Originally published on Towards AI.

AI content indicators: ASL, readability, simplicity, and burstiness.

Introduction

Every time I read something on Medium or LinkedIn, I can’t stop wondering whether it was written by a human or generated by AI.
I’m developing a sixth sense for detecting AI-generated content by smelling it. I might define myself as a truffle dog of the third millennium.
We can all agree, though, that my instinct is not a scientific or reproducible method.

A truffle-dog according to AI. No complaints, it’s so cute! — playground.com

There are various tools online that can identify AI texts quite effectively. These tools are certainly not trained on my instinct. So, how exactly do they detect AI content?

This article introduces a few computational linguistic analyses that can help categorize text as human-written or AI-generated: average sentence length, readability, simplicity, and burstiness.

AI-Content Indicators (ACI)

In a world crowded with KPIs (Key Performance Indicators) and KRIs (Key Risk Indicators), I couldn’t miss the opportunity to define my ACIs: AI Content Indicators.
The analyses mentioned above have long been part of computational linguistics. Today, they can also play a role in estimating the probability that a text is AI-generated.

Average Sentence Length

  • Description: The Average Sentence Length (ASL) is the average number of words per sentence in a given text.
  • AI-Content-Indicator: A higher ASL may be indicative of certain text generation methods. Human writing typically has an ASL of 15–20 words.
ChatGPT Screenshot — ASL Sample

ChatGPT answered with a long, 57-word sentence, which contributes to a high score for this parameter.

  • Formula & Python Code
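
ASL = (total number of words) / (total number of sentences). The snippet below is a minimal sketch of that ratio, not necessarily the code behind the screenshot above. It assumes a naive regex splitter that ends a sentence at ".", "!" or "?" followed by whitespace, so abbreviations such as "e.g." will be miscounted. The function names are my own illustrative choices.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): a sentence ends at . ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def average_sentence_length(text: str) -> float:
    """Average number of whitespace-separated words per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    total_words = sum(len(s.split()) for s in sentences)
    return total_words / len(sentences)


if __name__ == "__main__":
    sample = "I like cats. Dogs are great too!"
    print(average_sentence_length(sample))
```

A value well above the 15–20 range mentioned above would be one weak signal, never proof on its own.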

Readability

  • Description: Readability measures how easily text can be understood. It considers factors like sentence length, word complexity, and syllable count.
    The formula I’m sharing is the Flesch reading-ease score (FRES). Don’t confuse it with the Flesch–Kincaid grade level, which ultimately has a similar objective. There are more than 200 different formulas to calculate text readability.
  • AI-Content-Indicator: Higher FRES values indicate straightforward text, while lower values suggest complexity. AI tends to write unnecessarily complex sentences, especially in scientific and technical domains.
ChatGPT Screenshot — FRES Sample

The FRES of the sentence generated by ChatGPT is around 26.
The description of readability at the beginning of this section scores around 70.

  • Formula & Python Code
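
The standard Flesch reading-ease formula is FRES = 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). The sketch below applies it with a rough vowel-group heuristic for counting syllables. That heuristic and the helper names are assumptions of mine, so the scores it returns may differ by a few points from those of dedicated readability tools or from the values quoted above.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic (assumption): one syllable per vowel group, minus a silent final 'e'."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """FRES = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat on the mat."), 3))
```

Short sentences made of one-syllable words can score above 100, while dense academic prose can drop toward zero or below.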

Simplicity Score

  • Description: Simplicity is an umbrella term for features that measure how “simple” a text is. Two metrics are worth mentioning: lexical density and syntactic complexity.
    Lexical density is the ratio of content-carrying words (nouns, verbs, adjectives, and adverbs) to the total number of words in a text, so it is always a number between 0 and 1. A higher lexical density (close to 1) indicates greater complexity.
    Syntactic complexity applies a similar idea to the structure of the text: it is the number of clauses divided by the total number of sentences.
  • AI-Content-Indicator: Lower values of these two metrics indicate simpler language. AI output is often less simple, with high lexical density and high syntactic complexity.
ChatGPT Screenshot — Simplicity Sample

The average lexical density of human writing is around 0.5 to 0.6, while in the example above it is close to 1.0. The syntactic complexity of the paragraph generated by ChatGPT is, in this example, within the human average of 3–5.

  • Formula & Python Code
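
Lexical density = (content words) / (total words), and syntactic complexity = (clauses) / (sentences). The sketch below approximates both with the standard library only, and it relies on two loud assumptions. First, any word missing from a small hand-picked list of function words is treated as a content word. Second, each sentence is counted as one clause plus one per conjunction or relative marker it contains. A proper implementation would use a part-of-speech tagger or parser (for example from NLTK or spaCy), so treat the numbers as rough proxies.

```python
import re

# Hand-picked, non-exhaustive list of English function words (assumption).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "at", "by", "for", "with",
    "about", "to", "from", "in", "on", "as", "is", "are", "was", "were", "be",
    "been", "being", "am", "do", "does", "did", "have", "has", "had", "i", "you",
    "he", "she", "it", "we", "they", "me", "him", "her", "us", "them", "my",
    "your", "his", "its", "our", "their", "this", "that", "these", "those",
    "not", "no", "so", "than", "then", "there", "which", "who", "whom", "what",
    "when", "where", "while", "because", "can", "could", "will", "would",
    "shall", "should", "may", "might", "must",
}

# Words assumed to open a new clause; "and"/"or" will overcount in plain lists.
CLAUSE_MARKERS = {
    "and", "but", "or", "because", "although", "while", "which", "that",
    "if", "when", "since", "whereas",
}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation and digits are dropped."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_density(text: str) -> float:
    """Share of tokens that are not in FUNCTION_WORDS (proxy for content words)."""
    words = tokenize(text)
    if not words:
        return 0.0
    content_words = [w for w in words if w not in FUNCTION_WORDS]
    return len(content_words) / len(words)


def syntactic_complexity(text: str) -> float:
    """Estimated clauses per sentence: 1 per sentence plus 1 per clause marker."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    clauses = sum(
        1 + sum(1 for tok in tokenize(s) if tok in CLAUSE_MARKERS)
        for s in sentences
    )
    return clauses / len(sentences)


if __name__ == "__main__":
    print(lexical_density("The cat sat on the mat"))
    print(syntactic_complexity("I left because it rained. She stayed."))
```

Because the word lists are arbitrary, these values are only comparable between texts scored with the same lists.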

Burstiness

  • Description: Burstiness analysis evaluates the occurrence of specific words or sentences against their expected frequency.
  • AI-Content-Indicator: Repeated language patterns and syntactic structures are often used by AI.
ChatGPT Screenshot — Burstiness Sample

In the example above, I asked ChatGPT to write a paragraph about data science with high burstiness. The terms data science, algorithms, neural networks, and innovation occur frequently, increasing the burstiness of “ML domain-specific language”.

Calculating burstiness is not as simple as the other ACIs mentioned.
The main takeaway is that human text tends to be more uneven, which shows up as higher variation from one sentence to the next than in text written by an AI.
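
One simple way to put a number on cross-sentence variation is the coefficient of variation of sentence lengths (standard deviation divided by mean). This is a proxy I am assuming for illustration, not a full burstiness measure and not what AI detectors necessarily compute. The function name and the naive sentence splitter are also my own choices.

```python
import re
import statistics


def sentence_length_variation(text: str) -> float:
    """Coefficient of variation of sentence lengths, counted in words.

    0.0 means every sentence has the same length; larger values mean the
    text alternates more between short and long sentences.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # variation is undefined for fewer than two sentences
    mean_length = statistics.mean(lengths)
    return statistics.pstdev(lengths) / mean_length


if __name__ == "__main__":
    uniform = "One two three. Four five six. Seven eight nine."
    varied = "Wow. This second sentence is considerably longer than the first one."
    print(sentence_length_variation(uniform))
    print(sentence_length_variation(varied))
```

Under this proxy, the uniform sample scores 0.0 and the varied one scores higher, which matches the intuition that human writing mixes short and long sentences.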

Conclusion

AI is, undeniably, an ever-evolving domain. AI models improve day by day, generating text that mimics the statistical patterns of human writing more and more closely. One day not even my truffle-dog sixth sense will detect AI content anymore (if you skipped the intro, give it a look to catch this!).

People wondering if what they are reading is written by human or generated by AI — playground.com
  • Am I against AI content?
    Absolutely not. I’m not a native English speaker and I regularly use AI to make sure that my content doesn’t contain grammar errors and that the text is clear and simple. I even used it for the article you are reading right now. Nevertheless, as a reader, I prefer to dedicate my time to enjoying the writers’ flows, their unique styles, and their personal experiences (and not the pre-packaged scenarios that an AI can offer).
  • Is a combination of the ACIs described above enough to detect AI content?
    Definitely not. Instead, what is described above could help you use AI to generate more human-friendly output (e.g. “Hey ChatGPT, talk to me about XYZ, but keep the ASL low and the FRES high”).
    A good AI detector might still catch it, though, so don’t try to play this game with your college essays. 😆
  • Are the AI detectors in the market reliable?
    Some are pretty good, although they can produce false positives and false negatives. Overall, they are good tools for scrutinizing Medium and LinkedIn posts, especially when the content is fully AI-generated.

Fun fact: I read that an AI detector once categorized the Constitution of the United States as AI-generated. I seriously doubt it was written using AI back in the day. Let’s not forget to use AI detectors wisely.

George Washington using ChatGPT playground.com

The views and opinions expressed in this article are my own and not those of any of my current, previous, or future employers.
Unless otherwise noted, all images are by the author.



Published via Towards AI
