Computational Linguistics: Detecting AI-Generated Text
Author(s): Matteo Consoli
Originally published on Towards AI.
AI content indicators: ASL, readability, simplicity, and burstiness.
Introduction
Every time I read something on Medium or LinkedIn, I can't help wondering whether it was written by a human or generated by AI.
I'm developing a sixth sense for sniffing out AI-generated content. I might define myself as a truffle dog of the third millennium.
We can all agree, though, that my instinct is not a scientific, reproducible method.
There are various online tools that can identify AI-generated text quite effectively. These tools, certainly, are not trained on my instinct. So how exactly do they detect AI content?
This article introduces a few computational linguistic analyses that can help categorize text as human-written or AI-generated: average sentence length, readability, simplicity, and burstiness.
AI-Content Indicators (ACI)
In a world crowded with KPIs (Key Performance Indicators) and KRIs (Key Risk Indicators), I couldn't miss the opportunity to define my ACIs: AI Content Indicators.
The analyses mentioned above have long been part of computational linguistics. Today, they can also play a role in estimating the probability that a text is AI-generated.
Average Sentence Length
- Description: The Average Sentence Length (ASL) is the average number of words per sentence in a given text.
- AI-Content-Indicator: A higher ASL may be indicative of certain text generation methods. Human writing typically averages 15–20 words per sentence.
ChatGPT answered with a long 57-word sentence, contributing to a high score for this indicator.
- Formula & Python Code
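A minimal sketch in Python (the regex-based sentence splitter is a naive heuristic of my own; a proper tokenizer such as NLTK's sent_tokenize would be more robust):

```python
import re

def average_sentence_length(text: str) -> float:
    # Naive split on sentence-ending punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)

print(average_sentence_length("Short sentence. This one is a little bit longer."))  # 4.5
```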
Readability
- Description: Readability measures how easily text can be understood. It considers factors like sentence length, word complexity, and syllable count.
The formula I'm sharing is the Flesch reading-ease score (FRES). Don't confuse it with the Flesch–Kincaid grade level, which ultimately has a similar objective. There are more than 200 different formulas for calculating text readability.
- AI-Content-Indicator: Higher FRES values indicate straightforward text, while lower values suggest complexity. AI tends to write needlessly complex sentences, especially in scientific/tech domains.
The FRES of the sentence generated by ChatGPT is around 26, while the description of readability at the beginning of this section scores about 70.
- Formula & Python Code
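The Flesch reading-ease score is defined as FRES = 206.835 - 1.015 × (total words / total sentences) - 84.6 × (total syllables / total words). A sketch, with a deliberately rough vowel-group syllable counter of my own (packages like textstat implement a more careful version):

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```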
Simplicity Score
- Description: Simplicity can be defined as an umbrella of metrics analyzing how "simple" a text is. Among them, two are worth mentioning: lexical density and syntactic complexity.
The first is the ratio of content-carrying words (nouns, verbs, adjectives, and adverbs) to the total number of words in a text, so its output is a number between 0 and 1. A higher lexical density (close to 1) indicates greater complexity.
The second is a similar concept applied to text structure: it considers the number of clauses over the total number of sentences in a given text.
- AI-Content-Indicator: Lower values on both metrics indicate simpler language. AI output is often less simple, combining high lexical density with high syntactic complexity.
The average lexical density for a human writer is around 0.5–0.6, while in the example above it is close to 1.0. The syntactic complexity of the paragraph generated by ChatGPT is, in this example, within the human average of 3–5 clauses per sentence.
- Formula & Python Code
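A sketch of both metrics using NLTK. The content-word tag prefixes match the definition above, while the verb-based clause count is my own rough proxy (accurate clause detection needs a syntactic parser):

```python
import nltk  # requires nltk.download("punkt") and nltk.download("averaged_perceptron_tagger")

# POS-tag prefixes for content-carrying words: nouns, verbs, adjectives, adverbs.
CONTENT_PREFIXES = ("NN", "VB", "JJ", "RB")

def lexical_density(text: str) -> float:
    words = [t for t in nltk.word_tokenize(text) if t.isalpha()]
    if not words:
        return 0.0
    content_words = [w for w, tag in nltk.pos_tag(words)
                     if tag.startswith(CONTENT_PREFIXES)]
    return len(content_words) / len(words)

def syntactic_complexity(text: str) -> float:
    sentences = nltk.sent_tokenize(text)
    if not sentences:
        return 0.0
    clauses = 0
    for sentence in sentences:
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        # Crude proxy: one clause per verb, at least one per sentence.
        clauses += max(1, sum(1 for _, tag in tagged if tag.startswith("VB")))
    return clauses / len(sentences)
```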
Burstiness
- Description: Burstiness analysis evaluates how often specific words or sentence patterns occur compared to their expected frequency.
- AI-Content-Indicator: Repeated language patterns and syntactic structures are typical of AI-generated text.
In the example above, I asked ChatGPT to write a paragraph about data science with high burstiness. The terms data science, algorithms, neural networks, and innovation occur frequently, increasing the burstiness of "ML domain-specific language".
Calculating burstiness is not as simple as for the other ACIs mentioned.
The main takeaway is that human text tends to be more discontinuous: it shows higher cross-sentence variation than text written by an AI.
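There is no single agreed formula; one common proxy for the cross-sentence variation just described is the burstiness parameter B = (σ - μ) / (σ + μ), computed here over per-sentence word counts (this choice of measure follows Goh and Barabási's definition; B close to 1 is highly bursty, close to -1 perfectly regular):

```python
import re
import statistics

def burstiness(text: str) -> float:
    # B = (sigma - mu) / (sigma + mu) over word counts per sentence.
    # B -> 1: highly bursty; B ~ 0: Poisson-like; B -> -1: perfectly regular.
    lengths = [len(s.split()) for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(lengths) < 2:
        return 0.0
    mu = statistics.mean(lengths)
    sigma = statistics.pstdev(lengths)
    return (sigma - mu) / (sigma + mu) if (sigma + mu) else 0.0
```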
Conclusion
AI is an ever-evolving domain. Models improve day by day, generating text that mimics the statistical patterns of human writing more and more closely. One day, not even my truffle-dog sixth sense will detect AI content anymore (if you skipped the intro, give it a look to catch this reference!).
- Am I against AI content?
Absolutely not. I'm not a native English speaker, and I regularly use AI to make sure my content doesn't contain grammar errors and that the text is clear and simple. I even used it for the article you are reading right now. Nevertheless, as a reader, I prefer to dedicate my time to enjoying writers' flows, their unique styles, and their personal experiences (not the pre-packaged scenarios an AI can offer).
- Is a combination of the ACIs described above enough to detect AI content?
Definitely not. Instead, what is described above could help you use AI to generate more human-friendly output (e.g., "Hey ChatGPT, talk to me about XYZ, but keep the ASL low and the FRES high").
A good AI detector might still catch it, though, so don't try to play this game with your college essays. 😆
- Are the AI detectors on the market reliable?
Some are pretty good, although they can produce false positives and false negatives. Overall, they are useful tools for scrutinizing Medium and LinkedIn posts, especially when the content is fully AI-generated.
Fun fact: I read about the Constitution of the United States being flagged as AI-generated by an AI detector. I tremendously doubt it was written using AI back in the day. Let's not forget to use AI detectors wisely.
The views and opinions expressed in this article are my own and not those of any of my current, previous, or future employers.
Unless otherwise noted, all images are by the author.
Additional Resources & Bibliography
- Readability, Wikipedia
- Flesch–Kincaid readability tests, Wikipedia
- Accounting for Word Burstiness in Topic Models by G. Doyle and C. Elkan, ICML 2009
- AI Thinks the Constitution was made by AI, New York Post (25 July 2023)
- AI Content Detectors: GPT Zero, CopyLeaks