AI detection is generating debates around AI's place in education--here's what happens when AI content is run through AI detection tools

We gave AI detectors a try–here’s what we found

Key points:

  • AI detection tools are skyrocketing in popularity–but how effective are they?
  • Testing different AI detectors offers an eye-opening look at whether AI-generated pieces are identified as such

Nearly every school or university faculty is having at least a few conversations about how to address a world rich in easy-to-use artificial intelligence tools that can generate student assignments.

Multiple AI detection services claim they can identify whether text was written by AI or by a human. Turnitin, ZeroGPT, Quill, and OpenAI's AI Text Classifier each make this claim and are in use by higher-ed faculty and K-12 educators.

To test how well Turnitin identifies AI-generated material, students in a doctoral methods course were asked to submit one or two assignments that were fully generated by ChatGPT or another generative tool such as Google’s Bard or Microsoft’s Bing AI. It appears that most students used ChatGPT. Of the 28 fully AI-derived assignments, Turnitin scored 24 as 100 percent AI-generated. The other four were scored between zero and 65 percent AI-derived. The papers ranged in length from 411 to 1,368 words.

Turnitin returned evidence of potential plagiarism through its Similarity Scores in the range from zero percent to 49 percent. The average AI generated paper was noted to be 13.75 percent similar to other extant materials. (You can find Turnitin’s AI Writing detection tool FAQ here.)

As a control group, 17 other student papers from the same class were submitted to Turnitin as well. For those papers, which ranged from 731 to 3,183 words, the AI-derived scores ranged from zero to 28 percent. Ten of the papers showed no AI content, and four showed single-digit percentages of AI-derived material. One paper showed 14 percent AI material and another 28 percent. The highest AI-derived score came from a student whose first language was not English. According to the Turnitin site, the tool currently detects AI generation only in English-language submissions. One of the papers was returned without a similarity score.

Quill also provides an AI writing check tool. One limitation of this tool is that it can check only 400 words or fewer. Quill itself claims an accuracy rate of 80 to 90 percent, and the tool is based on the AI detection algorithm written by OpenAI, ChatGPT’s developer. Tested on 10 selections of ChatGPT-derived text of between 200 and 399 words, the Quill tool appears to provide accurate predictions for those limited selections of prose: all were identified as AI-derived. However, when checking a 388-word document generated by Google’s Bard, it predicted that the text was written by a human.

Given the full 587-word version of the Bard-generated document, ZeroGPT, which claims a 98 percent accuracy rate, rated this fully AI-generated document as only 45.82 percent AI-generated. When given one of the fully AI-generated documents that Turnitin had scored as 0 percent AI-derived, ZeroGPT rated it 97.79 percent AI-generated. ZeroGPT also returned a summary of the document, highlighting the sections it judged to be AI-generated.

OpenAI’s AI Text Classifier classified the same document as possibly AI-generated. It classified the Bard-generated document as very likely AI-generated.
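For readers who want to check the arithmetic behind the Turnitin results, the reported counts can be tallied in a few lines. This is a minimal sketch, not part of the original test: the variable and function names are illustrative, and only the counts stated in the article are used (the article does not give an AI score for every control paper, so no overall false-positive rate is computed).

```python
# Counts as reported in the article; names are illustrative assumptions.
ai_papers_total = 28        # fully AI-generated submissions
ai_papers_scored_100 = 24   # scored by Turnitin as 100 percent AI-generated
control_total = 17          # student-written control papers
control_scored_zero = 10    # control papers showing no AI content


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of AI-generated papers that Turnitin scored as fully AI-generated.
fully_flagged_rate = share(ai_papers_scored_100, ai_papers_total)

# Share of AI-generated papers scored below 100 percent (zero to 65 percent).
partial_or_missed_rate = share(ai_papers_total - ai_papers_scored_100, ai_papers_total)

# Share of control papers that Turnitin scored as containing no AI content.
control_clean_rate = share(control_scored_zero, control_total)

print(f"AI papers scored 100 percent AI: {fully_flagged_rate}%")
print(f"AI papers scored below 100 percent: {partial_or_missed_rate}%")
print(f"Control papers scored zero percent AI: {control_clean_rate}%")
```

In other words, roughly 86 percent of the fully AI-generated papers were scored as entirely AI-written, while about 59 percent of the student-written papers came back with no AI content at all.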

These tests illustrate the difficulty of identifying text created by generative AI tools, along with the need for faculty to understand the detectors' limitations and communicate with students about appropriate use of AI in the writing process.

The difficulty of identifying generative AI writing extends to the phenomenon often called “AI hallucination,” in which facts and resources are invented to provide an answer to a prompt. Gary Lieberman, presenting in early June 2023 under the auspices of Grand Canyon’s Center for Innovation in Research on Teaching, reported that in his review of ChatGPT-generated references, 72 percent were either non-existent or incorrect. According to Lieberman, the remaining references pointed to government websites where the information could be found but was not attached to a correct URL citation.

University librarians have reported students searching for sources from AI-generated reference lists in which both the author and the title were real, but the author and title were not related to each other. Several other presenters also cautioned that AI-generated materials tend to present inaccurate information, sometimes called hallucinations. However, Charley Johnson, a program director at Data Society, finds hallucination a problematic term.

Online AI detection services acknowledge that there is no way to classify AI-generated text with 100 percent accuracy. Both Turnitin and Quill advise faculty to set clear expectations, to recognize that AI detection results could be incorrect or incomplete, and to talk with students when AI generation is suspected. As K-20 educators enter this new phase of the information age, additional research and reflection on how writing is created are necessary to uphold academic integrity while preparing students for new means of creation and communication.

Note: Drafts of this article were classified as very unlikely AI-generated by OpenAI's AI Text Classifier. ZeroGPT identified 0 percent of this article as AI-generated. Quill’s tool identified a 360-word section of the article as written by a human.


eSchool Media Contributors