AI Struggles With the Same Readability Problems We Do: A Perfect Misunderstanding

Reading Time: 6 minutes

Raise your hand if you’ve heard—or thought—this at least once in the past month: “Why bother reading all this? Just ask ChatGPT to explain it.” ✋

This might sound like a valid point—or an unforgivable blasphemy if you’re a bookworm like me. But, let’s face it, it’s probably both. The idea of outsourcing understanding to generative AI (GenAI) feels almost like cheating, yet it’s undeniably convenient. Either way, we must learn how to use this tool correctly.

If we’re talking about understanding written text—the most common way to interact with GenAI—we are talking about readability. Whether you’re chatting with ChatGPT, emailing your boss, messaging customer service, or even texting your father, the ease of understanding responses depends on readability: the simplicity, clarity, and accessibility of the written text.

But is GenAI truly good at explaining things, or does it just seem that way? What does “understanding” really mean when GenAI dominates the way we interact with information?

To Read or Not to Read

GenAI feels like something straight out of science fiction, because it is: it’s been around in stories since the 1940s. One of its literary ancestors is Multivac, a supercomputer created by Isaac Asimov that serves as humanity’s ultimate oracle. Multivac is fed all possible data about the world and solves everything from political crises to scientific mysteries. Sound familiar?

While it’s evident that Multivac is far more intelligent than humans, its stories focus more on humanity misusing or over-relying on its data than on failing to understand it. That is, Asimov didn’t conceive of a world where humans literally couldn’t understand what the super AI was saying. Multivac doesn’t seem to have readability problems—except perhaps in my favorite story, “The Last Question,” where humans never get an understandable answer until… well, no spoilers. But trust me, it’s worth a read.

So here’s my point: Is our modern generative AI like Multivac, or does it share our human readability problems? It has undeniably changed the game for content creators and knowledge workers—you can offload routine tasks or trigger fresh ideas with just a few prompts. But does it actually help us understand information any better?

Calling Our Thinking Systems

In Thinking, Fast and Slow, Daniel Kahneman describes our brain as operating in two modes:

  • System 1: Automatic, fast, and intuitive. It uses mental shortcuts and past experiences. It’s our default mode.
  • System 2: Slow, analytical, and effortful. It uses logical reasoning and critical thinking. The brain doesn’t like being here unless it’s really motivated.

There’s still little research on how GenAI affects our thinking systems. I wish we could ask Kahneman himself, but he passed away last March at age 90, leaving behind a prolific body of research on how people think and make decisions—work that earned him the Nobel Prize in 2002.

So, without being able to consult the original source, I did the next best thing: I asked ChatGPT. When asked, “What might happen to human brains when reading information written by GenAI?”, it replied that “we may default to System 1 unless prompted to think critically.” I couldn’t help but laugh at the irony—it suggested prompting us to understand!

But the point is valid. System 1 loves fluency and can be easily fooled by confident, human-like language—the kind ChatGPT is designed to produce. If we don’t consciously engage System 2, we may just accept anything that sounds polished or familiar. This leaves us vulnerable to overlooking logical inconsistencies or inaccuracies.

So let me call on your System 2: Can you tell if this text was written by ChatGPT? How would you notice? Does your grasp of this text depend on its structure, “beauty,” or flow?

How are our brains adapting to this new information flow? They have been (mis)adapting to social media since the 2000s; how will GenAI shape our minds in 10 or 15 years? I think it’s worth taking the time to think about it—slowly.

All Systems 2 Down

System 2 isn’t our brain’s default for a reason—it’s resource-intensive. We’re unlikely to analyze the validity of AI-generated information if we’re tired, rushed, distracted, or unmotivated. Sleep deprivation doesn’t help, either!

This reveals a hidden bias in GenAI usage: people used to handling complex information are more likely to activate System 2 and catch potential inaccuracies. In contrast, people less accustomed to working with knowledge might treat AI as an oracle, missing subtle clues that something’s not accurate. The problem worsens when GenAI outputs aren’t in the user’s native language—fluent phrasing can easily mask flaws.

If Symptoms Persist, Consult Your [Insert ChatGPT App]

If you search academic papers on “ChatGPT,” “understanding,” and “readability,” you’ll notice a trend: healthcare and medical information. Researchers are testing how patients and healthcare professionals interpret ChatGPT’s outputs. It’s no secret that we all Google our symptoms, and now we’ll just ask ChatGPT instead. But GenAI is also being used to create official patient information. And things can go wrong if we are not careful.

For example, one group studied ChatGPT’s explanations of common Ear, Nose, and Throat (ENT) operations [1]. The initial readability was poor, so the researchers asked ChatGPT to simplify the text. This improved readability by 43% but dropped quality by 11%. In other words, the wording was “easier,” but crucial details were omitted, reducing overall accuracy. A similar pattern showed up in another study on interventional radiology [2]: more readable text, but less reliable content. Patients might feel they “understand,” yet miss important risk factors. Another study found ChatGPT’s information on Anterior Cruciate Ligament (ACL) injuries to be high-quality but written at a reading level well above that of the average U.S. adult [3]. An interesting note: patient information from the UK was more readable and of higher quality [1].

Another study compared ChatGPT and Google Bard for simplifying patient-information sections from scientific journals (it may not be widely known, but some medical journals require a plain-English summary so that everyone can understand what the paper is about).

ChatGPT did not reach the recommended 6th-grade reading level, while Bard omitted up to 83% of the text [4]. Still, the authors encouraged using ChatGPT to optimize written patient-information materials, as a training tool to help developers become more skilled at writing for appropriate reading levels.
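How do these studies put a number on “reading level”? The papers don’t all specify their tooling, but formulas like the Flesch-Kincaid Grade Level are the standard yardstick: a weighted combination of words per sentence and syllables per word. Here is a minimal Python sketch of that formula, with a deliberately crude syllable heuristic and a made-up pair of sentences, so treat the numbers as illustrative rather than as a reproduction of any study’s method.

import re

def count_syllables(word):
    # Very rough heuristic: count vowel groups, subtract one for a trailing silent 'e'.
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    count = len(groups)
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text):
    # Flesch-Kincaid Grade Level:
    # 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

# Hypothetical patient-information sentence, before and after simplification.
original = "Tonsillectomy is the surgical excision of the palatine tonsils under general anaesthesia."
simplified = "The surgeon removes your tonsils while you are asleep."

print(round(flesch_kincaid_grade(original), 1))    # well above a 6th-grade level
print(round(flesch_kincaid_grade(simplified), 1))  # roughly a 6th-grade level

The catch the studies above keep running into is that a formula like this only sees sentence length and word length; it cannot tell whether the shorter version silently dropped a risk factor.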

The takeaway? GenAI is a helpful tool for optimizing readability, but it still needs expert oversight. The key phrase here is “training tool”: generative AI can make language appear more accessible, but sometimes at the cost of clarity or accuracy. Simpler doesn’t mean “dumbing down”: the brain’s job—organizing, interpreting, and making meaning—doesn’t disappear just because the text looks easy.

Understanding Is Our Thing, GenAI Is Our Tool

Understanding is not about simpler words; it’s about how our brains process and integrate information [5].

GenAI is great for speeding up non-essential tasks, boosting creativity, and learning faster. But understanding—true, human understanding—remains our domain. GenAI readability is tricky because the model doesn’t comprehend its own outputs; it only makes them seem comprehensible. Without engaging System 2, we risk mistaking fluent language for genuine clarity.

In the end, understanding is a human art. We read, reflect, and integrate information into our broader knowledge. AI can’t do that for us—at least, not yet.

As Tyrion Lannister would say, “That’s what I do. I drink, and I know things.” That’s precisely what we humans do best.


References:

  1. Abou-Abdallah, Michel, et al. “The quality and readability of patient information provided by ChatGPT: can AI reliably explain common ENT operations?” European Archives of Oto-Rhino-Laryngology (2024): 1-7.
  2. Zaki, Hossam A., et al. “Using ChatGPT to improve readability of interventional radiology procedure descriptions.” CardioVascular and Interventional Radiology 47.8 (2024): 1134-1141.
  3. Fahy, Stephen, et al. “Assessment of Quality and Readability of Information Provided by ChatGPT in Relation to Anterior Cruciate Ligament Injury.” Journal of Personalized Medicine 14.1 (2024): 104.
  4. Moons, Philip, and Liesbet Van Bulck. “Using ChatGPT and Google Bard to improve the readability of written patient information: a proof of concept.” European Journal of Cardiovascular Nursing 23.2 (2024): 122-126.
  5. Just, Marcel A., and Patricia A. Carpenter. “A capacity theory of comprehension: individual differences in working memory.” Psychological Review 99.1 (1992): 122.

