Studying ChatGPT Like a Psychologist

Cognitive science helps penetrate the AI “black box”


Ask GPT-4, the most advanced model behind ChatGPT, to decode a string of text written in ROT13—a cipher that shifts each letter 13 places forward in the alphabet—and it will complete the task successfully. But ask it to decode a string written in ROT12, a shift of 12 places, and it fails.
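To make the task concrete: a shift cipher replaces each letter with the one n places later in the alphabet, wrapping around at “z,” and decoding ROT-n is just shifting forward by 26 − n. A minimal Python sketch (the function name rot_n is illustrative, not drawn from any paper) might look like this:

```python
def rot_n(text: str, n: int) -> str:
    """Shift each letter n places forward in the alphabet, wrapping at z."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + n) % 26 + base))
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)

print(rot_n("hello", 13))       # encode with ROT13 -> "uryyb"
print(rot_n("uryyb", 13))       # ROT13 is its own inverse -> "hello"
print(rot_n("tqxxa", 26 - 12))  # decode ROT12 text -> "hello"
```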

Why does the model’s ability to complete the task depend on the number of places a letter is shifted? A first attempt to answer this question might involve looking at the internal mechanisms that drive the model’s behavior. But with proprietary systems like ChatGPT, that information is not publicly available—and even if it were, it’s difficult, if not impossible, to fully understand.

But there’s another way to approach the question, says Tom Griffiths, a professor of psychology and computer science at Princeton University. Researchers have already figured out how to understand the behavior of a different system with opaque training data and complex internal mechanisms that are difficult to access: the human brain. By adopting various approaches from cognitive science, researchers can also analyze the behavior of large language models like ChatGPT, Griffiths argued at a March 15 seminar at the Kempner Institute for the Study of Natural and Artificial Intelligence.

To explore what is happening with the shift ciphers, researchers can use rational analysis, a cognitive science technique that analyzes the behavior of an intelligent system—whether the human mind or ChatGPT—by comparing its solution to the problem’s ideal solution. To determine the optimal solution, cognitive scientists use Bayesian statistics, which calculates the probability of an event based on prior knowledge of factors related to that event: an individual’s estimate of another person’s lifespan, for example, will be shaped by what they know about human life expectancies. By building Bayesian models of cognition, researchers can make inferences about the information people use to form judgments.
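The lifespan example can be sketched in a few lines, in the spirit of Bayesian models of cognition. The prior below is a made-up bell curve, not real demographic data: given that someone is alive at age t, a rational predictor combines that prior with the observation and reports, say, the posterior median.

```python
import numpy as np

# Hypothetical prior over total lifespans (a rough bell curve around 78,
# invented for illustration).
ages = np.arange(1, 121)
prior = np.exp(-0.5 * ((ages - 78) / 12) ** 2)
prior /= prior.sum()

def predict_lifespan(current_age: int) -> int:
    """Posterior median of total lifespan, given the person is alive now.
    A lifespan T is consistent with the observation only if T >= current_age;
    the 1/T term reflects that any single age within a T-year life is
    equally likely to be the one we happen to observe."""
    likelihood = np.where(ages >= current_age, 1.0 / ages, 0.0)
    posterior = prior * likelihood
    posterior /= posterior.sum()
    return int(ages[np.searchsorted(np.cumsum(posterior), 0.5)])

print(predict_lifespan(20))  # the prior dominates: a typical lifespan
print(predict_lifespan(90))  # the evidence pushes the estimate past 90
```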

It’s possible to perform the same kind of analysis on AI models. In a 2023 paper, Griffiths and colleagues examined GPT-3.5 and GPT-4’s answers to “deterministic problems,” or problems where there is only one right answer—including the decoding of shift ciphers. For deterministic problems, a perfectly rational agent will not allow prior distributions to shape its output: there is only one way to shift a letter 12 places forward in the alphabet. But ChatGPT does not solve the problem in this way, the paper showed; instead, it allows prior distributions to influence its answers. “And the prior distribution for this model,” Griffiths says, “is the distribution of Internet text.”
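A toy model (my own construction, not the paper’s) shows how such a prior can leak into a task with one right answer: score candidate decodings by how well they match the ciphertext, then add a bonus for how common each candidate is as English text. With the bonus turned off, the decoder is rational; turned up, it outputs the more frequent word instead of the correct one.

```python
WORD_PRIOR = {"slay": 0.001, "stay": 0.5}  # hypothetical word frequencies

def shift(word: str, n: int) -> str:
    return "".join(chr((ord(c) - ord("a") + n) % 26 + ord("a")) for c in word)

ciphertext = shift("slay", 12)  # the one correct ROT12 decoding is "slay"

def decode(ciphertext: str, n: int, prior_weight: float) -> str:
    def score(cand: str) -> float:
        # count letters of the candidate that re-encode to the ciphertext
        accuracy = sum(a == b for a, b in zip(shift(cand, n), ciphertext))
        return accuracy + prior_weight * WORD_PRIOR[cand]
    return max(WORD_PRIOR, key=score)

print(decode(ciphertext, 12, prior_weight=0.0))  # rational agent: "slay"
print(decode(ciphertext, 12, prior_weight=5.0))  # the prior wins: "stay"
```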

That distribution influenced how ChatGPT approached shift ciphers. ROT13 is commonly used on the Internet to conceal puzzle solutions and spoilers; ROT12, on the other hand, appears much less frequently. So ChatGPT isn’t going through the process of actually shifting the letters, thereby displaying “some kind of general intelligence,” Griffiths says. “What it’s illustrating is a very specific intelligence, [catered] to the kinds of problems it’s encountered before.”

Another cognitive science technique, examining “axiom violations,” can also help researchers understand AI—and shows how AI can in turn provide insights into human psychology. This method aims to identify the mechanisms that drive human behavior by analyzing situations where people behave in a manner that contradicts rational decision-making principles.

In one kind of axiom violation experiment, researchers present pairs of events and ask participants to rate the likelihood that different combinations of those events will occur: A and B, A or B, and so on. Researchers then check if these ratings align with coherent probability distributions. In a 2020 paper, Jian-Qiao Zhu and colleagues found that human estimates systematically deviated from the correct answer.
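What “coherent” means here can be checked mechanically. A sketch of such a check on invented ratings (rescaled to probabilities) applies two axioms: a conjunction can never be more probable than either event alone, and P(A or B) must equal P(A) + P(B) - P(A and B).

```python
# Hypothetical ratings, rescaled from a 0-100 scale to probabilities.
ratings = {"A": 0.70, "B": 0.40, "A and B": 0.45, "A or B": 0.80}

def axiom_violations(r: dict) -> list:
    found = []
    # Conjunction rule: P(A and B) <= min(P(A), P(B)).
    if r["A and B"] > min(r["A"], r["B"]):
        found.append("conjunction: P(A and B) exceeds min(P(A), P(B))")
    # Additivity: P(A or B) = P(A) + P(B) - P(A and B).
    if abs(r["A or B"] - (r["A"] + r["B"] - r["A and B"])) > 1e-9:
        found.append("additivity: P(A or B) != P(A) + P(B) - P(A and B)")
    return found

print(axiom_violations(ratings))  # both axioms are violated here
```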

To explain this phenomenon, psychologists had previously speculated that individuals were unknowingly constructing Bayesian estimates based on their prior experiences. But analyzing AI has complicated this interpretation: the researchers observed the exact same systematic deviation in large language models, which lack the prior knowledge, Griffiths says, that would lead humans to make Bayesian estimates.

The similarity in the data “should make us think that there’s some kind of universality across intelligent systems that’s being reflected in this behavior,” Griffiths says, “that might make us go back and try to come up with a better explanation for what’s going on in the human data.”

Analyzing “similarity judgments,” another cognitive science approach, can also help researchers understand AI models—and ensure that their values align with those of humans. To understand how individuals perceive and categorize the world, cognitive scientists can ask them to rate the degree of similarity between objects, concepts, or situations. By quantifying and mapping these similarity ratings, researchers can analyze human representations of sensory perception, emotion, social groupings, and so on.

In a 2023 paper, Griffiths and colleagues asked GPT-4 and other models to rate the degree of similarity between sensory experiences—and then compared those ratings to the human data. Since the models make these judgments based on the text they were trained on, this information sheds light on what kind of sensory experience is best captured by language: for example, GPT-4’s judgments for color are more similar to human data than its judgments for taste. “So, it might be okay to ask your large language model for advice on your outfit,” Griffiths says, “but you might not want to ask it for advice on what ingredients to combine together in a recipe.”
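The comparison itself is straightforward to sketch: collect each system’s pairwise similarity ratings into a matrix and correlate the two. The numbers below are invented; only the method is the point.

```python
import numpy as np

# Invented similarity matrices over four stimuli (rows and columns in the
# same order), one from human raters and one from a language model.
human = np.array([[1.0, 0.8, 0.2, 0.1],
                  [0.8, 1.0, 0.3, 0.2],
                  [0.2, 0.3, 1.0, 0.7],
                  [0.1, 0.2, 0.7, 1.0]])
model = np.array([[1.0, 0.7, 0.3, 0.2],
                  [0.7, 1.0, 0.2, 0.3],
                  [0.3, 0.2, 1.0, 0.6],
                  [0.2, 0.3, 0.6, 1.0]])

# Correlate only the upper triangles: the matrices are symmetric and the
# diagonal is trivially 1.
iu = np.triu_indices(4, k=1)
r = np.corrcoef(human[iu], model[iu])[0, 1]
print(f"human-model agreement (Pearson r): {r:.2f}")
```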

Similarity judgments can also be used in higher-stakes situations. In a 2023 paper, Griffiths and colleagues asked participants to rate 50 behaviors on a scale of 0 to 100, according to how ethical they were. They then used these data to train different machine learning models and tested the models on their ability to make ethical decisions. The more closely a model’s ethical representations aligned with the human data, the researchers found, the better it was at making ethical decisions. Such similarity ratings, then, could be helpful in building models designed to perform tasks that require ethical reasoning.

Though researchers can’t look inside proprietary machine learning models to analyze their internal structure, Griffiths argues, they have other options: methods developed during the study of human minds offer a new way to make sense of intelligent machines. “Cognitive scientists have… a rich toolbox of methods for understanding intelligent systems,” Griffiths says. “And those tools are really useful when we’re reduced to analyzing the behavior of those systems.”
