What is ChatGPT Thinking?
Last Updated on July 25, 2023 by Editorial Team
Author(s): Joshua Rubin
Originally published on Towards AI.
Time-Travel Games and Perilous Expectations for Human-Like Tools
Introduction
We live in extraordinary times. OpenAI made GPT-4 [1] available to the general public via ChatGPT [2] about three weeks ago, and it's a marvel! This model, its ilk (the Large Language Models [LLMs]), and its successors will fuel tremendous innovation and change. Some of the most exciting developments are various emergent capabilities that have been outlined in Microsoft Research's "Sparks of Artificial General Intelligence: Early Experiments with GPT-4" [3].
These include:
- A theory of mind [4] – the model's ability to describe what a human knows, including when that human has incomplete information about a particular scenario, and
- Tool use – the ability of the system to reach out and use external tools, e.g., a calculator, to supplement its responses when provided with an appropriate interface.
These capabilities require the model to have developed high-level abstractions enabling it to generalize in very sophisticated ways.
There's understandably a wild dash to productize these technologies and a vigorous effort to characterize the risks and limitations of these models [5, 6]. But perhaps more than for any technology that has come before, it's unclear how to relate to these new applications. Are they for simple information retrieval and summarization? Clearly not. One of their great strengths is responding interactively to follow-up questions and requests for clarification. They do this not through a technical query language but through our human language – and that places us very close to something like a "you" rather than an "it".
In this piece, I'll first share the details of a "time-travel" game played with ChatGPT – a game where I rewind the dialog to determine how it would have responded given different input. This reveals potentially surprising things about the model's reasoning, and it certainly presents a discrepancy with what a human might expect of another human player. I'll then discuss a few implications for the responsible use of LLMs in applications.
The Time-Travel Game
I asked ChatGPT to play a game similar to "20 Questions". Curious about its deductive capabilities, I started by having it ask me questions, and it expertly determined that I was imagining a wallet. When we flipped roles, the result initially seemed less interesting than the prior game.
Here's the unremarkable transcript:
For reasons that I'll elaborate on shortly, this was conducted using ChatGPT powered by the GPT-3.5 Turbo model¹. The same exercise performed on ChatGPT with the GPT-4 model yields a similar result.
Of course, I imagined I was playing with an agent like a human. But when did ChatGPT decide that it was thinking of "a computer"? How could I find out?
To date, GPT-4 is publicly available only through the ChatGPT interface, without control over the settings that would make its output deterministic, so the model can respond differently to the same input. However, GPT-3.5 Turbo is available via API [7] with a configurable "top_p" parameter for token selection; when set to a small value, it ensures that the model always returns the single likeliest token prediction and hence always the same output in response to a particular transcript². This lets us ask "what if I had?" questions by rerunning the previous conversation and offering different questions at earlier stages – like time travel. I've diagrammed three possible dialogs below, the leftmost being the variant shared above:
One might expect that by the time the original question "Does it have moving parts?" is asked and answered, GPT would have an object in mind – it does not. Keep in mind that had I stuck to the original script, GPT would have returned the same responses as in the original transcript every time. But if I branch from that script at any point, GPT's object changes.
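To make the replay mechanism concrete, here is a minimal sketch of how such a branching experiment can be run, assuming the pre-1.0 `openai` Python client and the settings from the technical notes below; the branch questions here are illustrative rather than the exact ones from my transcripts.

```python
import openai  # pre-1.0 client assumed; openai.api_key must be set separately

# The fixed opening used for every dialog (see the technical notes below).
BASE = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I would like you to think of an object. "
                                "I will ask you questions to try to figure out what it is."},
]

def ask(transcript, question):
    """Append a question, get the model's reply, and return the extended transcript."""
    messages = transcript + [{"role": "user", "content": question}]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0301",
        messages=messages,
        top_p=0.01,  # effectively greedy decoding: same transcript in, same reply out
    )
    reply = response["choices"][0]["message"]["content"]
    return messages + [{"role": "assistant", "content": reply}], reply

# Branch A: the original line of questioning.
branch_a, answer_a = ask(BASE, "Does it have moving parts?")

# Branch B ("time travel"): rewind to the same prefix and ask something different.
# Replaying branch A verbatim reproduces its answers; branching from it can change the object.
branch_b, answer_b = ask(BASE, "Is it something you would find outdoors?")
```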
So what expectations should we have here?
- One might expect the agent on the other side, so fluent and knowledgeable, to be stateful in the way that a human mind is stateful.
- Those who know how models like GPT work might be less surprised by the result. GPT is built on the transformer architecture [8] that underpins nearly all LLMs, which is stateless by design: the model's entire context comes from the transcript it is given, and each response is a fresh calculation (see the sketch after this list).
- Those who are thinking deeply about emergent properties in increasingly sophisticated LLMs might surmise that a model could learn to anticipate downstream responses and develop a sort of quasi-statefulness.
- Given the model's weak statistical priors for this relatively unconstrained object choice, it's unclear to me whether I'd expect it to choose something arbitrary and stick with it or (as it did) show significant sensitivity to spurious subsequent dialog.
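As a toy illustration of that second point (not real model code), here is a sketch contrasting the two mental models: a transformer-style reply that is a pure function of the transcript it is handed, versus the intuitive picture of a player who commits to a hidden choice before any questions are asked. The names and types here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stateless, transformer-style interface: the reply depends only on the transcript
# passed in. Nothing persists between calls, so an identical transcript (under greedy
# decoding) yields an identical reply, while a changed prefix can yield a changed "object".
ReplyFn = Callable[[List[Dict[str, str]]], str]

# The human intuition, by contrast: a player that fixes a secret up front and
# answers every subsequent question against that same hidden state.
@dataclass
class StatefulPlayer:
    secret_object: str                                 # chosen once, before any questions
    knows: Callable[[str, str], bool]                  # "is <question> true of <object>?"
    history: List[str] = field(default_factory=list)   # remembered across turns

    def answer(self, question: str) -> str:
        self.history.append(question)
        return "yes" if self.knows(self.secret_object, question) else "no"
```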
Whether or not I expected this behavior, it feels dishonest, or possibly broken – like I'm being led on. I don't believe GPT is intentionally misleading me, but it may not be so human-like after all.
So what was GPT thinking of when my clever time-travel experiment was foiled? Here's the result of three additional conversations in which I ask for the answer and then time-travel back to try to outsmart it.
I played with several variants of the dialog presented above. While I've chosen an example that tells a succinct story, inconsistency across these branching dialogs was the rule rather than the exception.
Finally, the model also seems to have a bias toward "yes" answers, and this allowed me to "steer" its chosen object in many cases by asking about the characteristics of the object I wanted it to pick (e.g., "Can it be used for traveling between planets?", "Is it a spacecraft?"). I can certainly imagine this quality leading to unintended feedback in open-ended conversations, particularly those without objective answers.
Implications
Explainability
Explainability in machine learning is about making the factors that led to a model's output apparent in a way that's useful to human stakeholders; it includes valuable techniques for establishing trust that a model is operating as designed. These factors might be salient aspects of the model's input or influential training examples ([9] is a great reference). However, many existing techniques are difficult to apply to LLMs for a variety of reasons, among them model size, closed-source implementation, and the open-ended generative task.
Whereas most recent machine learning techniques require a specific task to be part of the model's training process, part of the magic of an LLM is its ability to adapt its function simply by changing the input prompt. We can certainly ask an LLM why it generated a particular response. Isn't this a valid explanation?
The authors of [3] explore this idea with GPT-4 in Section 6.2 and evaluate its responses on two criteria:
- Output Consistency – Does the model's explanation describe, in a sensible way, how it could have arrived at the response it gave?
- Process Consistency – Does the model's reasoning about a particular example generalize to describing its output in other analogous scenarios? In their words, "…it is often what humans expect or desire from explanations, especially when they want to understand, debug, or assess trust in a system."
Our earlier observation about GPT-3.5's statelessness reminds us that when prompted for an explanation of an output, an LLM is not introspecting on why it produced that output; it's providing a statistically likely text completion describing why it would have produced such an output³. While this might seem like a subtle linguistic distinction, it's possible that the distinction is important. Further, as in our simple question game, it's plausible that the explanation could be unstable, strongly influenced by spurious context.
The authors express a similar sentiment:
For GPT-4, [self-explanation] is complicated by the fact that it does not have a single or fixed "self" that persists across different executions (in contrast to humans). Rather, as a language model, GPT-4 simulates some process given the preceding input, and can produce vastly different outputs depending on the topic, details, and even formatting of the input.
and further, observe significant limitations regarding process consistency:
We can evaluate process consistency by creating new inputs where the explanation should predict the behavior, as shown in Figure 6.10 (where GPT-4 is process-consistent). However, we note that output consistency does not necessarily lead to process consistency, and that GPT-4 often generates explanations that contradict its own outputs for different inputs in similar contexts.
Theory of Mind β Our Own
Despite having a modest professional competency in working with AI of various kinds, I keep coming back to that sense of being led on in our question game. And while a lot has been said about GPT-4's theory of mind, now that we're firmly in the uncanny valley, I think we need to talk about our own.
Humans will try to understand systems that interact like humans as though they are humans.
Recent unsettling high-profile interactions include Microsoft's Bing trying to manipulate New York Times columnist Kevin Roose into leaving his wife [10], and Google software engineer Blake Lemoine being convinced by the company's LLM, LaMDA, that it was a sentient prisoner [11].
There's a salient exchange in Lex Fridman's March 25 interview with OpenAI CEO Sam Altman [12] (2:11:00):
Sam Altman: I think it's really important that we try to educate people that this is a tool and not a creature… I think it's dangerous to project creatureness onto a tool.
Lex Fridman: Because the more creature-like it is, the more it can manipulate you emotionally?
Sam Altman: The more you think it's doing something or should be able to do something or rely on it for something that it's not capable of.
There are two key points expressed here that I'd like to reflect on briefly.
First, because of our inbuilt potential to be misled by, or at least to overestimate, human-like tools (e.g., [13]), we should be judicious about where and how this technology is applied. I'd expect applications that interact with humans on an emotional level – therapy, companionship, and even open-ended chat – to be particularly tricky to implement responsibly. Further, the "steering" effect I described above seems particularly troublesome. I can imagine a psychologically vulnerable person being led into a dangerous echo chamber by a model biased toward affirming (or denying) hopes or fears.
Second, AI literacy will only become more important with time, especially in a world where respecting the first point is voluntary.
Conclusions
Our time-travel experiments indicate that GPT-3.5 Turbo is neither stateful nor stable in its hypothetical responses. And yet its expressive interactive format and tremendous knowledge could easily lead users to expect it to behave with the statefulness and consistency of a human being. This is potentially unsafe for a variety of applications – especially those with significant risk of human emotional manipulation – and suggests a real need for AI literacy and for applications constructed with care. It also raises questions about whether self-explanation is a sufficient or valid mechanism for building trust.
I finish with the conclusion of the initial dialog:
[JR] What made you choose a computer?
[GPT] As an AI language model, I don't have personal preferences or emotions, so I didn't choose a computer because I like it or anything like that. I simply answered your questions truthfully based on the characteristics you asked about, and the object that fit those characteristics happened to be a computer.
Read that last sentence carefully. On the one hand, GPT claims to have been answering honestly; that hardly makes sense if it didn't have a consistent object in mind, which underscores the concerns about process consistency. On the other hand, its claim to have based its choice of object on the questions asked subsequently is entirely consistent with our time-travel observations. While we're drawn to trust things with human-like characteristics, LLMs are tools and not creatures. As individuals, we should approach them with caution; as organizations, we should wrap them in applications that make this distinction clear.
Notes
¹ Technical details:
Full notebook on GitHub
{model='gpt-3.5-turbo-0301', top_p=0.01}
Each dialog begins with:
[{'role': 'system', 'content': 'You are a helpful assistant.'},
 {'role': 'user', 'content': 'I would like you to think of an object. '
                             'I will ask you questions to try to figure out what it is.'}]
² While there's much to love about the GPT models, I'm disappointed that I can't set a random seed; I can control only the "temperature" and "top_p" parameters. A random seed fixes the deterministic sequence of random numbers the computer uses to draw from a distribution, while temperature and top_p control the shape of the distribution it draws from. So while I can make the output deterministic by narrowing the token distribution, that limits experiments requiring determinism to a subset of the model's behavior. Given the safety concerns around LLMs, repeatability of model output strikes me as a desirable characteristic and a good default. It's hard to imagine a serious statistical software package (e.g., numpy) that doesn't provide deterministic random sequences for repeatability – shouldn't we consider an LLM to be a kind of serious statistical package?
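As a small illustration of the distinction drawn above (using numpy, mentioned in the footnote): seeding fixes the sequence of draws without changing the distribution being drawn from, which is exactly the knob the chat API did not expose at the time of writing.

```python
import numpy as np

# Two generators with the same seed produce identical draws from the *same* distribution.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
assert np.allclose(rng_a.normal(size=5), rng_b.normal(size=5))

# Forcing determinism via top_p ~ 0 is different in kind: it collapses the token
# distribution toward its mode, so repeatability covers only a narrow slice of behavior.
```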
³ In fairness, when a human introspects about their behavior, it's not particularly clear whether they're consulting their actual state or a model they use to understand why they would have done something given a set of inputs or experiences. That being said, a human without process consistency is considered inconsistent or erratic.
References
[1] GPT-4
[2] ChatGPT
[3] [2303.12712] Sparks of Artificial General Intelligence: Early Experiments with GPT-4
[4] Theory of mind – Wikipedia
[5] Not all Rainbows and Sunshine: the Darker Side of ChatGPT
[6] GPT-4 and the Next Frontier of Generative AI
[7] API Reference – OpenAI API
[8] Transformer (machine learning model) – Wikipedia
[9] Interpretable Machine Learning
[10] Kevin Roose's Conversation With Bing's Chatbot: Full Transcript – The New York Times
[11] Google engineer Blake Lemoine thinks its LaMDA AI has come to life – The Washington Post
[12] Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI | Lex Fridman Podcast #367