Image by Mohamed Hassan from Pixabay

GPT-3 able to perform reasoning and problem-solving tasks at a level similar to humans

Peer-reviewed: This work was reviewed and scrutinised by relevant independent experts.

The large AI language model GPT-3 (of ChatGPT fame) can complete complex reasoning tasks and identify reasonable solutions to problems without direct training, at a level that matches or surpasses human participants, according to international researchers. The team tested GPT-3's performance on new problems it had not encountered before and compared the results with human performance. This kind of problem-solving relies on a key mental tool known as 'analogical reasoning': the ability to see the similarities between an unfamiliar problem and a previously encountered one in order to identify a reasonable solution. The authors found that GPT-3 displayed a strong capacity for abstract pattern induction, matching or surpassing the human participants in most tests.

Journal/conference: Nature Human Behaviour

Research: Paper

Organisation/s: University of California, USA

Funder: Preparation of this paper was supported by NSF grant IIS-1956441 and AFOSR MURI grant FA9550-22-1-0380 to H.L. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Media release

From: Springer Nature

Computer science: GPT-3 able to reason by analogy

The large language model GPT-3 is able to complete complex reasoning tasks and identify reasonable solutions to new problems without direct training, at a level that matches or surpasses human participants, a paper published in Nature Human Behaviour suggests.

One of the hallmarks of human intelligence is the ability to solve new problems that have not been encountered before. Cognitive scientists believe that this problem-solving relies on a key mental tool known as ‘analogical reasoning’ — the ability to see the similarities between an unfamiliar problem and a previously encountered one to identify a reasonable solution.

Taylor Webb and colleagues evaluated the performance of the large language model GPT-3 on analogy tasks and compared the results with human performance. The tasks were text-based matrix reasoning problems, letter-string analogies, verbal analogies and story analogies, all of which involve a pattern that must be identified and then applied to a new situation. For example, one problem asked the model to complete the pattern "love : hate :: rich : ?", with the correct answer being "poor" (because just as love is the opposite of hate, rich is the opposite of poor). To ensure that GPT-3 could not simply repeat answers it had been exposed to during training, the authors designed many of the tasks themselves. They found that GPT-3 displayed a strong capacity for abstract pattern induction, matching or surpassing the human participants in most tests.
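To make the task format concrete, the sketch below shows how a verbal analogy of the kind described above could be posed to a GPT-3-style model in a zero-shot setting, using the OpenAI Python SDK. This is an illustration only: the model name, prompt wording and answer handling are assumptions for the sketch, not the authors' exact evaluation protocol (the study itself used GPT-3).

```python
# Illustrative sketch: posing a zero-shot verbal analogy ("a : b :: c : ?")
# to a GPT-3-style model via the OpenAI Python SDK (v1+). The model name,
# prompt wording and decoding settings below are assumptions for this
# example, not the paper's exact protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def solve_analogy(a: str, b: str, c: str) -> str:
    """Ask the model to complete the analogy 'a : b :: c : ?' with one word."""
    prompt = (
        "Complete the analogy with a single word.\n"
        f"{a} : {b} :: {c} :"
    )
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # assumed stand-in for GPT-3
        prompt=prompt,
        max_tokens=5,
        temperature=0,  # deterministic decoding so the answer is reproducible
    )
    return response.choices[0].text.strip()


if __name__ == "__main__":
    # The example from the release: love : hate :: rich : ?  (expected: poor)
    print(solve_analogy("love", "hate", "rich"))
```

Because the model receives no worked examples in the prompt, any correct completion reflects the kind of zero-shot analogical mapping the study set out to measure.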

The authors note that GPT-3's reasoning has limitations: it has no long-term memory and could only complete reasoning tasks when provided with all of the relevant material. It is also unclear whether the model solves these problems in the same way that humans do, they conclude.

News for:

International
