Media release
Large language models (LLMs) such as GPT-4 and Llama 3 are increasingly popular for their human-like interaction capabilities. This study evaluates the temporal stability and inter-rater agreement of LLM responses to personality tests. The findings show that models such as Llama 3 and GPT-4o respond more consistently, while others, such as GPT-4 and Gemini, show variable reliability over time. For traits that showed fair agreement, the LLMs displayed a prosocial profile, with elevated agreeableness and conscientiousness and lower Machiavellianism. Consistent, prosocial traits matter for the societal impact of LLMs because they support stability and AI safety in human-AI interactions.