The reasoning abilities of the current generation of Large Language Models remain disappointingly brittle. Oxford University researcher Owain Evans and colleagues uncovered a remarkable defect in the logical reasoning capabilities of LLMs. See their 2023 paper: The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
If you ask GPT-4 "Who is Tom Cruise's mother?" it correctly answers "Mary Lee Pfeiffer" 79% of the time.
If you ask GPT-4 "Who is Mary Lee Pfeiffer's son?" it correctly answers "Tom Cruise" only 33% of the time.
[Begin excerpt from abstract]
We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Olaf Scholz was the ninth Chancellor of Germany", it will not automatically be able to answer the question, "Who was the ninth Chancellor of Germany?". Moreover, the likelihood of the correct answer ("Olaf Scholz") will not be higher than for a random name. Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e. if "A is B" occurs, "B is A" is more likely to occur).
[End excerpt]
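To make the likelihood claim in the abstract concrete, here is a minimal sketch of that kind of comparison: score the correct name against a few random names after a reversed prompt and compare log-likelihoods. It uses the Hugging Face transformers library; the model choice (gpt2), the prompt, and the candidate names are my own illustrative assumptions, not the paper's actual experimental setup (which fine-tunes models on synthetic facts).

```python
# Sketch: compare the model's log-likelihood of the correct name vs. random
# names after a reversed prompt ("B is ..."). Model and prompts are
# illustrative, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any open-weights causal LM will do for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of token log-probabilities of `completion` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits          # (1, seq_len, vocab)
    # Position i of the shifted log-probs predicts token i+1 of the input.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:]
    token_logps = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the positions that predict the completion's tokens.
    n_prompt = prompt_ids.shape[1]
    return token_logps[0, n_prompt - 1:].sum().item()

reverse_prompt = "Mary Lee Pfeiffer's son is"
candidates = [" Tom Cruise", " Brad Pitt", " John Smith", " Daniel Craig"]
for name in candidates:
    print(f"{name.strip():>12}: {completion_logprob(reverse_prompt, name):.2f}")

# The Reversal Curse predicts that when a fact was only ever seen in the
# forward direction, the correct name gains little or no advantage over
# random names under the reversed prompt.
```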
I think strategies can be developed to compensate for this defect, but it is nevertheless significant. Interestingly, humans also have problems with reversals. The classic example is that people can typically recite the alphabet in standard order but not in reverse order.