🔮 Why AI can’t tell how many sisters you have
The Problem
Why do some of the smartest language models fail at answering simple questions a child could solve?
The paper digs into this by presenting a straightforward problem often given to kids: "Alice has N brothers and M sisters. How many sisters does Alice's brother have?"
The right answer, by basic reasoning, is M+1: Alice's M sisters plus Alice herself, from a brother's point of view. Surprisingly, many advanced language models can't solve this consistently, which suggests their basic reasoning is weaker than their benchmark scores imply.

The Solution
The researchers didn’t just ask the question once and call it a day. They took a systematic approach to test the language models' reasoning abilities by creating variations of the problem:
Basic Setup: They tested with different numbers of brothers (N) and sisters (M) to see how models handle these variations.
Four Variations: They crafted four specific variations:
Variation 1: N = 3, M = 6 (Correct Answer: 7)
Variation 2: N = 4, M = 2 (Correct Answer: 3)
Variation 3: N = 1, M = 4 (Correct Answer: 5)
Variation 4: N = 4, M = 1 (Correct Answer: 2)
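The formula behind all four variations can be sketched in a few lines of Python (the function name is my own, not from the paper):

```python
def sisters_of_brother(n_brothers: int, m_sisters: int) -> int:
    """How many sisters each of Alice's brothers has: Alice's M sisters
    plus Alice herself. The number of brothers doesn't affect the answer."""
    return m_sisters + 1

# The four (N, M) variations from the paper
variations = [(3, 6), (4, 2), (1, 4), (4, 1)]
for n, m in variations:
    print(f"N={n}, M={m} -> {sisters_of_brother(n, m)}")
```

Running this reproduces the correct answers above: 7, 3, 5, and 2.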
Testing the Models: They ran these variations against several state-of-the-art language models.

Results Analysis: They didn't just examine whether the models got the right answer. They also analyzed how the models came up with their answers. Were they just guessing based on common numbers, or were they reasoning through the problem? Here’s what they found:
Inconsistency: The models often got the wrong answer, showing they struggled with basic reasoning.
Pattern Guessing: Some models seemed to guess based on patterns in the training data, not true understanding.
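To make the scoring step concrete, here is a minimal sketch of how one might grade free-text model answers against the M+1 ground truth. The `score_response` helper is my own illustration, not the paper's actual evaluation code, and the example responses are made up:

```python
import re

def score_response(response: str, m_sisters: int) -> bool:
    """Extract the last integer in a model's free-text answer and
    compare it with the correct answer, M + 1."""
    numbers = re.findall(r"-?\d+", response)
    return bool(numbers) and int(numbers[-1]) == m_sisters + 1

# Illustrative responses, not actual model outputs
print(score_response("Each brother has 7 sisters.", 6))  # True
print(score_response("The answer is 6.", 6))             # False
```

A guessing model often echoes M or N from the prompt; a check like this catches that, though real evaluations need more careful answer extraction than grabbing the last number.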
Conclusions
The paper’s big reveal: even our best language models aren’t as bright as we thought when it comes to basic common sense. They might excel at complex tasks, but they trip over simple reasoning problems.
This calls for a major rethink in how we assess these models. The research urges the development of new benchmarks that can better evaluate and improve the reasoning skills of language models. In other words, if we want smarter AI, we need to teach them to think like humans from the ground up.