Recent findings by Apple researchers cast doubt on the mathematical abilities of large language models (LLMs) and call into question the notion that artificial intelligence (AI) is on the verge of human-like reasoning.
Apple tested 20 state-of-the-art LLMs and found that their performance on elementary school math problems dropped sharply when a problem was altered slightly or extraneous information was added. Accuracy decreased by up to 65.7%, revealing an alarming fragility in AI systems faced with tasks that require robust logical reasoning.
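To make the failure mode concrete, here is a minimal sketch of the kind of perturbation testing described above: it swaps surface details such as names and quantities and optionally appends an irrelevant clause that should not change the answer. The template, the distractor sentence, and the `make_variant` helper are invented for illustration and are not Apple's benchmark code.

```python
# Minimal sketch of the perturbation testing described above (illustrative,
# not Apple's benchmark code): swap surface details and optionally append an
# irrelevant clause, then check whether a model's answer changes.

import random

TEMPLATE = ("{name} picks {n} apples on Friday and twice as many on "
            "Saturday. How many apples does {name} pick in total?")

# Irrelevant detail; a robust reasoner should ignore it entirely.
DISTRACTOR = " Five of Saturday's apples are slightly smaller than average."

def make_variant(seed: int, add_distractor: bool = False) -> tuple[str, int]:
    """Return a perturbed question and its ground-truth answer (n + 2n)."""
    rng = random.Random(seed)
    name = rng.choice(["Liam", "Sofia", "Mei", "Omar"])
    n = rng.randint(3, 40)
    question = TEMPLATE.format(name=name, n=n)
    if add_distractor:
        question += DISTRACTOR
    return question, 3 * n

question, answer = make_variant(seed=7, add_distractor=True)
print(question)  # what the model under test sees
print(answer)    # the answer is 3 * n regardless of the distractor
```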
This weakness could have far-reaching implications for businesses that rely on AI for complex decision-making. Financial institutions, in particular, may need to reevaluate their use of AI for tasks involving complex calculations and risk assessment.
At the heart of this discussion is the concept of artificial general intelligence (AGI), the holy grail of AI: a system that could match or exceed human intelligence across a wide variety of tasks. While some technology leaders predict that AGI is on the horizon, these findings suggest we may be further from that goal than previously thought.
“Real-world applications that hinge on the kinds of reasoning that can (or cannot) ultimately be verified literally cannot count on an LLM to get them done consistently and correctly,” Professor Selmer Bringsjord of Rensselaer Polytechnic Institute told PYMNTS.
Bringsjord draws a clear line between AI and traditional computing. “The calculator on your smartphone can do what an LLM cannot, because when someone runs a calculation on an iPhone, you want assurance that it is actually correct, and it is always possible for Apple to verify or falsify the results.”
Limitations and understanding
Not all experts view the limitations identified in Apple’s paper as equally problematic. “The limitations outlined in this study are likely to have minimal impact on real-world applications of LLMs, because most real-world applications don’t require advanced mathematical reasoning,” Aravind Chandramouli, head of AI at data science company Tredence, told PYMNTS.
Potential workarounds exist, including fine-tuning pre-trained models for specific domains and prompt engineering. Specialized models designed for mathematical tasks, such as WizardMath and MathGPT, could strengthen AI’s capabilities in areas that require rigorous logical thinking.
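As a rough illustration of the prompt-engineering route, the sketch below builds a few-shot prompt that asks the model to show its arithmetic steps and to ignore irrelevant details. The template wording is an assumption, and `query_llm` is a hypothetical stand-in for whichever model client is used, not a real API.

```python
# Illustrative prompt-engineering sketch: a few-shot template that asks for
# explicit arithmetic steps and a clearly delimited final answer.
# `query_llm` is a hypothetical stand-in for an actual model client.

FEW_SHOT_PROMPT = """\
Solve the problem. Show each arithmetic step, then give the final answer
on its own line as 'Answer: <number>'. Ignore details that do not affect
the computation.

Example:
Q: Ana buys 4 packs of 6 eggs and uses 5 eggs. How many are left?
Steps: 4 * 6 = 24; 24 - 5 = 19
Answer: 19

Q: {question}
"""

def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def solve(question: str) -> str:
    """Wrap the question in the few-shot template and query the model."""
    return query_llm(FEW_SHOT_PROMPT.format(question=question))
```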
The debate extends beyond mathematics to a more fundamental question: do these AI systems really understand anything? This issue sits at the heart of the debate over AGI and machine cognition.
“LLMs have no idea what they’re doing. They’re just looking for sub-linguistic patterns in stored data that are statistically similar to the patterns in their input,” Bringsjord said.
“Their consistent answers can create the illusion of understanding, but their ability to map statistical correlations within the data falls short of a true understanding of the tasks they perform,” Chandramouli said. This distinction highlights the challenge of separating advanced pattern recognition from genuine understanding in AI systems.
Eric Bravick, CEO of The Lifted Initiative, acknowledges the current limitations but sees potential solutions. “LLMs don’t have the ability to perform mathematical calculations. They don’t understand mathematics,” he said. However, he suggests that pairing LLMs with specialized AI subsystems could yield more accurate results.
“When combined with specialized AI subsystems trained in mathematics, they can produce accurate answers rather than generating answers from statistical models trained for language generation,” Bravick said. Emerging techniques such as retrieval-augmented generation (RAG) and multimodal AI may help address current limitations in AI reasoning.
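Here is a minimal sketch of the routing idea Bravick describes, assuming a simple rule that sends plain arithmetic to a deterministic, verifiable evaluator and everything else to a language model. The routing heuristic and the `query_llm` stub are illustrative assumptions, not anyone's production architecture.

```python
# Minimal sketch of routing math to a deterministic subsystem: detect plain
# arithmetic in a query and evaluate it exactly instead of asking the LLM.

import ast
import operator as op

# Safe evaluator for plain arithmetic expressions (no eval()).
_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def calc(expr: str) -> float:
    """Evaluate +, -, *, / expressions by walking the parsed AST."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("not plain arithmetic")
    return walk(ast.parse(expr, mode="eval"))

def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def answer(query: str) -> str:
    try:
        return str(calc(query))   # verifiable, deterministic path
    except (ValueError, SyntaxError):
        return query_llm(query)   # language path, hypothetical client

print(answer("12 * (3 + 4)"))  # 84.0 — computed, not generated
```

Because the arithmetic path is computed rather than generated, its results can be checked exactly, which is precisely the verifiability Bringsjord argues LLMs lack.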
Evolving field
The field of AI continues to evolve rapidly, and LLMs have demonstrated remarkable abilities in language processing and generation. However, their struggles with logical reasoning and mathematical understanding make it clear that much work remains before AGI is achieved.
Careful evaluation and testing of AI systems remain essential, especially for high-stakes applications that require reliable reasoning. Researchers and developers see approaches such as fine-tuning, specialized models, and multimodal AI systems as promising avenues for bridging the gap between current AI capabilities and the robust general intelligence that has been envisioned.