Apple

Reasoning failures highlighted in Apple’s LLM investigation

By admin | October 12, 2024 | 3 Mins Read


Apple plans to introduce its own version of AI starting with iOS 18.1 – Image credit Apple

A new paper from Apple’s artificial intelligence researchers finds that engines based on large language models, such as those from Meta and OpenAI, still lack basic reasoning skills.

The group proposed a new benchmark, GSM-Symbolic, to help others measure the reasoning capabilities of various large language models (LLMs). Their initial tests revealed that small changes in the wording of a query could produce significantly different answers, undermining the models’ reliability.

The research group investigated the “fragility” of mathematical reasoning by adding contextual information to queries that a human could understand but that should not affect the underlying mathematics of the solution. The result was a range of different answers, which should not happen.

“Specifically, the performance of all models declines [even] when only the numerical values in the question are altered in the GSM-Symbolic benchmark,” the group wrote in its report. “Furthermore, the fragility of mathematical reasoning in these models [demonstrates that] their performance significantly deteriorates as the number of clauses in a question increases.”
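
To make the idea concrete, here is a minimal, hypothetical Python sketch (not from the paper) of the kind of templating a GSM-Symbolic-style benchmark relies on: the same word problem is re-instantiated with different names and numbers, so the surface text varies while the underlying arithmetic stays the same. A model that genuinely reasons should answer every variant correctly.

import random

# Hypothetical template in the spirit of GSM-Symbolic: swap the surface
# details (name, quantities) while the ground-truth arithmetic is unchanged.
TEMPLATE = ("{name} picks {a} kiwis on Friday, {b} kiwis on Saturday, "
            "and twice as many kiwis on Sunday as on Friday. "
            "How many kiwis does {name} have?")

def make_variant(rng):
    name = rng.choice(["Oliver", "Maria", "Chen", "Amara"])
    a, b = rng.randint(20, 60), rng.randint(20, 60)
    answer = a + b + 2 * a          # the correct answer follows the same formula every time
    return TEMPLATE.format(name=name, a=a, b=b), answer

rng = random.Random(0)
for _ in range(3):
    question, answer = make_variant(rng)
    print(question, "->", answer)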

The study found that adding a sentence that appeared to provide relevant information to a particular math question could reduce the accuracy of the final answer by up to 65 percent. “There is simply no way to build a reliable agent on top of this foundation, where changing a word or two or adding a bit of extraneous information can give you a different answer,” the study concludes.

Lack of critical thinking

A telling example of this issue was a math question that required a genuine understanding of the problem. The task the team developed, called GSM-NoOp, resembled the math word problems that elementary school students might encounter.

The query began with the information needed to work out the result: “Oliver picks 44 kiwis on Friday, 58 kiwis on Saturday, and twice as many kiwis on Sunday as Friday.”

The query then adds a clause that seems relevant but has no actual bearing on the final answer, noting that of the kiwis picked on Sunday, “5 of them were a little smaller than average.” The question asked was simply, “How many kiwis does Oliver have?”

The note about the size of some of the kiwis picked on Sunday has no bearing on the total number of kiwis harvested. Nevertheless, OpenAI’s model and Meta’s Llama3-8b both subtracted the 5 small kiwis from the total.
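
For reference, the arithmetic the question actually calls for, and the flawed variant the models produced, can be written out in a few lines (a worked example using the figures quoted above):

# Correct reading of the GSM-NoOp kiwi question:
friday, saturday = 44, 58
sunday = 2 * friday                          # "twice as many as Friday" -> 88
correct_total = friday + saturday + sunday   # 44 + 58 + 88 = 190

# The flawed answer described above: subtracting the five "smaller than
# average" kiwis, even though their size does not change the count.
flawed_total = correct_total - 5             # 185

print(correct_total, flawed_total)           # 190 185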

This flawed logic echoes a previous study from 2019 that could reliably confuse AI models by asking about the ages of the last two Super Bowl quarterbacks. By adding background and related information about the games they played in, and a third person who was quarterback in another bowl game, the models produced incorrect answers.

“We found no evidence of formal reasoning in language models,” the new study concludes. The behavior of LLMs is “better explained by sophisticated pattern matching,” which the study found to be so fragile that [simply] changing names can change the results.


