LLMs do fine on ARC-AGI-3 if they are allowed to search over game logs


I was reading the comments on this post, and the overall opinion seemed to be that the harness makes little or no difference for ARC-AGI-3. It turns out that it makes a huge difference: Hill-climbing ARC-AGI-3

TLDR: if you save game logs (actions taken, board states, and scores) and let LLMs search over them with tools, LLMs are only moderately less efficient than humans in terms of the number of actions taken to beat ARC-AGI-3 games. Frontier LLMs struggle out of the box on this benchmark. In our preliminary
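
To make the "save game logs and search over them with tools" idea concrete, here is a minimal sketch of what such a log store might look like. It is not the harness from the linked post: the `Step` and `GameLog` names, the action strings, the tiny 2x2 boards, and the three query functions are all hypothetical, chosen only to illustrate the kind of tools an LLM could call.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    index: int              # position of this step in the episode
    action: str             # hypothetical action name, e.g. "UP"
    board: list[list[int]]  # board state observed after the action
    score: int              # score reported after the action


@dataclass
class GameLog:
    steps: list[Step] = field(default_factory=list)

    def record(self, action: str, board: list[list[int]], score: int) -> Step:
        """Append one step, copying the board so later mutation cannot alter the log."""
        step = Step(len(self.steps), action, [row[:] for row in board], score)
        self.steps.append(step)
        return step

    def find_by_action(self, action: str) -> list[Step]:
        """Tool: every step where the given action was taken."""
        return [s for s in self.steps if s.action == action]

    def score_changes(self) -> list[Step]:
        """Tool: steps whose score differs from the previous step's score.

        Assumption: the score before the first step is 0.
        """
        changed, previous = [], 0
        for s in self.steps:
            if s.score != previous:
                changed.append(s)
            previous = s.score
        return changed

    def board_diff(self, i: int, j: int) -> list[tuple[int, int, int, int]]:
        """Tool: cells that differ between steps i and j as (row, col, old, new)."""
        a, b = self.steps[i].board, self.steps[j].board
        return [
            (r, c, a[r][c], b[r][c])
            for r in range(len(a))
            for c in range(len(a[r]))
            if a[r][c] != b[r][c]
        ]


# Hypothetical three-step episode on a 2x2 board.
log = GameLog()
log.record("UP", [[0, 0], [0, 1]], 0)
log.record("LEFT", [[0, 0], [1, 0]], 0)
log.record("UP", [[1, 0], [0, 0]], 1)

scoring_steps = [s.index for s in log.score_changes()]
up_steps = [s.index for s in log.find_by_action("UP")]
first_diff = log.board_diff(0, 1)
```

In a setup like this, each query method would be exposed to the model as a tool, so it can ask which actions changed the score or how the board changed between two steps instead of re-reading the whole log.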

Original article: Singularity