ARC-AGI is a genuine AGI test but o3 cheated :( — LessWrong
o3 is fascinating but let's take the ARC results with a helping of NaCl.
Quoting from the post:
By training o3 on the public training set, OpenAI turned ARC-AGI from an AGI test into yet another test of memorizing rules from training data. That is still impressive, but it is something else.
I do not know exactly what OpenAI did. Did they let o3 spend a long time generating chains of thought, and reward the ones that led to correct answers? If that failed, did they first have a human provide examples of correct reasoning steps and train on those? I don't know.
They admitted they "cheated," without saying how they "cheated."
And there's this at the end:
EDIT: People at OpenAI seem to be denying that o3 was "fine-tuned" for ARC; see this comment by Zach Stein-Perlman. It's unclear whether they're denying reinforcement learning rewarded by correct ARC training-set answers, or merely denying that a separate derivative of o3, fine-tuned for the test, was used to take it.
o3 does seem like a genuine step up. Still, the objection that ARC was used as a training data set in order to solve ARC is one that must be taken seriously.