ARC-AGI is a genuine AGI test but o3 cheated :( — LessWrong
o3 is fascinating but let's take the ARC results with a helping of NaCl.
Quoting from the post:
By training o3 on the public training set, OpenAI turned ARC-AGI from an AGI test into yet another test of memorizing rules from training data. That is still impressive, but it is something else.
I do not know exactly what OpenAI did. Did they let o3 spend a long time generating chains of thought, and reward the ones that led to correct answers? If that failed, did they first have a human provide examples of correct reasoning steps and train on those? I don't know.
They admitted they "cheated," without saying how they "cheated."
And there's this at the end:
EDIT: People at OpenAI seem to be denying that o3 was "fine-tuned" for ARC; see this comment by Zach Stein-Perlman. It's unclear whether they're denying reinforcement learning rewarded by correct ARC training-set answers, or merely denying that a separate derivative of o3, fine-tuned for the test, was used to take it.
o3 does seem like a genuine step up. Still, the objection that ARC was used as a training data set in order to solve ARC is one that must be taken seriously.