Signal to Noise

NSA Joins the Conversation and New $1m AGI Prize to Test the I in AI

New $1m AGI Prize to Test the I in AI

François Chollet throws down the gauntlet

What Happened? Lines were drawn in the sand. According to some, we now have a benchmark which, when passed, would constitute the arrival of general intelligence. The ARC-AGI benchmark, if beaten, would, for its creators at least, mark the true moment that AGI was achieved. And Mike Knoop, co-founder of Zapier, and François Chollet, the famed AI researcher, are so confident of this that they have put their money where their mouths are, with a prize pool of over $1m, including a grand prize for reaching 85% on this inventive, closely guarded exam.

This is no MMLU

But humans find the test fairly straightforward

  • ‘Reasoning’ and ‘general intelligence’ are hotly debated terms in AI, and many view multiple-choice, general knowledge benchmarks like the MMLU as insufficient to test true intelligence.

  • GPT-4, without modifications, scores very poorly, at under 5%, while humans get around 84% of the questions right.

  • But researchers like Jack Cole have already made significant inroads, reaching 34%, by (among many other things) finetuning models on many synthetic versions of the examples given in each question - a kind of 'learning on the fly' that could herald a new direction in advancing LLMs' reasoning prowess.
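To make the 'synthetic versions' idea concrete, here is a minimal sketch of one common way such variants can be generated. It assumes the public ARC task format (a JSON object with "train"/"test" lists of {"input": grid, "output": grid} pairs, where a grid is a list of rows of integers 0-9); the rotation/reflection augmentation shown is an illustrative technique, not a description of Jack Cole's actual pipeline, which involves much more.

```python
import numpy as np

def dihedral_variants(example):
    """Return the 8 rotations/reflections of an ARC input/output pair,
    with the same transformation applied to both grids so the underlying
    rule is preserved."""
    inp = np.array(example["input"])
    out = np.array(example["output"])
    variants = []
    for k in range(4):               # rotate by 0, 90, 180, 270 degrees
        for flip in (False, True):   # optionally mirror left-right
            i, o = np.rot90(inp, k), np.rot90(out, k)
            if flip:
                i, o = np.fliplr(i), np.fliplr(o)
            variants.append({"input": i.tolist(), "output": o.tolist()})
    return variants

# One demonstration pair yields 8 synthetic training pairs.
example = {"input": [[0, 1], [2, 3]], "output": [[3, 2], [1, 0]]}
augmented = dihedral_variants(example)
print(len(augmented))
```

Fine-tuning on such variants of the few demonstration pairs in each task is what gives the 'learning on the fly' flavour: the model adapts to the specific puzzle at test time rather than relying solely on pre-training.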

So What? Those who follow AI closely have been desperate for more challenging benchmarks that are harder to game. A benchmark which will truly test the reasoning capabilities of models and for which the goalposts won't move - one in which, in other words, a victory would be an unambiguous milestone on the road to an artificial intelligence like our own. Not only do we now have such a benchmark, the discussions around it could prove a valuable reminder to many of what remains between current LLMs and AGI.

Does It Change Everything? Rating =

Former head of NSA joins OpenAI board

What Happened? OpenAI and Sam Altman celebrated the arrival of Paul M. Nakasone to the board, who joins such establishment figures as Larry Summers and … well, Sam Altman himself. Nakasone was appointed by Trump to head the US National Security Agency, a role he held until as recently as February. Nakasone also strongly supported the renewal of the Foreign Intelligence Surveillance Act.

  • The move was touted as being about cybersecurity, and as OpenAI have documented, there have been state-backed attempts to use GPT-4 for cyberwarfare and misinformation.

  • The move will, however, cause consternation among those who worry that OpenAI's board represents less non-profit oversight and more establishment consolidation.

  • Meanwhile, former OpenAI researchers, disgruntled by, among other things, OpenAI seemingly failing to uphold its promised 20% compute allocation to alignment research, argue for a 'right to warn'.

So What? Perhaps it was inevitable that AI, as it got closer to AGI, would be increasingly overseen by government/establishment figures. But with overt calls from recent OpenAI employees to ‘race to AGI before China’, it feels like AI is more nationalised than ever.

Does It Change Everything? Rating =

For the full suite of exclusive videos, podcasts, and a Discord community of hundreds of truly top-flight professionals w/ networking (in-person + online) and GenAI best-practice-sharing across 30+ fields, I would love to invite you to our growing Patreon.


A subscription gets you:

  • Exclusive posts, with hype-free analysis.

  • Sample Insider videos, hosted ad-free on YouTube, of the quality you have come to expect.

  • Access to an experimental SmartGPT 2.0 - see multiple answers to the same prompt w/ GPT-4 Turbo, for example, then have Claude 3 Opus review its work. Community-driven, so you can take the lead.

  • Support for balanced, nuanced AI commentary, with no middleman fees, for you or me. I would love one day to be in a position to have a small team of equally engaged independent researchers/journalists.