
Anthropic CEO: "Beat China with Self-Improving AI" - no red flags there - plus Google's Titans

Is it me or is the world getting a little more cuckoo every few months?

Anthropic’s Amodei and the “AI War”

What Happened? We already had Scale AI’s young billionaire CEO, Alexandr Wang, declare directly into the new US President’s ear that the US ‘must win the AI war’. Now we have Dario Amodei, CEO of Anthropic (makers of Claude.ai) - in an even greater position of influence - echo the rhetoric of arms races of decades past by saying that the West ‘must have better models than those in China if we want to prevail.’ That very same Amodei, before founding Anthropic, said that framing things as ‘a race to be won’ would likely lead to ‘safety catastrophes’. Not only that, he directly invokes self-improving AI as a cheat code to get there first. Whatever you think of this, he has not explained the dramatic shift in tone, nor the end game.

Scale AI’s uplifting message to the world: ‘AI is made for peace and shared prosperity’. Wait, let me read that again, I need to get my magnifying glass.

  • If we don’t hamstring China, Amodei threatened, and they develop at the same pace as the US, ‘it seems likely that China could direct more talent, capital, and focus to military applications of the technology. Combined with its large industrial base and military-strategic advantages, this could help China take a commanding lead on the global stage, not just for AI but for everything.’

  • Spending all of a sentence or so on one of the glaring omissions of his plan - the risks inherent in US global hegemony - he helpfully concedes that even if the US (Anthropic specifically, I’m sure he hopes) gets there first, ‘It's unclear whether the unipolar world will last’. But wait, he has a solution…

  • ‘[B]ecause AI systems can eventually help make even smarter AI systems, a temporary lead could be parlayed into a durable advantage. Thus, in this world, the US and its allies might take a commanding and long-lasting lead on the global stage.’ I am sure those super-intelligences will sing the Stars and Stripes and have no unintended goals of their own, as they recursively improve themselves. A flawless plan, you might say. Well, it’s one that his earlier self would have abhorred. And power corrupts, so without knowing why he has shifted, we are left to guess.

So What? For me, the dawn of true artificial intelligence is an epochal moment in human history, one that should be treated with solemn reverence. It will be an existential shock to many, and already is to me and to Sam Altman (more on that another time). I would have hoped, and still try to hope, that we can at least attempt to usher in this new age in a spirit of shared human endeavour. A chance to lessen the salience of our petty territorial squabbles and unify - at least for a moment - as a species. I would advocate the same if this were the ’50s or ’70s - I don’t write this with naïveté about certain regimes. But anyway, this counts as news, so I thought I’d share it.

Does It Change Everything? Rating =

Google’s Titans Solves a Pressing Problem in LLMs?

Guest Post by TechTalks’ Ben Dickson

What Happened? Google researchers have developed Titans, a new transformer architecture that enables large language models (LLMs) to acquire new knowledge at inference time without the need to retrain the model. The key component of the architecture is the “neural long-term memory” module: think of it as a type of neural network layer that is designed to compress vast stores of knowledge without exploding the costs of memory and compute.

Titans maintains its accuracy in retrieval tasks over very long sequences, while other models quickly drop in performance.

  • Classic transformer models rely on attention layers to compute the relationship between different input tokens. While the attention mechanism is very useful, it is also super expensive, especially as the input sequence grows longer. Attention layers scale quadratically, which means every time you double the length of your input, the amount of memory required to store the attention values quadruples. In reality, many of the tokens don’t contain new or useful information.

  • Neural long-term memory layers complement attention layers. They scale linearly and only store bits of information that add value to what the model already knows. Neural memory gauges the usefulness of new tokens through a “surprise” score: how different they are from the information already stored in the model’s existing memory and knowledge.

  • The Titans architecture combines attention layers with neural memory modules, the former serving as the model’s working memory and the latter playing the role of long-term memory. Along with the model’s trained parameters, which act as its persistent memory, they create a layered memory stack that can dynamically adapt to the model’s environment (a toy sketch of the idea follows this list).

  • Small-scale experiments (e.g. models with 170 million to 760 million parameters) show that Titans outperforms the vanilla transformer, linear models (e.g., Mamba), and hybrid models (e.g., Samba) in different language tasks. Titans can scale to millions of tokens in length while remaining compute-efficient, and it has a particular edge in complex retrieval tasks on long sequences.
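
To make the layered-memory idea concrete, here is a minimal, illustrative sketch in plain Python/NumPy of a surprise-gated long-term memory sitting alongside a short attention window. Everything here (the SimpleNeuralMemory class, the surprise_threshold parameter, the toy_forward loop) is my own simplified stand-in, not the actual Titans implementation, which uses a trained neural module updated with gradient-based surprise signals.

```python
import numpy as np

class SimpleNeuralMemory:
    """Toy linear associative memory standing in for a neural long-term
    memory: it stores key->value associations in a fixed-size weight matrix,
    so memory cost stays constant no matter how long the sequence gets."""

    def __init__(self, dim, lr=0.1, surprise_threshold=0.5):
        self.W = np.zeros((dim, dim))   # fixed-size memory weights
        self.lr = lr                    # step size for memory writes
        self.surprise_threshold = surprise_threshold

    def surprise(self, key, value):
        # How poorly does the current memory reconstruct this association?
        return float(np.linalg.norm(self.W @ key - value))

    def maybe_write(self, key, value):
        # Write only when the token is "surprising" (badly predicted), using
        # one gradient step on the reconstruction error ||W k - v||^2.
        if self.surprise(key, value) > self.surprise_threshold:
            error = self.W @ key - value
            self.W -= self.lr * np.outer(error, key)

    def read(self, key):
        return self.W @ key


def toy_forward(token_ids, window=8, dim=16):
    """Working memory = exact attention over a short sliding window
    (quadratic, but only within the window). Long-term memory = the
    fixed-size module above."""
    rng = np.random.default_rng(0)
    embed = rng.normal(size=(256, dim))          # toy embedding table
    memory = SimpleNeuralMemory(dim)
    recent, out = [], None
    for t in token_ids:
        x = embed[t % 256]
        recent = (recent + [x])[-window:]        # keep only the recent window
        context = np.stack(recent)
        scores = context @ x / np.sqrt(dim)      # toy attention scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        working = weights @ context              # working-memory readout
        long_term = memory.read(x)               # long-term-memory readout
        memory.maybe_write(x, working)           # store only what is new
        out = working + long_term
    return out


print(toy_forward(range(1000)).shape)  # memory use stays flat -> (16,)
```

The real neural memory is a trained network whose surprise signal comes from gradients rather than a raw reconstruction error, so treat this only as a mirror of the control flow the bullets describe: read from memory, measure surprise, and write back only what is new.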

So What? Updating the knowledge of models (as humans do for themselves every moment) remains one of the pressing challenges for current LLMs. Being able to dynamically add new information without costly retraining or heavy memory and compute requirements could have immense benefits, especially for real-world applications that constantly have to deal with new and unseen information. It is worth noting, however, that it takes more than small-scale tests to prove the efficiency of a new architecture. But given that the work comes out of Google, it may well find its way into Google’s frontier models in the future.

Does It Change Everything? Rating =

On a lighter note, you have less than 24 hours left in the Weights & Biases competition on a fuller 20-question set on Simple, to see if any of you can prompt your way to 20/20. So far, the record is 18/20, though I will break down the reasons for this sometime soon. Grateful to Weights & Biases for supporting this, and yes, it’s all run on Weave.

To support hype-free journalism, and to get a full suite of exclusive AI Explained videos, explainers and a Discord community of hundreds of (edit: now 1000+) truly top-flight professionals w/ networking, I would love to invite you to our newly discounted $7/month Patreon tier.