Can Google Clamp Down on OpenAI?

What’s a million hours between friends?

What? No, of course it’s not news that OpenAI trained GPT-4 and Sora on YouTube transcripts and videos. But at such scale? And making Whisper just for that purpose? It’s a ‘violation’, according to Google.

  • Led by Greg Brockman, an OpenAI team transcribed more than 1 million hours of YouTube videos to train GPT-4 (a rough sketch of that transcription step follows this list).

  • But YouTube has, according to The Atlantic and our analysis, over 3 billion hours of footage, and transcription accuracy is increasing all the time. Of course, ‘quality’ will vary.
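
For the curious, here is roughly what that transcription step looks like using the open-source Whisper package. This is a minimal sketch, not OpenAI's actual ingestion pipeline; the file name and model size are illustrative.

```python
# Minimal sketch: transcribing one video's audio with the open-source
# Whisper package (pip install openai-whisper; requires ffmpeg on PATH).
# File name and model choice are illustrative only.
import whisper

model = whisper.load_model("base")   # larger models are slower but more accurate

# Whisper accepts common audio/video formats; ffmpeg extracts the audio track.
result = model.transcribe("downloaded_video.mp4")

print(result["text"])                # plain-text transcript, e.g. for training data
```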

So? This moment holds both opportunity and peril for OpenAI and the like. One possible outcome is that they are sued into submission. Neal Mohan, YouTube CEO: ‘[The terms of service do] not allow for things like transcripts or video bits to be downloaded … that is a clear violation.’

If they can avoid that fate, though, the numbers suggest that a future Sora 2 or GPT-5/6 could be trained on roughly 1,000 times as much video data, with all that implies for generating more realistic video, audio and text.
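
A quick back-of-envelope check on that multiplier. The raw gap between the figures above is about 3,000x; the usable-fraction figure below is my own illustrative assumption, not from the reporting.

```python
# Back-of-envelope: headroom between what was transcribed and what exists.
hours_transcribed = 1_000_000        # reported OpenAI transcription effort
hours_on_youtube = 3_000_000_000     # The Atlantic / our-analysis estimate of total footage

raw_headroom = hours_on_youtube / hours_transcribed
print(f"Raw headroom: ~{raw_headroom:,.0f}x")          # ~3,000x

# Illustrative assumption: only ~1/3 of footage survives filtering for
# quality, language, duplication, etc.
usable_fraction = 1 / 3
print(f"Filtered headroom: ~{raw_headroom * usable_fraction:,.0f}x")  # ~1,000x
```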

Does It Change Everything? Rating =

For the full suite of exclusive videos, podcasts, and a Discord community of hundreds of truly top-flight professionals who enjoy networking (physical and virtual) and sharing GenAI best practices across 30+ fields, I would love to invite you to our Patreon.