- YouTube CEO Neal Mohan said that if OpenAI used YouTube videos to train its Sora model, it would be a "clear violation" of YouTube's Terms of Service.
- Google may train its Gemini model on some corpus of YouTube videos, but it signs licensing contracts with individual creators.
- It's worth noting that OpenAI's CTO, Mira Murati, earlier refused to answer whether Sora has been trained on YouTube videos.
In the latest development, YouTube CEO Neal Mohan has taken aim at OpenAI over the possibility that Sora, its text-to-video AI model, was trained on YouTube videos. Speaking to Bloomberg, Mohan said such training would be a “clear violation” of YouTube’s Terms of Service (ToS). This is the first time the YouTube CEO has publicly spoken on this topic.
Mohan added that he has no information on whether OpenAI used YouTube videos to train Sora. If it did, however, that would go against YouTube’s policy.
It’s worth noting that when OpenAI’s CTO, Mira Murati, spoke earlier to Joanna Stern of The Wall Street Journal, she dodged the question of whether Sora was trained on YouTube videos. She said, “I am actually not sure about that,” and declined to clarify further, saying, “I’m just not going to go into the details about the data that was used.”
The YouTube CEO also said that Google’s own Gemini AI model adheres to YouTube’s policies and to individual contracts with creators. According to Mohan, AI models can scrape the channel name, the title of the video, or the creator’s name, but downloading or training on “transcripts” or “video bits” is not allowed.
He noted that Google signs licensing contracts with individual creators, and that some corpus of YouTube videos might be used to train Google’s AI models.
OpenAI has faced challenges over IP rights related to AI training for some time now. The New York Times sued OpenAI over copyright infringement, and The Intercept recently filed a lawsuit against the company as well. Now, we will have to wait and see how OpenAI responds to the suggestion that it broke YouTube’s rules, if it did indeed train its Sora model on YouTube videos.
What do you think about the ongoing discussion on training AI models on licensed content? Let us know in the comment section below.