Google Open-Sources SynthID to Watermark AI-Generated Text

Image Courtesy: Shutterstock (Edited By – Sagnik/Beebom)
In Short
  • Google has open-sourced SynthID Text and made it available via the Google Responsible Generative AI Toolkit.
  • According to Google, SynthID doesn't compromise the quality, accuracy, creativity, or speed of text generation.
  • However, SynthID can currently only detect output from Google's own AI models.

Last year, Google DeepMind announced SynthID, a technology that watermarks and detects AI-generated content. Now, Google has open-sourced SynthID Text via the Google Responsible Generative AI Toolkit for developers and businesses. The open-sourced tool currently works for text only.

https://twitter.com/GoogleDeepMind/status/1849110263871529114

With the abundance of AI models at our disposal, it’s getting increasingly hard to tell what’s real and what’s not. As a result, it’s high time that advanced watermarking tools like Google’s SynthID are open-sourced and used by AI companies.

Earlier this month, we reported that Google Photos may detect AI-generated photos using a Credit tag attribute. With SynthID now open-sourced, more such applications could gain the ability to detect AI-generated text and other content.

Pushmeet Kohli, the VP of Research at Google DeepMind, tells MIT Technology Review,

“Now, other [generative] AI developers will be able to use this technology to help them detect whether text outputs have come from their own [large language models], making it easier for more developers to build AI responsibly.”

For those unaware of how SynthID works, I’ll try to make it easier for you to understand. To give you an analogy, remember those magical invisible pens where the writing could only be seen under UV light? Well, consider SynthID to be this light source that can see those invisible marks or watermarks on AI-generated images, videos, and text.

While generating text, LLMs look at the possible next tokens and assign each one a score representing the probability of that token being chosen. During this process, SynthID adds extra information by “modulating the likelihood of tokens being generated.”
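To make that idea more concrete, here is a minimal, purely illustrative Python sketch of the general technique of keyed score modulation before sampling. It is not Google's actual SynthID algorithm; the watermark key, function names, and token scores below are hypothetical, and real schemes are considerably more sophisticated.

```python
import hashlib
import math
import random

# Illustrative sketch only -- NOT Google's actual SynthID algorithm.
# Idea: before sampling the next token, nudge ("modulate") each candidate
# token's score using a pseudorandom function keyed by a secret watermark key
# and the recent context. A detector holding the same key can later check
# whether the chosen tokens are biased in the way the key predicts.

WATERMARK_KEY = "secret-demo-key"  # hypothetical key, for illustration


def g_value(context: tuple, token: str, key: str = WATERMARK_KEY) -> float:
    """Deterministic pseudorandom value in [0, 1] for a (context, token) pair."""
    digest = hashlib.sha256(f"{key}|{context}|{token}".encode()).hexdigest()
    return int(digest[:8], 16) / 0xFFFFFFFF


def watermark_scores(context: tuple, scores: dict, strength: float = 2.0) -> dict:
    """Shift each candidate token's log-score by a keyed pseudorandom amount."""
    return {tok: s + strength * g_value(context, tok) for tok, s in scores.items()}


def sample_token(scores: dict) -> str:
    """Softmax-sample one token from the (possibly watermarked) log-scores."""
    total = sum(math.exp(s) for s in scores.values())
    r = random.random() * total
    for tok, s in scores.items():
        r -= math.exp(s)
        if r <= 0:
            return tok
    return tok  # fallback for floating-point edge cases


# Toy example: candidate next tokens and their model log-scores (made up).
context = ("My", "favorite", "fruit", "is")
candidate_scores = {"mango": 1.2, "apple": 1.1, "banana": 0.9, "kiwi": 0.3}
print(sample_token(watermark_scores(context, candidate_scores)))
```

Because the nudges are small and spread across many tokens, the text still reads naturally, yet a detector that replays the same keyed function over the output can tell whether the token choices are statistically skewed toward the watermark.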

Image Courtesy: Google DeepMind

However, the biggest limitation of SynthID right now is that it can only detect content generated by Google's own AI models. SynthID also starts to falter when someone heavily alters or rewrites AI-generated text, such as by translating it into a different language altogether.

Soheil Feizi, an associate professor at the University of Maryland who has researched this topic, tells MIT Technology Review,

“Achieving reliable and imperceptible watermarking of AI-generated text is fundamentally challenging, especially in scenarios where LLM outputs are near deterministic, such as factual questions or code generation tasks.”

From the sound of it, we still have a long way to go before reliably detecting AI-generated content. Still, Google open-sourcing SynthID is a great step towards AI transparency.

The tool can be an absolute game-changer in standardizing the detection of AI-generated content. Most importantly, Google states that SynthID doesn’t interfere with the “quality, accuracy, creativity or speed of the text generation.”
