Google’s SynthID will now watermark AI-generated text based on the tokens an LLM produces as it generates words and sentences.
Generative AI models have flooded the digital world with AI-generated content, including videos, images, designs, text, and music. While chatbots produce all types of content, some tools also offer options to "humanize" AI-generated media. Google is approaching this issue from a different angle: it has open-sourced SynthID to watermark AI-generated text.
SynthID is Google DeepMind’s AI watermarking tool, and it can now watermark AI-generated text. Previously, the tool could only watermark AI-generated images, videos, and music, and was available to a limited group of users. In May, Google integrated SynthID into its Gemini app and chatbots to gather feedback on the tool’s performance.
Pushmeet Kohli, vice president of research at Google DeepMind, told MIT Technology Review, “Now, other [generative] AI developers will be able to use this technology to help them detect whether text outputs have come from their own [large language models], making it easier for more developers to build AI responsibly.”
How does SynthID identify AI-generated text?
Google has open-sourced the tool, which is already integrated with the Gemini chatbot. Developers and businesses can now use it to determine whether text output came from their own AI chatbots. Currently, only Google and developers with access to the detector that identifies the watermark can check for it.
SynthID works by recognizing the tokens an LLM used to produce its text output. An LLM, or large language model, is the model underlying a chatbot; it generates text one token at a time. To generate a sequence of text, the model repeatedly predicts the next token, where each token can represent a character, word, or phrase.
The LLM makes each prediction based on the preceding words, assigning a probability score to every candidate token for the next position. This process repeats throughout the generated text, so a single sentence can contain ten or more probability scores. The final pattern of scores, which combines the model’s word choices with the adjusted probability scores, is referred to as the watermark.
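To make the idea concrete, here is a minimal sketch of token-level watermarking in the general style described above. This is a hypothetical illustration, not Google’s actual SynthID algorithm: all names (`green_list`, `watermark_probs`, `detect_score`), the toy vocabulary, and the "boost a pseudorandom subset of tokens" scheme are assumptions made for demonstration.

```python
import hashlib
import random

# Toy vocabulary standing in for an LLM's token set (illustrative only).
VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]

def green_list(prev_token: str) -> set:
    # Seed a PRNG with a hash of the previous token so a detector can
    # recompute the same "favored" token subset without model access.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, k=len(VOCAB) // 2))

def watermark_probs(probs: dict, prev_token: str, boost: float = 1.5) -> dict:
    # Slightly raise the probability of green-list tokens, then
    # renormalize so the adjusted scores still sum to 1.
    greens = green_list(prev_token)
    adjusted = {t: p * (boost if t in greens else 1.0) for t, p in probs.items()}
    total = sum(adjusted.values())
    return {t: p / total for t, p in adjusted.items()}

def detect_score(tokens: list) -> float:
    # Fraction of tokens that land in the green list of their
    # predecessor; watermarked text should score above chance (~0.5).
    pairs = list(zip(tokens, tokens[1:]))
    hits = sum(1 for prev, tok in pairs if tok in green_list(prev))
    return hits / max(len(pairs), 1)
```

The key property this sketch shares with the description above is that detection only needs the text and the shared seeding rule, not the model itself, and that confidence grows with the number of scored tokens.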
SynthID’s accuracy increases with the length of the generated text, because longer text contains more probability scores to check. Kohli said, “While SynthID isn’t a silver bullet for identifying AI-generated content, it is an important building block for developing more reliable AI identification tools.”
Even after testing across millions of prompts, researchers have noted that it is possible to alter Gemini-generated text and fool the detector. However, it is hard for ordinary users to know how to alter the text correctly or to identify which particular words need to be changed. SynthID may have other loopholes as well, but Google claims it is the most accurate option available so far.