Meta has been moving rapidly in the AI space. The Mark Zuckerberg-led social media giant made its presence felt with the launch of its own open-source large language model, Llama 2, to take on the likes of OpenAI, Google, and Microsoft. Now, to take things up a notch, Meta has unveiled its very own text-to-audio generative AI model, dubbed AudioCraft. Continue reading to know more about AudioCraft.
Meta AudioCraft Unveiled
Meta’s AudioCraft generative AI model lets you generate high-quality music and audio from simple text prompts. The biggest USP of AudioCraft is that it is trained on raw audio signals, which helps it deliver authentic, realistic-sounding output. In that respect, it is similar to Google’s audio AI tool, MusicLM.
AudioCraft is built on three distinct AI models: MusicGen, AudioGen, and EnCodec. MusicGen generates “music from text-based inputs,” using Meta-owned and licensed music samples. AudioGen, on the other hand, generates “audio from text-based inputs,” drawing on publicly available sound effects. The EnCodec decoder is responsible for turning these models’ outputs into true-to-life audio with, as Meta says, “fewer artifacts.”
This means you can easily generate layered scenes whose individually focused elements stay in sync in the final output. For example, given the prompt “Jazz music from the 80s with a dog barking in the background,” AudioCraft would use MusicGen to handle the jazz while AudioGen inserts and blends the barking dog into the background, with EnCodec’s decoding delivering the final, clean result.
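If you want to see what that looks like in practice, here is a minimal sketch using the released audiocraft Python package. Note that the package exposes MusicGen and AudioGen as separate models, so a combined scene like the one above is generated per element and mixed afterwards; the checkpoint names, durations, prompts, and file names below are only illustrative.

```python
# Minimal sketch: generate the two halves of the example scene separately
# with the audiocraft package, then write each clip to disk.
from audiocraft.models import MusicGen, AudioGen
from audiocraft.data.audio import audio_write

# Music bed: "Jazz music from the 80s"
music_model = MusicGen.get_pretrained("facebook/musicgen-small")
music_model.set_generation_params(duration=10)  # clip length in seconds
music = music_model.generate(["jazz music from the 80s"])  # tensor [batch, channels, samples]
audio_write("jazz_bed", music[0].cpu(), music_model.sample_rate, strategy="loudness")

# Sound effect: "a dog barking in the background"
sfx_model = AudioGen.get_pretrained("facebook/audiogen-medium")
sfx_model.set_generation_params(duration=10)
sfx = sfx_model.generate(["a dog barking in the background"])
audio_write("dog_bark", sfx[0].cpu(), sfx_model.sample_rate, strategy="loudness")

# The two models run at different sample rates (32 kHz vs 16 kHz), so blending
# the clips into one scene requires resampling to a common rate before mixing.
```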
You might think the generative AI capabilities are the best part of AudioCraft, but there is more: AudioCraft is also open source. That means researchers can study the model’s source code to understand the technology further and create their own datasets to help refine it. You can view AudioCraft’s source code on GitHub.
AudioCraft covers music generation, sound generation, and audio compression within a single code base. That makes it versatile: users can build on the existing code to create better sound generators and compression algorithms instead of starting from scratch. In a nutshell, the existing models and code become your foundation, and you improve from there.
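On the compression side, Meta’s EnCodec neural codec is also available as a standalone encodec package. The sketch below (assuming that package is installed and that input.wav stands in for your own clip) shows a simple round trip: encoding audio into discrete codes and decoding it back.

```python
# Rough sketch of neural audio compression with the standalone encodec
# package (pip install encodec). "input.wav" is a placeholder file name.
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # target bitrate in kbps

wav, sr = torchaudio.load("input.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.channels).unsqueeze(0)

with torch.no_grad():
    encoded_frames = model.encode(wav)                          # list of (codes, scale) frames
    codes = torch.cat([c for c, _ in encoded_frames], dim=-1)   # [B, n_q, T] discrete tokens
    reconstructed = model.decode(encoded_frames)                # waveform rebuilt from the codes

print(codes.shape, reconstructed.shape)
```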
You can get your first taste of AudioCraft through MusicGen’s text-to-music generation on Hugging Face; one quick way to try it from Python is sketched below. Comment your experience down below!
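The following is a minimal sketch that drives the MusicGen checkpoint hosted on Hugging Face through the transformers text-to-audio pipeline, assuming a recent transformers release with MusicGen support; the prompt and output file name are just examples.

```python
# Sketch: text-to-music with the Hugging Face transformers pipeline.
from scipy.io import wavfile
from transformers import pipeline

synthesiser = pipeline("text-to-audio", model="facebook/musicgen-small")
clip = synthesiser(
    "jazz music from the 80s with a dog barking in the background",
    forward_params={"do_sample": True},
)

# The pipeline returns the generated waveform plus its sampling rate.
wavfile.write("musicgen_out.wav", rate=clip["sampling_rate"], data=clip["audio"])
```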