

Audio AIs are trained on data full of bias and offensive language

Seven major datasets used to train audio-generating AI models are three times more likely to use the words "man" or "men" than "woman" or "women", raising fears of bias

By Victoria Turk

11 November 2024


Audio training data has been overlooked when it comes to assessing AI

Israel Palacio/Unsplash

Artificial intelligence models that generate audio are being trained on datasets plagued with bias, offensive language and potential copyright infringement, sparking concerns about their use.

Generative audio products, such as song generators, voice cloning tools and transcription services, are increasingly popular, but while text and image generators have been subject to much scrutiny, audio has received less attention.

To help rectify this, William Agnew at Carnegie Mellon University in Pennsylvania and his…

