

Audio AIs are trained on data full of bias and offensive language

Seven major datasets used to train audio-generating AI models are three times more likely to use the words "man" or "men" than "woman" or "women", raising fears of bias

By Victoria Turk

11 November 2024


Audio training data has been overlooked when it comes to assessing AI

Israel Palacio/Unsplash

Artificial intelligence models that generate audio are being trained on datasets plagued with bias, offensive language and potential copyright infringement, sparking concerns about their use.

Generative audio products, such as song generators, voice cloning tools and transcription services, are increasingly popular, but while text and image generators have been subject to much scrutiny, audio has received less attention.

To help rectify this, William Agnew at Carnegie Mellon University in Pennsylvania and his…

