By P. J., Sunday, August 25, 2019
Researchers at Imperial College London have used AI to mask the emotional cues in users’ voices when they speak to internet-connected voice assistants.
The idea is to put a ‘layer’ between the user and the cloud their data is uploaded to by automatically filtering emotional speech into ‘normal’ speech.
Human voice inflections and cues can communicate subtle feelings, from ecstasy to agony to arousal. That makes the human voice a source of considerable value if you’re a company collecting personal data.
The researchers recently published their findings in a paper, “Emotionless: Privacy-Preserving Speech Analysis for Voice Assistants” (link).
Our voices can reveal our confidence and stress levels, physical condition, age, gender, and personal traits. This isn’t lost on smart speaker makers, and companies such as Amazon are always working to improve the emotion-detecting abilities of AI.
An accurate emotion-detecting AI could pin down people’s ‘personal preferences, and emotional states’, said lead researcher Ranya Aloufi, ‘and may therefore significantly compromise their privacy’.
Their method for masking emotion involves collecting speech, analyzing it, and extracting emotional features from the raw signal. Next, an AI program trains on this signal and replaces the emotional indicators in speech, flattening them.
Finally, a voice synthesizer regenerates normalized speech from the AI’s outputs, and only this flattened version is sent to the cloud. The researchers report that the method reduced emotion identification by 96 percent in their experiments, although speech recognition accuracy suffered, with a word error rate of 35 percent.
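To make the pipeline concrete, here is a minimal toy sketch of the flatten-then-resynthesize idea in Python with NumPy. This is not the authors’ model: the paper uses learned neural representations of emotional prosody, while this sketch simply treats frame-by-frame energy swings as a crude stand-in for emotional cues and scales each frame toward the mean energy. The signal, frame size, and function names are all illustrative assumptions.

```python
import numpy as np

def extract_energy_envelope(signal, frame=256):
    """Frame-wise RMS energy: a crude stand-in for the emotional
    prosody features the paper's neural models actually learn."""
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    return np.sqrt((frames ** 2).mean(axis=1))

def flatten_emotion(signal, frame=256, eps=1e-8):
    """Rescale each frame toward the mean energy, damping the
    loud/soft swings that often carry emotional emphasis."""
    env = extract_energy_envelope(signal, frame)
    target = env.mean()
    out = signal.astype(float).copy()
    for i, e in enumerate(env):
        out[i * frame:(i + 1) * frame] *= target / (e + eps)
    return out

# Toy 'speech': a tone whose amplitude swings, mimicking emotional emphasis.
t = np.linspace(0, 1, 8192)
speech = np.sin(2 * np.pi * 220 * t) * (1 + 0.8 * np.sin(2 * np.pi * 3 * t))

masked = flatten_emotion(speech)
orig_env = extract_energy_envelope(speech)
new_env = extract_energy_envelope(masked)
print(new_env.std(), "vs", orig_env.std())  # envelope variation is damped
```

In the real system, the flattened representation would be passed to a neural vocoder to resynthesize natural-sounding audio; the trade-off the researchers measured is that this normalization also degrades the downstream speech recognizer’s accuracy.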
Understanding emotion is an important part of making a machine seem human, a longtime goal of many AI companies and futurists. Speech Emotion Recognition (SER) is far older than the artificially intelligent speech recognition systems running on Alexa, Siri, or Google Home devices, but emotionality has become a serious goal for AI speech engineers in recent years.