How Stevens Is Using AI to Spot Falsified Voices
Faculty-developed algorithms analyze characteristics to spot voice 'spoofing' fraud
Credit cards can be duplicated. Photos can be altered. Videos can be digitally edited.
And now our voices can be copied remarkably closely, too — opening up a huge new potential area of financial and data fraud.
That's why Stevens Institute of Technology researchers are building new artificial intelligence (AI) tools to spot fake or synthesized voices before they gain access to private data and financial accounts.
"As we move more toward using voices as passwords, technologies have also rapidly developed that allow almost perfect copies of voiceprints," explains electrical and computer engineering professor Rajarathnam Chandramouli, who is developing the technology with fellow professor K.P. Subbalakshmi and doctoral candidate Zongru (Doris) Shao.
"A quick check online demonstrates that it's possible to copy anyone's voice, even the President's," adds Subbalakshmi. "And these spoofs are already good enough to fool people and even machines."
Finding the optimal mix of channels
To fight back, the team worked to build an algorithm that can authenticate voices.
Delving into a database of thousands of human and machine-created voice samples supplied during a 2015 hacking challenge, the researchers analyzed several properties of the samples, including their temporal and spectral characteristics.
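To make "temporal and spectral characteristics" concrete, here is a minimal sketch of one feature of each kind. The article does not say which features the team actually computed, so the two shown here (zero-crossing rate and spectral centroid), the function names, and the synthetic test tone are all illustrative assumptions.

```python
import cmath
import math


def zero_crossing_rate(samples):
    """Temporal feature: fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def magnitude_spectrum(samples):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2; adequate for short frames."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(samples, sample_rate):
    """Spectral feature: magnitude-weighted mean frequency, in Hz."""
    mags = magnitude_spectrum(samples)
    total = sum(mags)
    if total == 0:
        return 0.0
    n = len(samples)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


# Hypothetical input: one 256-sample frame of a pure 1 kHz tone at 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]

zcr = zero_crossing_rate(frame)                    # roughly 2 * 1000 / 8000
centroid = spectral_centroid(frame, SAMPLE_RATE)   # close to 1000 Hz
```

A real pipeline would compute features like these over many short frames of recorded speech and pass them to a classifier. The pure tone stands in for audio only so the sketch runs without any data files.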
Next, the team began slicing, dicing and recombining the analyses in different ways, hoping to achieve accurate results on new samples of real and falsified voices. Built on a deep-learning model known as a convolutional neural network, the software learned to detect specific features that help distinguish humans from spoofs or bots.
Although the researchers began with no information about which voices were real and which were not, the best-performing combination, a set of four characteristics, distinguished a human's real voice from a computer-generated, falsified one up to 95 percent of the time.
"Some models didn't work very well, but our ensemble deep-learning model combining several factors did work very well," notes Shao.
"Finding the right mix is very challenging. We don't have the optimal combination yet, but we are very happy with the early results," adds Subbalakshmi.
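The search for "the right mix" can be pictured as trying different subsets of per-characteristic detectors and keeping the subset that scores best on labeled samples. The sketch below is a toy illustration of that idea, not the researchers' ensemble deep-learning model: the score averaging, the 0.5 threshold, the channel names (including the placeholder "other") and the four-sample dataset are all made up for the example.

```python
from itertools import combinations


def ensemble_score(sample_scores, channels):
    """Average the per-channel 'human' scores for the chosen channels."""
    return sum(sample_scores[c] for c in channels) / len(channels)


def accuracy(dataset, channels, threshold=0.5):
    """Fraction of samples the averaged ensemble labels correctly."""
    correct = sum(
        (ensemble_score(scores, channels) >= threshold) == is_human
        for scores, is_human in dataset
    )
    return correct / len(dataset)


def best_combination(dataset, channel_names):
    """Try every non-empty subset; ties go to the earliest, smallest subset."""
    candidates = (
        combo
        for size in range(1, len(channel_names) + 1)
        for combo in combinations(channel_names, size)
    )
    return max(candidates, key=lambda combo: accuracy(dataset, combo))


# Hypothetical validation data: (per-channel scores, is_human label).
CHANNELS = ["temporal", "spectral", "other"]
dataset = [
    ({"temporal": 0.9, "spectral": 0.4, "other": 0.45}, True),
    ({"temporal": 0.3, "spectral": 0.8, "other": 0.60}, True),
    ({"temporal": 0.6, "spectral": 0.2, "other": 0.30}, False),
    ({"temporal": 0.1, "spectral": 0.7, "other": 0.55}, False),
]

best = best_combination(dataset, CHANNELS)
best_accuracy = accuracy(dataset, best)
```

In this toy data each channel alone labels only half the samples correctly, while the temporal and spectral channels together label all four correctly. That is the kind of gain a well-chosen combination can give. An exhaustive search like this is practical only for a handful of channels, because the number of subsets doubles with each channel added.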
With the global market value for voice recognition technologies expected to soar to $18 billion within five years, additional business applications of the technology are likely to be developed soon, say the researchers.
"With biometrics such as voice-unlocked passcodes, there often isn't robust — or any — security on the other end," notes Chandramouli. "So this may be a powerful security for us to offer banks and financial institutions as they fight to protect against new technical innovations by bad actors."