UMBC CADVC Deepfake Gallery Exhibit

With AI tools, human voices can be faked in seconds, instantly creating opportunities for deception, fraud, and misinformation. Scientists typically try to detect fake audio by refining AI algorithms to catch it, but adversaries can then use that same technology to generate better deepfakes. Our novel approach applies insights from sociolinguistics–the study of human language in society–to train listeners to better discern fake audio and to improve the science of deepfake detection. Can you catch a deepfake? Look at each image in the exhibit, listen to the corresponding clip on this page, and make your best guess!


Clip

Check your response

What did you think?

When you are ready, click on the link to read the explanation.


For more information on our AI methodology and linguistic features, see our paper:

Z. Khanjani, L. Davis, A. Tuz, K. Nwosu, C. Mallinson and V. P. Janeja* (corresponding author), “Learning to Listen and Listening to Learn: Spoofed Audio Detection Through Linguistic Data Augmentation,” 2023 IEEE International Conference on Intelligence and Security Informatics (ISI), Charlotte, NC, USA, 2023, pp. 1-6, doi: 10.1109/ISI58743.2023.10297267.

Education Efforts: Read more about how we are using these expert-defined features in the classroom here.

Speech Sample Citations:

LJSpeech: Ito, K., & Johnson, L. (2017). The LJ Speech Dataset. https://keithito.com/LJ-Speech-Dataset/
R1: Reimao, R., & Tzerpos, V. (2019). FoR: A Dataset for Synthetic Speech Detection. In 2019 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), 1-10. https://doi.org/10.1109/SPED.2019.8906599
R3: Reimao, R., & Tzerpos, V. (2019). FoR: A Dataset for Synthetic Speech Detection. In 2019 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), 1-10. https://doi.org/10.1109/SPED.2019.8906599
MelGAN/AudioClip 1: Kumar, K., Kumar, R., De Boissiere, T., Gestin, L., Teoh, W. Z., Sotelo, J., … & Courville, A. C. (2019). MelGAN: Generative adversarial networks for conditional waveform synthesis. Advances in Neural Information Processing Systems, 32.
Mellotron5: Valle, R., Li, J., Prenger, R., & Catanzaro, B. (2020, May). Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6189-6193). IEEE.