Recently, several users of the social media platform 4chan used the speech synthesis and voice cloning service ElevenLabs to create voice deepfakes of celebrities such as Emma Watson, Joe Rogan, and Ben Shapiro. The deepfaked audio clips made racist, abusive, and violent comments.
GS III: Science and Technology
Dimensions of the Article:
- What are voice deepfakes?
- How are voice deepfakes created?
- Threats from Voice Deepfakes
- Ways to Detect Voice Deepfakes
What are voice deepfakes?
- A voice deepfake is a synthetic audio clip that closely mimics a real person’s voice.
- The voice can accurately replicate tonality, accents, cadence, and other unique characteristics of the target person.
- People use AI and robust computing power to generate such voice clones or synthetic voices.
- Sometimes it can take weeks to produce such voices, according to Speechify, a text-to-speech conversion app.
- Microsoft’s VALL-E, My Own Voice, Resemble, Descript, Respeecher, and iSpeech are some of the tools that can be used in voice cloning.
How are voice deepfakes created?
- Creating deepfakes requires high-end computers with powerful graphics cards, or equivalent computing power leased from the cloud.
- Powerful hardware accelerates the rendering process, which can otherwise take hours, days, or even weeks.
- Besides specialised tools and software, generating deepfakes requires training data to be fed to AI models.
- This data usually consists of original recordings of the target person’s voice.
- AI can use this data to render an authentic-sounding voice, which can then be used to say anything.
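The workflow described above can be sketched as a simple pipeline. Every function name here is an illustrative stand-in for the real stages (feature extraction, model training, synthesis), not the API of any actual voice-cloning tool:

```python
# Hypothetical sketch of the voice-cloning workflow: recordings of the
# target -> trained voice model -> arbitrary synthesized speech.
# All names are illustrative stand-ins, not a real library's API.

def extract_features(recordings):
    """Stand-in: real systems derive spectral features
    (e.g. mel spectrograms) from each recording."""
    return [f"features({r})" for r in recordings]

def train_voice_model(features):
    """Stand-in: real systems fit a neural TTS model here,
    which can take hours to weeks of compute."""
    return {"voiceprint": features}

def synthesize(model, text):
    """Stand-in: the trained model renders arbitrary text
    in the cloned voice."""
    return f"audio[{text}] rendered from {len(model['voiceprint'])} training clips"

recordings = ["clip1.wav", "clip2.wav", "clip3.wav"]
model = train_voice_model(extract_features(recordings))
print(synthesize(model, "Hello, world"))
```

The key point the sketch captures is that once the model is trained on the target’s recordings, the attacker can make the cloned voice say anything.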
Threats from Voice Deepfakes:
Fraud and Illegal Activities:
- Attackers use voice deepfakes to defraud users, steal identities, conduct phone scams, and post fake videos on social media.
- In 2020, a bank manager in the UAE authorized a $35 million transfer for a caller he believed was a company director; the voice was in fact a clone.
- In 2019, the CEO of a UK-based energy firm transferred $243,000 to a Hungarian supplier on the instructions of a caller who mimicked his superior’s voice; the caller was a fraudster.
Ethical and Other Concerns:
- The use of voice deepfakes in filmmaking has raised ethical concerns, such as in Morgan Neville’s documentary film on Anthony Bourdain, where the late chef’s voice was cloned to say words he never spoke.
- Clear recordings of people’s voices are becoming easier to gather, and voice capture technology keeps improving, making deepfake voices more accurate and believable.
- Speechify has warned in a blog post that this could lead to even more alarming situations.
Ways to Detect Voice Deepfakes:
- Detecting voice deepfakes requires advanced technologies, software, and hardware to analyze speech patterns, background noise, and other elements.
- Cybersecurity researchers have yet to develop foolproof ways to detect audio deepfakes.
- Research labs use watermarks and blockchain technologies to detect deepfakes, but deepfake technology is constantly evolving to outsmart detectors.
- Programs like Deeptrace help provide protection by combining antivirus and spam filters that monitor incoming media and quarantine suspicious content.
- Researchers at the University of Florida developed a technique to measure acoustic and fluid dynamic differences between original human voice samples and those generated synthetically by computers.
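As a toy illustration of the kind of speech-pattern statistic such detectors examine (this is not the University of Florida method, just a minimal sketch), spectral flatness distinguishes tone-like signals from noise-like ones: it is near 0 for a pure tone and closer to 1 for noise.

```python
import cmath, math, random

def dft_magnitudes(samples):
    """Naive discrete Fourier transform (fine for a short toy signal)."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def spectral_flatness(samples):
    """Geometric mean / arithmetic mean of the magnitude spectrum:
    near 1.0 for noise-like signals, near 0 for pure tones."""
    mags = [m + 1e-12 for m in dft_magnitudes(samples)]  # avoid log(0)
    geo = math.exp(sum(math.log(m) for m in mags) / len(mags))
    arith = sum(mags) / len(mags)
    return geo / arith

random.seed(0)
n = 256
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]   # clean tone
noise = [random.uniform(-1, 1) for _ in range(n)]              # noise-like

print(spectral_flatness(tone))   # close to 0
print(spectral_flatness(noise))  # much closer to 1
```

Real detectors combine many such features over many frames and feed them to a classifier; a single statistic like this is only a building block.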
Call Centre Mitigation:
- Call centres can take steps to reduce the threat from voice deepfakes. Callback functions can end suspicious calls and request an outbound call for direct confirmation.
- Multifactor authentication (MFA) and anti-fraud solutions can also reduce risks, such as using call metadata for ID verification, digital tone analysis, and key-press analysis for behavioural biometrics.
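The metadata checks and callback rule above can be sketched as a simple decision function. The field names and checks here are illustrative assumptions, not a specific anti-fraud product’s schema:

```python
# Hypothetical sketch of a call-centre mitigation rule: flag a call for
# callback verification when its metadata does not match the caller's
# registered profile. Field names are illustrative assumptions.

def assess_call(metadata, registered):
    """Return ('proceed', []) or ('callback_required', reasons)."""
    reasons = []
    if metadata.get("caller_id") != registered.get("phone"):
        reasons.append("caller ID mismatch")
    if metadata.get("carrier") != registered.get("carrier"):
        reasons.append("unexpected carrier")
    if metadata.get("sip_headers_spoofed"):
        reasons.append("spoofed signalling headers")
    # Any mismatch: end the call and place an outbound call to the
    # number on file for direct confirmation.
    return ("callback_required", reasons) if reasons else ("proceed", [])

call = {"caller_id": "+441234567890", "carrier": "VoIP-X",
        "sip_headers_spoofed": True}
profile = {"phone": "+441234567890", "carrier": "BT"}
print(assess_call(call, profile))
```

In practice such rules would sit alongside MFA and behavioural biometrics (key-press and tone analysis), not replace them.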
- Source: The Hindu