Don’t Believe Your Eyes (or Ears): The Weaponization of Artificial Intelligence, Machine Learning, and Deepfakes
It’s 6:58 a.m. and it’s still dark in Vilnius. Marcus stops by the coffee shop on his way to work as a diplomat at the U.S. Embassy. As he exits, ready to cut across Pylimo Street, a man approaches him. In accented English, the man says that he’s lost and motions to his phone. Marcus looks down at the phone and sees a video of a man embracing a woman for a kiss. He’s the man in the video. But the woman is not his wife.
He looks back at the stranger, who smirks. The man switches to his map, asking Marcus if he knows how to get to Rentida Apartments, the very same building that Marcus shares with his wife Abby. Marcus gets it: the man must work for a foreign intelligence service, and Marcus is being blackmailed. His mind races: “How did he get this video?” He’s never seen this woman, let alone been in the room in which this digital affair took place. How would Abby react if she saw it? Their relationship has been struggling since they moved overseas. Marcus thinks, “How could this all be possible and happening to me?”
For thousands of years, humanity has relied on five senses to determine threats to its wellbeing. Our ancestors used their keen senses of sight and hearing to assess risk or identify a suitable meal. With the advent of new technologies, however, this reliability may be slipping away. Many algorithms are already in the wild, available freely on open source repositories like GitHub or Bitbucket. Programs such as FaceSwap are ready to be weaponized against individual citizens, corporations, and nations.
While much of the dialogue focuses on the political impact and fallout, there is a very real threat to national security and human life in the application of artificial intelligence, machine learning, and deepfakes. As such, nations should rethink how they counter new vulnerabilities with special capabilities that are spread throughout defense, intelligence, academia, and industry.
Deepfakes and Neural Networks
One technique that has gained attention is “deepfakes,” a mix of the words “deep learning” and “fake media.” Deepfakes came to prominence when a Reddit user utilized a Generative Adversarial Network (GAN) to create pornography that swapped the actress’ face with that of a well-known female celebrity. News outlets picked up on the story. Meanwhile, the House Intelligence Committee held its first hearing on the matter in June 2019, inviting experts from law, security, academia, and industry.
The re-emergence of neural networks as a popular approach within computer and data science is improving machine-learning capabilities. Neural networks are computer algorithms that are meant to mimic the processes of the human brain. Interconnected layers of nodes, which represent neurons, conduct mathematical computations on their inputs and pass the results on to the next layer. Each node in the next layer receives results from each node in the previous layer until a prediction is made. The prediction is then compared with the actual outcome, and each connection between the nodes is adjusted to carry more or less weight. The process is repeated until the difference between predictions and outcomes is minimized. These implementations allow experts, and in some cases mere enthusiasts, to manipulate images, video, audio, and text in such a way that even the keenest observers can be deceived. This capability could be used to interfere in an election, sow political chaos, or frustrate military operations, making this a national security issue.
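To make the predict, compare, and adjust cycle concrete, here is a minimal sketch in plain Python. It is an illustration only, not how any deepfake tool is built: the task (learning the XOR function), the layer sizes, the learning rate, and the function name `train_tiny_network` are all assumptions chosen to keep the example small.

```python
import math
import random


def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_tiny_network(steps=3000, lr=0.5, hidden=4, seed=0):
    """Train a 2-input, one-hidden-layer network on XOR (a toy task).

    Returns the average squared error recorded in each pass over the data.
    """
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    # Connection weights start out random; training adjusts them.
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(steps):
        total = 0.0
        for x, target in data:
            # Forward pass: each layer computes on its inputs and passes
            # the results to the next layer until a prediction is made.
            h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
                 for j in range(hidden)]
            pred = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
            # Compare the prediction with the actual outcome.
            err = pred - target
            total += err * err
            # Backward pass: nudge every connection to carry more or
            # less weight, in the direction that shrinks the error.
            d_out = 2.0 * err * pred * (1.0 - pred)
            for j in range(hidden):
                d_hid = d_out * w2[j] * h[j] * (1.0 - h[j])
                w2[j] -= lr * d_out * h[j]
                w1[j][0] -= lr * d_hid * x[0]
                w1[j][1] -= lr * d_hid * x[1]
                b1[j] -= lr * d_hid
            b2 -= lr * d_out
        losses.append(total / len(data))
    return losses


if __name__ == "__main__":
    history = train_tiny_network()
    print("first pass error:", history[0])
    print("last pass error:", history[-1])
```

Production systems rely on far larger networks and specialized libraries, but the loop has the same shape: predict, measure the error, adjust the weights, repeat.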
GANs feature two competing neural networks. One network, called a “generator,” makes a product based on a real version. It then shows that product to the other network, called a “discriminator,” which determines if this image is real or fake. As the images are created, then judged, the facsimile’s realism improves in proportion to the discriminator’s decreasing ability to determine what is real or not. The process is similar to a contest between an art forger and an art appraiser: the forger strives to pass off a forged work as an original for monetary gain, while the appraiser tries to catch the fake.
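The forger and appraiser contest can be sketched with a deliberately tiny example. Everything here is an assumption made for illustration: the “real” data are just numbers clustered near 4, the generator is a single adjustable offset `b`, and the discriminator is a single logistic unit with parameters `w` and `c`. Real GANs use deep networks on images, but the alternating updates follow the same pattern.

```python
import math
import random

REAL_MEAN = 4.0  # hypothetical "authentic" data: numbers clustered near 4


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def discriminator(x, w, c):
    """The appraiser: estimated probability that sample x is real."""
    return sigmoid(w * x + c)


def generator(z, b):
    """The forger: turns random noise z into a fake sample."""
    return b + z


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def train_toy_gan(rounds=200, batch=64, lr=0.05, seed=0):
    """Alternate discriminator and generator updates; return (b, w, c) per round."""
    rng = random.Random(seed)
    b, w, c = 0.0, 0.0, 0.0
    history = []
    for _ in range(rounds):
        real = [rng.gauss(REAL_MEAN, 0.5) for _ in range(batch)]
        fake = [generator(rng.gauss(0.0, 0.5), b) for _ in range(batch)]
        # Discriminator step: gradient ascent on
        # log D(real) + log(1 - D(fake)), i.e. get better at judging.
        grad_w = (mean((1 - discriminator(x, w, c)) * x for x in real)
                  - mean(discriminator(x, w, c) * x for x in fake))
        grad_c = (mean(1 - discriminator(x, w, c) for x in real)
                  - mean(discriminator(x, w, c) for x in fake))
        w += lr * grad_w
        c += lr * grad_c
        # Generator step: gradient ascent on log D(fake), i.e. shift the
        # fakes toward whatever the discriminator currently calls real.
        fake = [generator(rng.gauss(0.0, 0.5), b) for _ in range(batch)]
        grad_b = mean((1 - discriminator(x, w, c)) * w for x in fake)
        b += lr * grad_b
        history.append((b, w, c))
    return history


if __name__ == "__main__":
    for step, (b, w, c) in enumerate(train_toy_gan(rounds=5), start=1):
        print(f"round {step}: generator offset={b:.3f}, w={w:.3f}, c={c:.3f}")
```

In the first round the discriminator learns that larger numbers look more real, and the generator responds by shifting its output upward. A model this simple may oscillate around the real data rather than settle, which is one reason practical GAN training needs considerable tuning.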
Computer vision techniques, like focal point identification, can be coupled with GANs to track a specific object in an image or video. Using this followed by facial landmark tracking, or identifying specific features of a face, allows the program to accurately swap the target face. Similar technologies exist in the realm of augmented reality, in which virtual items are displayed in the real world in real time, on platforms like Instagram and Snapchat, in the form of face swapping and filters. Face swapping is not new, but new technology makes it much cheaper. Implanting a young Carrie Fisher into the Disney film Rogue One, for example, cost millions of dollars. However, a close replication was created in a few hours at no cost using off-the-shelf deepfake technologies.
Machine learning and deepfake capabilities in the hands of other great powers and terrorist groups could threaten U.S. national interests. Experts have already written extensively about these threats in professional journals, while traditional news media coverage has been non-existent. Foreign intelligence services have long used blackmail and extortion to gain access to secure facilities by recruiting human sources. As described in the short story that opened this article, deepfakes could make it that much easier.
In May 2019, Samsung’s Moscow Artificial Intelligence Laboratory released a GAN that required just one image to create a moving image of an individual based on the movements of a source video. Previous methods required hundreds of thousands of images, usually from individual frames of video, to train a model. This model requires no more than 16 images; the more angles it is given, the more seamlessly the model can replace the target.
This algorithm uses facial landmark features, identifying eye, nose, mouth, and jawline, for a meta-training phase of the model. Once these movements are understood by the network, it can overlay new images in a realistic manner. The algorithm only requires images of a face from a few angles to add to the realism, something easily acquired through social media or surveillance of the targeted individual. Then the victim is at the mercy of the fake’s creator. They can use the technology to fabricate an affair or acts of homosexuality in a country where this could lead to imprisonment or death. Dubious claims could be made by governments as pretenses for arrest with deepfakes as evidence.
Non-state actors could use artificial intelligence as a pretext to turn people against the United States or its allies. Some have speculated about the use of a deepfake to create an international crisis between nuclear-armed states. However, it’s more likely that deepfakes could be used in a scenario like the 2012 Benghazi consulate attack, in which citizens protested a video satirizing the Prophet Mohammad, overwhelming a small U.S. consulate. The protest served as a pretext for a terrorist attack that left four Americans, including the U.S. ambassador, dead. Local citizens could be mobilized or recruited by extremist groups through the circulation of a fabricated video for a comparable attack. A variation of this scenario could be utilized through new forms of propaganda videos designed to convince western audiences of the threat posed by a terrorist organization or separatist group.
Beyond the political implications, most news articles focus solely on video deepfakes, but GANs have been used for other forms of media in convincing ways. Still images are the next most prominent, with NVIDIA Corporation releasing two algorithms, ProgressiveGAN and StyleGAN, in the last year. ProgressiveGAN works through images at progressively larger resolutions, starting with 4×4 pixel images and incrementing to the end resolution, in order to learn how the image is structured at each level and to identify key features and attributes for replication.
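The idea of stepping from a 4×4 image up to the end resolution can be sketched in a few lines. This is a rough illustration of the resolution schedule only, not NVIDIA’s code: it assumes the resolution doubles at each stage, and the helper names `resolution_schedule`, `average_pool`, and `image_pyramid` are invented for this example.

```python
def resolution_schedule(final_size, start_size=4):
    """List the image sizes visited, doubling from start_size to final_size."""
    sizes = [start_size]
    while sizes[-1] < final_size:
        sizes.append(sizes[-1] * 2)
    if sizes[-1] != final_size:
        raise ValueError("final_size must be start_size times a power of two")
    return sizes


def average_pool(image, factor):
    """Shrink a square grayscale image by averaging factor x factor blocks."""
    out_size = len(image) // factor
    return [
        [
            sum(image[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]


def image_pyramid(image, start_size=4):
    """Build the coarse-to-fine versions of one image that training steps through."""
    full_size = len(image)
    return [average_pool(image, full_size // size)
            for size in resolution_schedule(full_size, start_size)]


if __name__ == "__main__":
    print(resolution_schedule(1024))
    # A hypothetical 16 x 16 grayscale image with made-up pixel values.
    toy_image = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
    for level in image_pyramid(toy_image):
        print(len(level), "x", len(level[0]))
```

At the coarsest level only the overall layout of the image survives; each doubling adds finer detail for the networks to learn.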
Intrepid malign actors could utilize such technology to create forgeries of administrative documents such as visas and passports in mere minutes, even with relatively novice forgers. Europe has already experienced what such forged documents can do. In both the Charlie Hebdo attack and the Bataclan theatre attack, the perpetrators gained supplies and entry into the country using forged documents. Lowering the bar to anyone with a laptop would widen the ability of these groups to move from country to country, allowing for greater success of their attacks through decentralization.
In the case of generating synthetic portraits, StyleGAN allows for a greater amount of control. Utilizing a computer vision technique known as style transfer, StyleGAN can transfer characteristics of one individual to another to make a completely unique, synthetic image. As built, it can change anything from high-level attributes (pose or identity of a person) to low-level ones (hair length and color, freckles, etc.) in its generation process.
Given how the Internet Research Agency interfered with the 2016 U.S. presidential election and the Brexit vote in the United Kingdom through the use of social media manipulation, weaponizing machine learning is the next logical step. StyleGAN would be highly useful in creating more authentic online personas. Online social media campaigns rely on specifically programmed “bots” to push narratives into the forefront to be picked up by individuals or legitimate media. These are typically barebones in their profile content, using one or two images, or default profiles. Through StyleGAN, robust profiles can be created using synthetically generated images, which are tweaked to fit the pose or characteristics of a real person. This may already be happening. These images add to the believability that there is a genuine person behind a comment on Twitter, Reddit, or Facebook, allowing the message to propagate.
In the same vein of social media influence, GANs have been applied to text generation. Russia has deployed thousands of Twitter bots since 2014 to conduct influence campaigns. Researchers at National Taiwan University modified CycleGAN to create more reliable sentiment compared to the typical Seq2Seq method currently employed in chat bots. This improvement, when coupled with the coherence and semantics of Seq2Seq, can create realistic chat bots, giving responses in real time that mimic those of a person. This allows for greater coverage of social media spaces and wider spreading of messages.
Improvements to audio synthesis, particularly of human voices, are also a cause for concern. Adobe Voco and Google DeepMind’s WaveGAN can copy an individual’s voice with as little as 40 minutes of audio, matching vocalizations of words not even spoken. In May of 2019, Dessa, a machine learning start-up, created a realistic version of comedian Joe Rogan’s voice. Russian-backed separatists in Donbas, Ukraine, already employ similar methods, jamming friendly signals and overwriting them with their own. Although deepfakes haven’t been shown to have been utilized, the risk exists.
Many actors, not just nation states, could have access to these technologies. While education to inoculate societies against deepfakes is necessary, as is having trusted sources to vet audio, video, and imagery, many of the scenarios described above fall outside of these protections. Nations should develop means to identify, counter, and prevent such events. The cat-and-mouse game of development seen with anti-virus technologies in network security can serve as a guide: a union of critically thinking individuals and protective technologies offers the best defense against the threat.
As it stands, most countries are not equipped to fight weaponized deepfakes. The capabilities to combat them in the United States are spread throughout multiple agencies. The National Security Agency houses the technical knowledge to research and test models, but lacks the training for countering influence campaigns. The Department of Defense houses counter-influence and information operations capabilities, each with different authorities and tools. Technical knowhow is focused on information assurance and cyber security. The Department of Homeland Security has vast resources for counterintelligence, but does not have the same reach outside the borders and has similar constraints regarding technical means.
Rethinking Washington’s Approach
To meet the needs of a digital age, Washington needs to revamp how it looks at the threat. While such a shift may seem unrealistic, the scale of the challenge demands dramatic action. The government should use the creation of the Joint Special Operations Command in the 1980s as a model. At the time, the special operations community was overwhelmed by challenges from the failed Operation Eagle Claw, meant to save 52 members of embassy staff held hostage by Iran in 1980. Inability to coordinate and synchronize across disciplines and services caused a catastrophic failure, resulting in the loss of eight servicemembers and multiple aircraft. While each component was highly competent within its own niche, the problem set spanned multiple niches that no one organization held. The same is true today with respect to deepfakes. While the State Department’s Global Engagement Center was envisioned to meet this threat, systemic issues have kept it from doing so. The Center, by its own account, consists of only 90 employees, well under what would be needed. As a first step, the government should take lessons learned from the U.S. special forces bureaucracy and apply them to the threat of deepfakes and weaponized adversarial machine learning.
Adversaries will soon weaponize artificial intelligence, machine learning, and deepfakes to threaten American national interests. Scenarios like the fictitious one described at the beginning of this article are not far off. A full-court press, leveraging the best assets from all agencies, national laboratories, and private industry, is necessary to face the challenge. This requires a permanent task force headed by the Department of Defense that pulls technical knowledge from the cyber, psychological operations, and information operations communities within each uniformed service, with the authority to conduct both attributed and non-attributed counter-influence across the full spectrum of communications. Mandatory liaisons and engagement with the intelligence community, State Department, and national laboratories are required to ensure information is shared and exchanged so that the U.S. stays at the forefront of machine learning.
Joe Littell is a U.S. Army officer and a graduate student in Data Science at Duke University. He is a combat veteran of Iraq and Afghanistan with expertise in information operations and machine learning. He has served in multiple positions across U.S. Army Special Operations Command and the intelligence community.
Correction: An earlier version of this article mistakenly referred to the 1980 hostage rescue operation in Iran as “Operation Iron Claw.” It was actually “Operation Eagle Claw.”