How Will AI Change Cyber Operations?

Jenny Jun

April 30, 2024

Commentary

The U.S. government somehow seems to be both optimistic and pessimistic about the impact of AI on cyber operations. On one hand, officials say AI will give the edge to cyber defense. For example, last year Army Cyber Command’s chief technology officer said, “Right now, the old adage is the advantage goes to the attacker. Today, I think with AI and machine learning, it starts to shift that paradigm to giving an advantage back over to the defender. It’s going to make it much harder for the offensive side.” On the other hand, the White House’s AI Executive Order is studded with cautionary language on AI’s potential to enable powerful offensive cyber operations. How can this be?

The rapid pace of recent advancements in AI is likely to significantly change the landscape of cyber operations, creating both opportunities and risks for cybersecurity. At the very least, both attackers and defenders are already discovering new AI-enabled tools, techniques, and procedures to enhance their campaigns. We can also expect the attack surface itself to change because AI-assisted coding will sometimes produce insecure code. AI systems and applications developed on top of them will also become subject to cyber attack. All of these changes complicate the calculus.

In navigating the impact of AI on cyber security, the question has been too often framed as an “offense versus defense balance” determination through the lens of international politics, but in reality the answer is much more complex. Indeed, some argue that the premise of trying to apply offense versus defense balance theory to the cyber domain is a fraught exercise in the first place. Instead, AI will most likely change the distribution of what targets in cyberspace are exploitable. Rather than coming up with a laundry list of where AI could enhance tasks, the focus should instead be on tasks where actors are likely to apply AI.

The exercise is also a policy problem as much as it is a technical problem. Many AI-enhanced techniques are a “double-edged sword” that can aid both the offense and the defense, depending on a variety of geopolitical and economic incentives that affect how individuals, companies, and governments incorporate AI and the preexisting constraints they face. The impact of AI on cyber is mediated through such variables, altering the distribution of cyber threats and offering larger marginal benefits to some actors. For policymakers and practitioners, the focus should be on identifying especially vulnerable targets in specific situations, rather than trying to conduct a net assessment as to whether AI favors the cyber offense or defense writ large. This would help policymakers better incorporate the implications of AI into the next U.S. national cyber strategy.

Sharper Swords, Tougher Shields

The U.S. government is exploring how AI can be used to augment its cyber capabilities, how AI can be used to bolster cyber defenses, and how to best secure increasingly capable AI systems. To be fair, AI has been used in cyber defense for years in areas such as anomaly detection. But more recent developments in generative AI, especially large language models, have elicited new high-level policy attention on the linkages between cyber and AI. The next few years will be an important period in how the U.S. government frames the role of AI in cyber operations.

As an illustrative example, consider the role of AI in vulnerability discovery. This features in the White House AI Executive Order as a task for the Department of Defense and the Department of Homeland Security to pilot, and is also likely to be an important element of the Defense Advanced Research Projects Agency’s current AI Cyber Challenge. One of the ways to discover vulnerabilities in code is through fuzzing, a technique that passes random or mutated inputs to a program to find any unexpected behavior such as buffer overflows. Fuzzing has been used for years by security researchers and threat actors to discover vulnerabilities. One of the challenges, however, has been that fuzzing is not very scalable. Key aspects of the fuzzing process have often been manual, and fuzzing campaigns often do not explore the entire code base.
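
To make the technique concrete, here is a minimal sketch of mutation-based fuzzing in Python. Everything in it is a hypothetical illustration: `parse_record` is a toy target with a planted bug standing in for a buffer overflow, and `mutate` and `fuzz` are names made up for this sketch, not part of any real fuzzing tool.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Hypothetical toy target: parses a length-prefixed record."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:]
    if length > len(payload):
        raise ValueError("truncated record")
    buffer = bytearray(8)  # fixed-size buffer
    for i in range(length):
        # Planted bug: no check that length fits in the buffer.
        # In Python this raises IndexError; in C it could overflow.
        buffer[i] = payload[i]
    return bytes(buffer[:length])


def mutate(seed_input: bytes, rng: random.Random) -> bytes:
    """Apply a few random mutations (overwrite, insert, append) to a seed."""
    data = bytearray(seed_input)
    for _ in range(rng.randint(1, 4)):
        choice = rng.randrange(3)
        if choice == 0 and data:
            data[rng.randrange(len(data))] = rng.randrange(256)
        elif choice == 1:
            data.insert(rng.randrange(len(data) + 1), rng.randrange(256))
        else:
            extra = rng.randint(1, 16)
            data.extend(rng.randrange(256) for _ in range(extra))
    return bytes(data)


def fuzz(target, seed_input: bytes, iterations: int = 20000, seed: int = 0):
    """Feed mutated inputs to the target and collect those that crash it."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            target(candidate)
        except ValueError:
            continue  # expected rejection of malformed input
        except Exception:
            crashes.append(candidate)  # unexpected behavior worth triaging
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record, b"\x03abc")
    print(f"crashing inputs found: {len(found)}")
```

Because the mutations are purely random, most candidates are rejected as malformed, and reaching the planted bug can take many iterations. That inefficiency, multiplied across a large code base, is the scalability limit described above.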

However, the recent increase in performance of large language models has the potential to partially mitigate this problem. By using large language models to generate valid inputs at scale, the time and resources needed to fuzz a project could be dramatically reduced, and perhaps even automated. Entire code repositories, rather than select portions, could be fuzzed easily. Other ongoing research uses large language models to create fuzzers that support multiple programming languages at the same time, further increasing code coverage. With more research, large language models have the potential to scale up fuzzing techniques and make the discovery of certain types of vulnerabilities more comprehensive.

Whether this potential leap in vulnerability discovery ultimately benefits cyber offense or defense depends on how fast the discovered vulnerabilities could be exploited versus patched. And that largely depends on preexisting factors beyond improvements in AI. In this sense, trying to answer the question of whether some AI-enabled breakthrough benefits cyber offense or defense is reductive, as the answer is much more situational.

For instance, countries have different national-level regulations for stockpiling versus disclosing discovered vulnerabilities, such as China’s Cybersecurity Threat and Vulnerability Information Sharing Platform versus America’s Vulnerabilities Equities Process. The degree to which discovered vulnerabilities will be kept close to the chest for exploitation by the government, versus disclosure to the public for patching, will vary based on factors such as a state’s concern for national security, international norms, and relationship with the private sector, leading to divergent domestic laws and regulations. This means that an overall increased rate of vulnerability discoveries may offensively benefit some state-sponsored threat actors more than others.

Software developers and security engineers can also use fuzzing at scale to discover more vulnerabilities that can then be remediated. But the degree to which remediation of such discovered vulnerabilities will occur efficiently and at scale will also vary. For instance, incorporating fuzzing, among other vulnerability testing means, as a standard practice before a product is deployed may increase the chance that vulnerabilities are discovered and patched early on by the developer. However, for vulnerabilities discovered later in the process in already deployed products with complex dependencies, patching may take months — if not years — for certain enterprises. So while AI-assisted fuzzing may contribute to secure-by-design principles in the long run as new products are deployed, in the short run attackers may be able to take advantage of the fact that some vulnerabilities are now easier to discover yet cannot be readily remediated — mostly because of organizational constraints and inertia. The degree to which defenders can thus harness the benefits of AI-enabled vulnerability discovery and patching also depends on how policymakers can influence private firms’ economic incentives to design and deploy software with security in mind rather than incentives to be first to market.

The question of how AI will transform cyber security is thus a policy problem as much as it is a technical problem. If the U.S. government assesses that the rate at which other threat actors will stockpile vulnerabilities will increase due to new AI-enabled techniques, should it respond by increasing its stockpile of vulnerabilities, or should it try to reveal more of those vulnerabilities to the public and remediate them? How should the U.S. government signal its intent, either in private or in public, to such threat actors and third-party states and private sector stakeholders? How do private sector incentives, such as the desire to be first to market at the expense of security, affect whether we will be in a world where the rate of vulnerability discovery outpaces that of remediation or vice versa, and what can policy do to shape those incentives? Answers to these policy questions require deliberation at the highest levels of policymaking and interagency coordination.

Because technological breakthroughs interact with economic and political incentives and constraints, it is far from clear whether scenarios such as scaling up vulnerability discovery through AI will help cyber offense or defense. Most likely, it will change the distribution of what types of targets are now more or less exploitable. This is because AI augmentation benefits certain techniques such as fuzzing, and helps discover some types of vulnerabilities but not others. Furthermore, some organizations are better positioned to remediate discovered vulnerabilities fast and early on, and some types of threat groups are more opportunistic than others, which focus on specific missions and therefore target certain sectors. Whether all of that comes out to a net win for the cyber offense or defense overall is perhaps a less important question than getting at the underlying conditions that can shape this changing distribution.

The Marginal Effect and Threats

The other important question is where the marginal effects of AI are greatest in augmenting various phases of a cyber operation. For instance, the U.K. National Cyber Security Centre’s assessment of the near-term impacts of AI on cyber threats highlights AI’s impact on attack processes such as initial access, lateral movement, and exfiltration with a scale ranging from no uplift to significant uplift. Thinking about AI’s impact on cyber in terms of its likely marginal effects helps to prioritize policy attention on specific threats and avoid doomsday scenarios and over-speculation.

Take the illustrative example of the role of generative AI in the market for cyber crime. One of the near-term threats from generative AI, whether it generates text, voice, or images, is the use of such content for social engineering and spearphishing in the initial access phase of a cyber operation. For instance, one industry report revealed that phishing emails have increased by 1,265 percent since the release of ChatGPT. However, the marginal increase in benefit from having such generative AI tools available as opposed to before will not be consistent across threat actors and situations.

One way to measure the success of a phishing email is to measure its “click-through rate,” or the share of recipients who click on the malicious link. Emerging studies show that while the click-through rate of AI-generated phishing emails is still slightly lower than that of emails handwritten by trained social engineers, it takes significantly less time (five minutes as opposed to 16 hours in an IBM study) to use a large language model to produce a phishing email. This dynamic shows what I call a “quality versus efficiency tradeoff.” In the future, the click-through rate of AI-generated phishing emails may increase even more if they can be customized to an organization’s normal communication patterns or integrated into chatbot-like agents capable of iterated conversations at scale. But even now, some threat actors would be more than happy to scale up their phishing operations using such tools at the expense of a slightly lower click-through rate.
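
The tradeoff can be put in rough numbers. The Python sketch below uses the five-minute and 16-hour figures cited above. The click-through rates and the number of recipients per email are hypothetical placeholders chosen only for illustration, not results from the study, and the function names are likewise made up for this sketch.

```python
def expected_clicks_per_hour(minutes_per_email: float,
                             click_through_rate: float,
                             recipients_per_email: int) -> float:
    """Expected clicks generated per hour of effort spent crafting emails."""
    emails_per_hour = 60 / minutes_per_email
    return emails_per_hour * recipients_per_email * click_through_rate


def break_even_rate(human_minutes: float, human_rate: float,
                    ai_minutes: float) -> float:
    """Click-through rate at which AI-generated emails merely match
    handwritten ones in expected clicks per hour of effort."""
    return human_rate * ai_minutes / human_minutes


# Time figures are from the text; rates and recipients are ASSUMED values.
AI_MINUTES = 5
HUMAN_MINUTES = 16 * 60
AI_RATE = 0.11        # hypothetical click-through rate
HUMAN_RATE = 0.14     # hypothetical click-through rate
RECIPIENTS = 100      # hypothetical recipients per crafted email

if __name__ == "__main__":
    ai = expected_clicks_per_hour(AI_MINUTES, AI_RATE, RECIPIENTS)
    human = expected_clicks_per_hour(HUMAN_MINUTES, HUMAN_RATE, RECIPIENTS)
    print(f"AI-assisted: {ai:.2f} expected clicks per hour of effort")
    print(f"Handwritten: {human:.3f} expected clicks per hour of effort")
    print(f"Break-even AI rate: "
          f"{break_even_rate(HUMAN_MINUTES, HUMAN_RATE, AI_MINUTES):.5f}")
```

Under these placeholder values, the AI-assisted approach yields roughly 150 times more expected clicks per hour of effort despite the lower click-through rate. For an actor who cares about volume rather than any particular victim, efficiency can outweigh quality by a wide margin.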

The type of threat actors that are likely to disproportionately benefit from making this tradeoff are opportunistic cyber criminals who try to cast the widest net in a “spray and pray” operation to maximize profit using a particular exploit that may or may not be patched soon. To such threat actors, the marginal benefit of scaling up initial access is great, while the downside of fewer victims falling for the email is minimal. On the other hand, those who have relatively little to gain from indiscriminate initial access, such as state-sponsored threat groups that pursue specific missions against a limited target set and place a higher premium on staying covert, may not benefit as much from leveraging such tools. They may not even rely on phishing emails for initial access, or they may already have sophisticated in-house social engineers and leverage generative AI only for auxiliary functions such as fixing grammar.

This trend suggests that particular threat actors, for instance ransomware gangs, may be able to compromise more victims than before by scaling up their phishing operations, meaning defense may need to focus on blocking subsequent stages of attack such as privilege escalation and lateral movement. It also suggests that the burden of discerning a phishing email from a normal email should no longer rest on individuals, and that investment in measures such as email authentication protocols and other methods of providing red flags on suspicious emails at scale would be necessary.

On the other hand, there may be situations where the marginal benefit of relying on AI for certain phases of an offensive operation is small. Yes, large language models can be used to write malicious code, but hackers already routinely leverage off-the-shelf tools, purchase as-a-service subscriptions, share exploits on forums, and automate tasks without reliance on AI. Instead, the marginal negative effects of more people using large language models to write benign code in regular software development settings could be even bigger, as these models often generate insecure code and pose cyber security risks to the software supply chain with commonly exploitable bugs. Here, AI’s ability to generate functional code is not necessarily directly enhancing offensive techniques, but opening more doors for attackers relying on existing techniques by changing the attack surface of the ecosystem.

Incorporating AI into U.S. Cyber Strategy

As cyber security and AI become more intertwined, U.S. national cyber strategy needs to examine what the government can do to leverage AI to gain competitive advantage in cyberspace while also addressing new cyber risks to AI systems. Unfortunately, major cyber strategy documents in 2023 were not written in time to properly interrogate these issues. While the more recent White House AI Executive Order is welcome progress, all of the initiatives related to cyber security proposed in the order, along with existing efforts such as the AI Cyber Challenge and U.S. Cyber Command’s new AI Security Center, should be part of a broader and updated national-level cyber strategy that provides the connective tissue among ends, ways, and means in the age of AI.

The first step in doing this is to escape from the “will AI favor cyber offense or the defense” dichotomy. Many technological breakthroughs enabled by AI can potentially benefit both the offense and the defense, depending on a variety of external factors. For example, while AI can scale up vulnerability discovery, it is more likely to benefit the defense if vulnerabilities are found during the production process and can therefore be fixed with minimal effort before deployment. If vulnerabilities are found in already-deployed products with a lot of dependencies that slow down patch adoption, the time lag benefits the offense. In another example, AI can generate functional code, but in some cases this could be used to generate either malicious code or benign but insecure code, while in other cases this ability could be used to automatically repair code. The extent to which the latter capability matures over the former depends on how policymakers incentivize the private sector and the research community.

What is thus more important is to focus on the mediating factors and incentives that drive actors to develop, use, and apply AI in ways that favor U.S. strategic interests, rather than to conduct an overall net assessment of the technology’s impact on cyber operations or the cyber domain writ large. Doing so will also allow U.S. policymakers to prioritize the most impactful and likely set of AI-enabled cyber threats, and to determine how best to leverage AI-enabled capabilities to decrease the attack surface and respond more effectively to cyber threats.

Jenny Jun is a research fellow at Georgetown University’s Center for Security and Emerging Technology, where she works on the CyberAI Project. She is also a nonresident fellow at the Atlantic Council’s Cyber Statecraft Initiative.

Image: U.S. Army Acquisition Support Center
