59 Percent Likely Hostile


Editor’s Note: This article was submitted in response to the call for ideas issued by the co-chairs of the National Security Commission on Artificial Intelligence, Eric Schmidt and Robert Work. It addresses the fifth question (part d.), which asks what measures the government should take to ensure that AI systems for national security are trusted by the public, end users, strategic decision-makers, and/or allies.

***

Rucking with his platoon, a future reconnaissance soldier stops for a moment to flip down his chest-mounted tablet and unpack his nano drone. He launches the artificial intelligence (AI)-powered Squad Reconnaissance application and taps “Perform Defensive Scan.” The pocket-sized drone whizzes out of his hand and races down the trail in search of threats. Fifteen minutes later, the soldier feels his tablet vibrate: the drone has found something. The application warns him that a group of three adult males is one kilometer ahead and closing, and assigns them a 59 percent chance of being hostile based on their location, activity, and appearance. When he taps the message, three options flicker onto his screen: investigate further, pass the track to a remotely piloted aircraft, or coordinate a kinetic strike.

***

Depending on the soldier, his instincts, his temperament, his experience, “59 percent hostile” will be interpreted either as an even coin toss or as well over 50 percent and therefore bound to happen. The language of probability lends itself to seemingly easy interpretation, yet common sense can oversimplify. It’s tempting to distill these decisions into a binary (the group either is or is not hostile), but the outcome is more complicated: the probability in this case means that in 100 comparable situations, the group would turn out to be hostile roughly 59 times and not hostile the other 41. Moreover, the addition of new variables increases uncertainty.
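
To make that arithmetic concrete, here is a minimal sketch, not drawn from any fielded system, that treats the vignette’s “59 percent likely hostile” report as a simple probability and counts outcomes over many simulated encounters. The trial count and variable names are illustrative assumptions.

```python
# Illustrative only: assumes the reported 59 percent behaves like a simple,
# well-calibrated probability across many similar encounters.
import random

random.seed(0)  # fixed seed so the illustration is repeatable

TRIALS = 100_000
P_HOSTILE = 0.59  # the figure reported in the vignette

hostile = sum(random.random() < P_HOSTILE for _ in range(TRIALS))
print(f"hostile:     {hostile / TRIALS:.1%}")      # roughly 59%
print(f"not hostile: {1 - hostile / TRIALS:.1%}")  # roughly 41%
```

Run this way, roughly four encounters in ten flagged at 59 percent involve a group that is not hostile, which is why the report is an input to a decision rather than a verdict.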

Military operators should understand, and be trained to engage with, probabilistic outcomes when working with AI-enabled technology. Without an appreciation of how to apply “common sense,” and of where it can mislead, operators will lose confidence in these technologies when a model inevitably appears to get things wrong. Understanding three concepts will empower users to make the best use of AI applications: probability, cognitive bias, and how software arrives at its decisions.

Probability and Artificial Intelligence

In AI, statistical inference is used to analyze data and predict trends. These inferences rely on probability distributions to estimate the likelihood of an event occurring, and they are foundational to the math that underlies AI. Without such inferences, an AI system cannot tell when a characteristic means a picture shows a dog rather than a car. Today, AI has become a game changer because of growing amounts of available data and computing power; methods that until recently were considered hypothetical are now possible. While the debate rages on over what AI applications should and could be developed, there is a desperate need to take a step back and have a conversation about probability: what it is, how it is understood, and how statistical models can be integrated into decision-making. Organizations are rushing to technology companies asking for AI-powered solutions with little knowledge of how the models are built or how to properly train their teams to use these tools.
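
As one small, hedged illustration of what a probability distribution looks like in this context, the sketch below shows how a typical image classifier turns raw scores into probabilities over classes using the softmax function. The class labels and scores are invented for the example and do not come from any particular military or commercial system.

```python
# Illustrative only: converts made-up classifier scores (logits) into a
# probability distribution over two classes.
import math

def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to one."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["dog", "car"]
logits = [2.1, 0.3]  # hypothetical raw model outputs for a single image

for label, p in zip(classes, softmax(logits)):
    print(f"P({label}) = {p:.2f}")  # e.g., P(dog) = 0.86, P(car) = 0.14
```

The point is not the specific numbers but the form of the output: the model does not declare “dog,” it assigns a probability to each possibility, and a human or downstream system decides what to do with that distribution.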

An infantryman does not need to know how to build a neural net or even the extent of the math that underlies it. The soldier does, however, need to understand generally how the technology arrives at a solution, and how best to make a decision with that input. As Molly Kovite argues, the less opaque AI is, the better operators will be able to use and trust it. Ultimately, service members should be trained and equipped to make the best use of AI-powered applications.

Before incorporating AI into their decision-making process, military operators will need to understand probabilistic reasoning as well as how an algorithm makes a decision. The Defense Advanced Research Projects Agency is tackling this issue with its Explainable Artificial Intelligence Program, a worthy effort to make AI models more transparent and easier to understand. These advances could help counter the dangers Michal Kosinski warns about in the use of AI applications: the opaqueness of the process and the inability to differentiate between decisions.

Engaging with end users of AI technology on the subject of probabilistic reasoning will promote more cogent intelligence and military analysis. To foster trust in AI systems, users must reevaluate how they look at statistics and probability and learn to avoid common pitfalls in reading statistical outputs. Until then, users will either lose any sense of agency, ceding decision-making to the model, or, when a predicted outcome does not occur, discount the model and lose trust in it.

What Can Be Learned from 2016 Election Polling and Cognitive Biases

The 2016 presidential election was a perfect example of how probabilities can be misread. In the run-up to the election, most polls predicted that Hillary Clinton would win. These forecasts were based on models and simulations indicating that, if the election were run 100 times, Clinton would win perhaps 70 of them. As we know, however, President Donald Trump won, and Clinton lost.
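
The “run it 100 times” framing can be made concrete with a short sketch. The 70 percent figure below is this article’s rough characterization, not any pollster’s exact forecast, and the simulation is purely illustrative.

```python
# Illustrative only: simulates 100 elections in which one candidate is given
# a 70 percent chance of winning. The ~30 upsets are not model failures; they
# are exactly what such a forecast predicts.
import random

random.seed(1)  # fixed seed so the illustration is repeatable

P_FAVORITE = 0.70
outcomes = ["favorite" if random.random() < P_FAVORITE else "underdog"
            for _ in range(100)]

print(outcomes.count("favorite"), "wins for the favorite")  # roughly 70
print(outcomes.count("underdog"), "wins for the underdog")  # roughly 30
```

Judged against a single real election, a forecast like this cannot be declared “wrong” simply because the less likely outcome occurred; a 30 percent event should be expected to happen about three times in ten.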

After the election, the polls, and even polling itself, were criticized. Republicans wanted to know why Trump had been dismissed so roundly, while Democrats wondered how their foretold victory had dissolved. In response, statistician Nate Silver published a piece on polling error, explaining why the models did not necessarily fail because they had “incorrectly” given the Trump campaign a 30 percent chance of winning. Instead, Silver rightly explains, the modeling had been misunderstood and thereby discounted.

This episode should be instructive to warfighters. As with any statistical model, whether used in political polling or in Project Maven’s computer vision to identify objects, the goal was to express the likelihood of a Clinton or Trump victory, not to say that one of those candidates would win with absolute certainty. The discounting of the polling results was a function of certain cognitive biases, biases that Daniel Kahneman investigates thoughtfully in Thinking, Fast and Slow. In the case of the 2016 election, many people came in with preferences about who they wanted to win and anchored on the data points that fit those preferences. Additionally, numerous polls showed Clinton in the lead (reducing the overall modeling error), but this did not somehow increase the likelihood of a Clinton win. The polling consistently gave Trump a real probability of winning, yet many read FiveThirtyEight’s roughly 30 percent figure as meaning that Trump could not win.

People can look at the same set of odds in different contexts and take away radically different conclusions. Many concluded that, at 30 percent in the forecasts, Trump had no chance of winning. Given a 30 percent chance of surviving cancer, however, many people would treat that as a reasonable probability of survival and battle on. The key difference between these scenarios is the context and the biases each of us brings to the table. Understanding this is extremely important for warfighters applying statistical models.

AI in the Military

Operators need to be educated on their biases, how to spot them, and how to analyze the probabilities they see in ways that minimize the effect of those biases. In the context of military AI, this becomes all the more important because life-and-death decisions are made under the influence of adrenaline. Users must work to understand their biases before being put in stressful, time-sensitive situations, building the muscle memory to analyze the information presented to them appropriately rather than blindly trusting or distrusting a model. Warfighters do not need to be engineers or mechanics. They only need a thorough enough understanding to troubleshoot issues that arise in battle. In the fog and friction of war, military professionals are trusted to operate their weapon systems and perform basic maintenance to keep them in the fight. The military educates its warfighters in the background theory necessary to understand how their equipment works, creating operators who trust their equipment and use it expertly.

When it comes to the average warfighter utilizing software, however, this tenet of training and trust crumbles. In the authors’ experiences across the U.S. government, a highly technical Operator’s Guide may be provided, but little time is spent teaching how the software itself actually works. It is the equivalent of putting a pilot in a foreign cockpit and explaining, “Pull back and trees get smaller. Good luck.” Additionally, if an aircraft pilot has an advanced technical question, he can talk to a seasoned maintainer — yet no such organic capability exists for software support. Since software is primarily built and maintained by contractors, little of this expertise resides in the armed services. If software doesn’t act as expected, the operator has no understanding of what’s happening “under the hood” that could otherwise empower him to troubleshoot or understand the functionality.

While a lack of in-depth understanding might be sustainable with simple software applications, ones that either work or do not, it will be unacceptable with AI-powered applications. If a program suggests a course of action that the operator doesn’t understand or intuitively agree with, the operator will typically either surrender agency to the computer or dismiss the suggestion out of hand. With an understanding of how the software generally works and of their own biases, operators could instead use AI to enhance their decision-making. It may lead them to reconsider the problem they face by adding another trusted “opinion.” Viewing AI as a black box, however, will lead to distrust and inevitable disuse in stressful moments. A warfighter who roughly understands the model used to train an AI application may be able to recognize a factor the software does not account for and adjust accordingly.

Having AI subject-matter experts available to operators, as maintainers are to pilots, will make it easier for users to learn about the theory and application of AI software and improve operating capability. Furthermore, a feedback loop must be institutionalized for operators to communicate with developers and tailor AI for military-specific use cases. The more operators know about the software they are using, the more useful feedback they can provide, whether about the user experience, how certain factors should be weighted in models, or what data to include in training sets.

To better understand how the U.S. military will use AI, it is helpful to remember that AI is already being used today in myriad applications. These uses are commonplace where the variables are tightly constrained and the outcomes predetermined, e.g., aircraft autopilot, fire control systems, and missile guidance systems. With this type of simple “fire and forget” AI, what McLemore and Clark term “restrained AI,” operators need not understand how the AI itself works and can instead simply trust the program to execute their clearly stated intent.

Consider a future AI application where the operator makes the “leap from running plays to calling plays,” as Dave Blair and Jason Hughes contend will occur, and apply that scenario to the earlier vignette in which the drone locates a threat with a 59 percent probability of being hostile. Just by discerning and reporting humans moving near the platoon, the AI has already greatly benefited the reconnaissance soldier. Now, to move forward, the soldier must synthesize his understanding of the local ground picture with his knowledge of what the AI means when it says a threat is “59 percent likely hostile.”

If the operator has information or context the AI lacks, he might task the drone to take a closer look; perhaps local fighters wear a particular style of clothing in this area. Working through the problem together, the AI/operator team will be able to find, fix, and eventually eliminate the threat if it does prove to be a legitimate target. Applying AI to such a chaotic and loosely defined problem requires deep user integration, and the AI/operator team will have to hand the problem back and forth as they progress.

What Should the Defense Department Do?

Until STEM education in the United States meets the demands of the 21st century, the Department of Defense must provide technical experts to help bridge the gap between military users and developers of AI technology. That starts with supplementing warfighters’ education in probability and the associated cognitive biases through basic classes on inferential statistics and psychology. We also propose assigning what we would call application translators to units in the Department of Defense and Intelligence Community to serve as the “maintainers” of these systems.

These application translators would be officers or non-commissioned officers well versed in computer science and statistics who would debug issues as they arise and provide a bridge between users, decision-makers, and technologists. They would not necessarily build these applications from scratch, but they would need to understand the ins and outs of each one. This would ensure that even if applications are not developed in-house, end users have the ability to fully understand and best utilize them.

Application translators would educate their team members not just on specific applications, but also on the underlying concepts of probability and algorithms. In hedge funds, for example, portfolio managers and research analysts sit down with a performance coach during both winning and losing streaks to evaluate what biases they might be taking on and to determine how to reframe their investment theses. Application translators could provide the same type of feedback, helping operators check their thinking and informing tweaks to the applications being deployed.

In contrast to McLemore and Jimenez’s call to forge an elite cadre of Naval AI technologists, we argue that AI must not be treated with the same mystique as nuclear propulsion, but rather as something all warfighters should be able to utilize. Pulling back the curtain on how AI works, and on how we understand it, will demystify the technology and empower all military operators to use the myriad AI systems proliferating today.

In many fields pursuing the use of cutting-edge AI, there is a growing translation problem. Decision-makers are not fully grasping what their technical teams are implementing and are pressing these teams to create overly complex solutions so their organizations appear adaptive. End users are not equipped to understand the outputs they receive, causing them to either lose agency or ignore the outputs that are not aligned with their biases. As the Department of Defense looks to embrace AI-powered technologies, it should focus not just on creating the best technology possible, but on making sure its people are equipped to utilize the technology.

Probabilities Are Not Necessarily Solutions

Soldiers working with AI-enabled technology should be taught to appreciate probability and identify cognitive biases. This is imperative because probability distributions are the underpinnings of AI models. Military AI applications, with their inherent risk of moral hazard, will and must retain a human in the loop. Human context and agency are integral to tactical decision-making, and AI/operator teams must work in concert to be most effective. Warfighter buy-in is of the utmost importance, and it will only be achieved through a better grounding in probability and cognitive biases.

AI and its operators will inevitably make mistakes, but the military will only learn from experience if users accurately understand what AI is telling them in the first place. When our reconnaissance soldier decides how to proceed after thinking “59 percent likely hostile,” he must understand that 59 percent is a probability — not a solution.

Daniel Eichler is a U-28A Draco Pilot with deployments to Iraq, Afghanistan, and Africa. He previously worked as a technology consultant for the Department of Defense and the Intelligence Community. He graduated with honors from Georgetown University with degrees in Computer Science and Science, Technology, and International Affairs.

Ronald Thompson is a data engineer. He has developed a number of applications that leverage data analytics and statistical models for decision-making for U.S. Army Special Operations Command, political campaigns, a multi-billion-dollar hedge fund, and others. He holds a degree in Government from Georgetown University, and is pursuing a second bachelor’s in Computer Science at the University of Colorado-Boulder.

The views expressed herein are those of the authors and do not reflect official policy or positions of the U.S. Department of Defense.

Image: Marine Corps (Photo by Lance Cpl. Julien Rodarte)