Modern Solution, Ancient Problem: Measuring Military Effectiveness in the Age of Artificial Intelligence


A likely apocryphal exchange after the Vietnam War captured the problem of fighting without keeping effectiveness in mind: An American colonel tells a Vietnamese colonel, “You know, you never defeated us on the battlefield.” “That may be so,” replies the Vietnamese colonel, “but it is also irrelevant.”

“What effect?” is a fundamental, though often overlooked, question in national security: What effect did an action have on an adversary, and did it bring us closer to our goals? This question can be asked about a foot patrol to greet local leaders, a bomb dropped on an insurgent position, a cyber operation to shut down an electric grid, or the awesome display of an aircraft carrier in plain sight of an adversary. The question applies equally to senior policymakers and to junior tacticians. Ignoring it at any level can have grave consequences, not least a tendency to win battles while losing wars.

Important as the “what effect?” question is, its answer is something of a white whale—impossible to fully grasp despite our best efforts. An effect is a change in an adversary’s intentions to respect U.S. interests, and intentions exist within the mind of an adversary. Outside of active combat, when direct violence makes intentions plain, we cannot observe what others intend to do. Even if we could see into an adversary’s mind, effects play out over too much space and time to be fully perceivable. The effect of a bomb on an insurgent position, for example, is the culmination of reactions by insurgents in the bomb’s immediate vicinity, by observers who saw the bomb fall from a distance, by people who heard about the bomb falling, and so on. To answer “what effect?” we would need to understand these reactions in the moments after the bomb’s impact, as well as how those immediate reactions matured in the days and weeks after the bomb fell. Measuring all of this is too much to ask of any person or group.

Artificial intelligence (AI), as part of a broader approach committed to evidence-based effects measurement, can bring us closer to the white whale. It can measure the display of emotions, which are indicators of behavioral intentions, and it can do so across space and time. Even as we use AI to develop new tools for waging war, we can also use it to better understand the effectiveness of the tools we already have.

The challenge will be for decision-makers to actually embrace AI as a way of measuring effectiveness. In addition to technological and organizational hurdles, we also face a psychological barrier: Our innate sense of knowing. Humans can see how adversaries in their field of vision react, and they can sense the moods of others. Adopting AI as a measure of effectiveness would entail admitting that our own perceptions may be incomplete and that what machines report may be better—an admission that can feel wrong. The first step to adopting AI as a measure of effectiveness is understanding how human cognitive biases make judging effectiveness difficult. This article describes specific ways in which human intuitions of effectiveness are prone to error and explains how AI can help answer the fundamental question: “What effect?”

One Hard Question, Three Easy Substitutes

One of the most important developments in the last half-century of psychology is the recognition that humans, when faced with questions of immense complexity, simplify those questions in order to reach a conclusion. In the language of Nobel Prize-winning psychologist Daniel Kahneman, we take hard questions and replace them with easy questions that have intuitive answers.

Instead of the hard question of “what effect?”, we tend to reach for (at least) three easy questions: What do I want the effect to be? What is the effect on a simplified stereotype of the enemy? And what is the effect here and now (and only here and now)? We turn to these simpler questions (heuristics, in the language of judgment and decision-making researchers) because their answers are intuitive. The answers come to mind more readily than the answer to how a number of multi-dimensional adversaries will react across a wide range of space and time. But answering the easy questions leads to systematically biased estimates of the effect in question. AI’s most important contribution in measuring effects will be to insulate us from this bias. At least, that can be its most important contribution; whether it actually will be depends on the questions we direct it to answer.

Easy Question #1: What Are My Goals?

When faced with complexity, we tend to substitute desire for reality, swapping “what effect?” with “what do I want the effect to be?” Or, given organizational incentives, “What does my boss want the effect to be?” The what-do-I-want question is easy to answer—we want effects to serve our goals.

This easy question results from confusing the goal of success (we want to see certain effects achieved) with the goal of accuracy (we want to know the actual effects that were achieved). This is quite common—anyone who has ever planned to accomplish more in a day than realistically feasible has swapped these two goals. It is what psychologists call the “planning fallacy.”

AI, at least “narrow AI” trained to solve a specific set of problems, is not susceptible to goal-swapping. A program that was properly trained to measure fear in an adversary’s voice, for example, would not subconsciously change its measure before a battle in order to bolster its self-confidence, or after a battle in order to send good news up the chain of command. The algorithm would simply measure fear. (Whether the measurement would be used in an unbiased fashion would depend on the human decision-maker.)

Easy Question #2: What Is the Effect on a Mental Stereotype of the Enemy?

A second substitution entails replacing the full complexity of an adversary with a simpler stereotype of him or her. “What effect?” requires measuring an action’s effects on all combatants involved, a task that leaps in complexity with each additional combatant. An easier, more intuitive task is to create a single stereotype to stand in for all combatants and measure effects on that single stereotype. The stereotype of the adversary can be less than human, which some research argues may be necessary to maintain psychological health while systematically committing violence against another group. Even decision-makers who are not prone to derogate an enemy must still rely on some level of simplification, simply because they have limited cognitive capacity to represent the full complexity of an adversary.

For reasons that may trace back to combat in our evolutionary past, humans tend to overestimate how much fear an adversary will experience while underestimating how much anger they will experience. Fear and anger lead to distinct behaviors that can turn an effect from generally good to generally bad. Fear leads individuals to perceive more risk in the world and feel less control over a situation, both good things if the goal is to convince an adversary to stop fighting. Anger has the opposite effect—it leads adversaries to sense less risk, feel more control, and seek retributive justice, all of which are bad if the goal is to stop fighting.

Using technology that already exists, algorithms can be trained to measure distinct emotions from text, images of faces, and pitch of voice. Measurement tools would ideally integrate with existing intelligence collection capabilities—platforms that capture voice communications or facial imagery could provide the basic data necessary for algorithms to estimate an adversary’s emotional response to an action. Developing such tools would entail adapting commercial technology to a military setting—for instance, commercial companies already advertise the measurement of emotion as a means of gauging the effectiveness of marketing campaigns.

Using historical data that show whether an enemy previously reacted with fear or anger, algorithms could predict when effects would lead to one emotion or another. Indeed, anger and fear could be just a start. AI could be trained to measure other factors related to effectiveness: a population’s degree of trust in its government, for example, or the esprit de corps of a fighting unit.

Easy Question #3: What Is the Effect Here and Now (and Only Here and Now)?

Finally, humans tend to simplify the hard question of effects because it plays out over too much space and time to fully process. Returning to the example of a bomb on an insurgent position, any observer of the bomb, relatives of those in the vicinity, or individuals who simply hear about it may experience a change in their intentions to respect U.S. interests. A complete answer to the “what effect?” question would have to account for anyone who comes into contact with the bomb, directly or indirectly.

Effects also play out over time. The effect of the bomb immediately after it has fallen may be different than its effect after a week has passed, when initial shock may have turned to fear or anger. Measuring the effect of even a single action over the entire space and time in which it could unfold is impractical for humans. Monitoring the effects of a long operation or an entire theater would be impossible.

Artificial intelligence could help solve this problem by using algorithms to evaluate effectiveness in a broader sense. If properly trained with the right data, algorithms could measure emotional reactions, discussed above, in the immediate vicinity of an action and in areas where news of an action could reasonably be expected to spread, and do so in the moments, days, and weeks after the action. They could collapse this immense complexity into a form that human decision-makers could readily use.
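To make the idea of collapsing complexity concrete, the sketch below shows one simple way such a summary could be computed. It is an illustration under stated assumptions, not a description of any fielded system: it assumes some upstream classifier has already produced fear and anger scores between 0 and 1, and the `Reading` record, the distance bands, and the time windows are all hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reading:
    """One hypothetical emotion measurement from an assumed upstream classifier."""
    distance_km: float  # distance from the action
    hours_after: float  # time elapsed since the action
    fear: float         # assumed score in [0, 1]
    anger: float        # assumed score in [0, 1]


def bucket(value, edges, labels):
    """Return the label of the first bucket whose upper edge exceeds value."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]  # beyond the last edge


def summarize(readings):
    """Average fear and anger per (distance band, time window)."""
    groups = defaultdict(list)
    for r in readings:
        # Band and window boundaries are arbitrary illustrative choices.
        where = bucket(r.distance_km, [1, 50], ["vicinity", "nearby", "distant"])
        when = bucket(r.hours_after, [1, 24, 168], ["moments", "day", "week", "later"])
        groups[(where, when)].append(r)
    return {
        key: {
            "fear": mean(r.fear for r in rs),
            "anger": mean(r.anger for r in rs),
            "n": len(rs),  # how many readings support this cell
        }
        for key, rs in groups.items()
    }


# Made-up readings, for illustration only.
readings = [
    Reading(distance_km=0.5, hours_after=0.5, fear=0.8, anger=0.2),
    Reading(distance_km=0.2, hours_after=0.2, fear=0.6, anger=0.4),
    Reading(distance_km=10, hours_after=48, fear=0.3, anger=0.7),
]
summary = summarize(readings)
for key, cell in sorted(summary.items()):
    print(key, cell)
```

The output is a small table of averages indexed by where and when, which is the kind of condensed form a decision-maker could scan. Keeping the count `n` beside each average is a deliberate choice: a cell built on two readings deserves less trust than one built on two thousand.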

Of course, the algorithms would only be as effective as the data used to train them and, once trained, the number of data points we could collect. Specialists would have to contend with the problem of encoding human limitations into the artificial intelligence—for instance, if the algorithms were only trained with data sets that included positive outcomes from America’s perspective, the artificial intelligence would be liable to produce biased answers.
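The training-data problem can be seen in a toy calculation. The sketch below uses entirely made-up records (the field names `emotion` and `favorable` are hypothetical) to show how an estimate of how often an adversary responds with anger shifts when only outcomes favorable to the United States are kept in the data set.

```python
def anger_rate(records):
    """Fraction of records in which the observed dominant emotion was anger."""
    return sum(1 for r in records if r["emotion"] == "anger") / len(records)


# Made-up historical records, for illustration only.
records = [
    {"emotion": "fear", "favorable": True},
    {"emotion": "fear", "favorable": True},
    {"emotion": "fear", "favorable": True},
    {"emotion": "anger", "favorable": True},
    {"emotion": "anger", "favorable": False},
    {"emotion": "anger", "favorable": False},
    {"emotion": "anger", "favorable": False},
    {"emotion": "fear", "favorable": False},
]

# Estimate from the full record.
full_rate = anger_rate(records)

# Estimate after discarding outcomes that were unfavorable.
favorable_only = [r for r in records if r["favorable"]]
biased_rate = anger_rate(favorable_only)

print(f"anger rate, all records:      {full_rate:.2f}")
print(f"anger rate, favorable only:   {biased_rate:.2f}")
```

In this invented data set, anger appears in half of all records but in only a quarter of the favorable ones, so a model trained on the filtered set would learn to expect anger far less often than the full record suggests.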

Challenges: Technological, Organizational, and Psychological

To develop AI as a way of measuring military effectiveness, the Department of Defense will have to overcome a number of barriers. First, it faces the challenge of convincing technical experts to develop AI for military applications. Google employees recently protested their company’s use of open-source AI solutions to help the department process data from unmanned aerial system feeds. The episode showed that adapting artificial intelligence to military settings will not be as simple as signing contracts with companies. If the Pentagon is to use AI to measure effects, it will need to convince ambivalent developers that military applications are morally acceptable.

A persuasion strategy aimed at those with relevant AI expertise should emphasize that better measures of effectiveness could reduce violent conflict in the long run. Measuring effects more precisely would reduce violence that was objectively ineffective (but intuitively seemed effective). In cases when violence was necessary, better measures of effectiveness would help ensure it was directed only at those whose intentions could not be changed through any other means.

Organizational barriers stem in part from the Defense Department’s size and past success. The flexibility necessary to adopt new technology tends to decrease with bureaucratic size, while the U.S. military’s past success leads to a question that handicaps change in any form: What worked in the past was successful, so why adopt new methods? The department’s own innovation board has recommended that the Pentagon take full advantage of AI, including by building a department-wide center devoted to research, experimentation, and operationalization of the technology. Holding progress back, however, is what innovation board member and former Google CEO Eric Schmidt has called an “innovation adoption problem.” According to him, the Defense Department does not lack new ideas, but rather the capacity to move beyond the status quo.

Beneath technology and organizational dynamics, however, is a more basic psychological hurdle. We can see effects with our own eyes and intuit them without effort. To adopt AI to measure effectiveness is to adopt suspicious attitudes towards our own intuition. Such self-suspicion is not easy, especially for experienced decision-makers whose value to the organization is partly defined by their ability to read a situation.


Artificial intelligence is more than a tool for developing weapons of war—it is a tool for understanding the effects of weapons, both those we currently possess and the ones we will possess in the future. It is not the first technology to be both a means of waging war and a means of understanding war’s effectiveness. Aircraft that deliver munitions can also be platforms for battle damage assessments, and satellites can be weaponized at the same time they are used for reconnaissance. AI differs, though, in that it is not just a way to collect more information about effectiveness—it is a way to make sense of all that information.

Artificial intelligence is a means of solving the ancient problem of limited cognitive capacity in an information-abundant world. That is, it can be if we are willing to let it help. If we are, AI can bring us a step closer to solving another ancient problem: It can help us observe the mind of the enemy, where effectiveness has always resided but where we have never been able to see.


Brad DeWees is a doctoral candidate at Harvard University, where he studies judgment and decision-making and international relations. He is a captain in the Air Force. The views expressed here are his alone and do not necessarily reflect those of the U.S. government or any part thereof.

Image: U.S. Army