Machine Learning and Life-and-Death Decisions on the Battlefield
In 1946 the New York Times revealed one of World War II’s top secrets — “an amazing machine which applies electronic speeds for the first time to mathematical tasks hitherto too difficult and cumbersome for solution.” One of the machine’s creators offered that its purpose was to “replace, as far as possible, the human brain.” While this early version of a computer did not replace the human brain, it did usher in a new era in which, according to the historian Jill Lepore, “technological change wildly outpaced the human capacity for moral reckoning.”
That era continues with the application of machine learning to questions of command and control. The application of machine learning is in some areas already a reality — the U.S. Air Force, for example, has used it as a “working aircrew member” on a military aircraft, and the U.S. Army is using it to choose the right “shooter” for a target identified by an overhead sensor. The military is making strides toward using machine learning algorithms to direct robotic systems, analyze large sets of data, forecast threats, and shape strategy. Using algorithms in these areas and others offers awesome military opportunities — from saving person-hours in planning to outperforming human pilots in dogfights to using a “multihypothesis semantic engine” to improve our understanding of global events and trends. Yet with the opportunity of machine learning comes ethical risk — the military could surrender life-and-death choice to algorithms, and surrendering choice abdicates one’s status as a moral actor.
So far, the debate about algorithms’ role in battlefield choice has been either–or: Either algorithms should make life-and-death choices because there is no other way to keep pace on an increasingly autonomous battlefield, or humans should make life-and-death choices because there is no other way to maintain moral standing in war. This is a false dichotomy. Choice is not a unitary thing to be handed over either to algorithms or to people. At all levels of decision-making (i.e., tactical, operational, and strategic), choice is the result of a several-step process. The question is not whether algorithms or humans should make life-and-death choices, but rather which steps in the process each should be responsible for. By breaking choice into its constituent parts — and training servicemembers in decision science — the military can both increase decision speed and maintain moral standing. This article proposes how it can do both. It describes the constituent components of a choice, then discusses which of those components should be performed by machine learning algorithms and which require human input.
What Decisions Are and What It Takes To Make Them
Consider a fighter pilot hunting surface-to-air missiles. When the pilot attacks, she is determining that her choice, relative to other possibilities before her, maximizes expected net benefit, or utility. She may not consciously process the decision in these terms and may not make the calculation perfectly, but she is nonetheless determining which decision optimizes expected costs and benefits. To be clear, the example of the fighter pilot is not meant to bound the discussion. The basic conceptual process is the same whether the decision-makers are trigger-pullers on the front lines or commanders in distant operations centers. The scope and particulars of a decision change at higher levels of responsibility, of course, from risking one unit to many, or risking one bystander’s life to risking hundreds. Regardless of where the decision-maker sits — or rather where the authority to choose to employ force lawfully resides — choice requires the same four fundamental steps.
The first step is to list the alternatives available to the decision-maker. The fighter pilot, again just for example, might have two alternatives: attack the missile system from a relatively safer long-range approach, or attack from closer range with more risk but a higher probability of a successful attack. The second step is to take each of these alternatives and define the relevant possible results. In this case, the pilot’s relevant outcomes might include killing the missile while surviving, killing the missile without surviving, failing to kill the missile but surviving, and, lastly, failing to kill the missile while also failing to survive.
The third step is to make a conditional probability estimate, or an estimate of the likelihood of each result assuming a given alternative. If the pilot goes in close, what is the probability that she kills the missile and survives? What is the same probability for the attack from long range? And so on for each outcome of each alternative.
So far the pilot has determined what she can do, what may happen as a result, and how likely each result is. She now needs to say how much she values each result. To do this she needs to identify how much she cares about each dimension of value at play in the choice, which in highly simplified terms are the benefit to mission that comes from killing the missile, and the cost that comes from sacrificing her life, the lives of targeted combatants, and the lives of bystanders. It is not enough to say that killing the missile is beneficial and sacrificing life is costly. She needs to put benefit and cost into a single common metric, sometimes called a utility, so that the value of one can be directly compared to the value of the other. This relative comparison is known as a value trade-off, the fourth step in the process. Whether the decision-maker is on the tactical edge or making high-level decisions, the trade-off takes the same basic form: The decision-maker weighs the value of attaining a military objective against the cost of dollars and lives (friendly, enemy, and civilian) needed to attain it. This trade-off is at once an ethical and a military judgment — it puts a price on life at the same time that it puts a price on a military objective.
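Once these four steps are complete, rational choice is a matter of fairly simple math. Each outcome’s utility is weighted by its likelihood, so high-likelihood outcomes get more weight and are more likely to drive the final choice. The weighted utilities are then summed for each alternative, and the alternative with the highest total, its expected utility, is the rational choice.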
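The four steps and the final arithmetic can be sketched in a few lines of Python. This is only an illustration of the expected utility calculation described above, applied to the fighter pilot example. Every probability and utility value below is a made-up assumption, as are the function and variable names; none of them come from an operational system.

```python
# Illustrative sketch only: all numbers and names here are hypothetical.

# Steps 1-3: alternatives, outcomes, and P(outcome | alternative).
probabilities = {
    "long_range": {
        "kill_survive": 0.40,
        "kill_lost": 0.02,
        "miss_survive": 0.55,
        "miss_lost": 0.03,
    },
    "close_range": {
        "kill_survive": 0.60,
        "kill_lost": 0.15,
        "miss_survive": 0.15,
        "miss_lost": 0.10,
    },
}

# Step 4: the human-supplied value trade-off, on one common scale.
utilities = {
    "kill_survive": 100.0,
    "kill_lost": -200.0,
    "miss_survive": 0.0,
    "miss_lost": -300.0,
}


def expected_utility(outcome_probs, utilities):
    """Sum each outcome's utility weighted by its probability."""
    total = sum(outcome_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * utilities[outcome] for outcome, p in outcome_probs.items())


def best_alternative(probabilities, utilities):
    """Return the alternative with the highest expected utility, plus all scores."""
    scores = {
        alternative: expected_utility(outcome_probs, utilities)
        for alternative, outcome_probs in probabilities.items()
    }
    return max(scores, key=scores.get), scores


choice, scores = best_alternative(probabilities, utilities)
print(choice, scores)
```

With these assumed numbers the long-range attack scores about 27 and the close-range attack about 0, so the long-range attack is the rational choice. Changing only the utilities (the human input) can reverse that result without touching the probabilities.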
It is important to note that, for both human and machine decision-makers, “rational” is not necessarily the same thing as “ethical” or “successful.” The rational choice process is the best way, given uncertainty, to optimize what decision-makers say they value. It is not a way of saying that one has the right values and does not guarantee a good outcome. Good decisions will still occasionally lead to bad outcomes, but this decision-making process optimizes results in the long run.
At least in the U.S. Air Force, pilots do not consciously step through expected utility calculations in the cockpit. Nor is it reasonable to assume that they should — performing the mission is challenging enough. For human decision-makers, explicitly working through the steps of expected utility calculations is impractical, at least on a battlefield. It’s a different story, however, with machines. If the military wants to use algorithms to achieve decision speed in battle, then it needs to make the components of a decision computationally tractable — that is, the four steps above need to reduce to numbers. The question becomes whether it is possible to provide the numbers in such a way that combines the speed that machines can bring with the ethical judgment that only humans can provide.
Where Algorithms Are Better and Where Human Judgment Is Necessary
Computer and data science have a long way to go to exercise the power of machine learning and data representation assumed here. The Department of Defense should continue to invest heavily in the research and development of modeling and simulation capabilities. However, as it does that, we propose that algorithms list the alternatives, define the relevant possible results, and give conditional probability estimates (the first three steps of rational decision-making), with occasional human inputs. The fourth step of determining value should remain the exclusive domain of human judgment.
Machines should generate alternatives and outcomes because they are best suited for the complexity and rule-based processing that those steps require. In the simplified example above there were only two possible alternatives (attack from close or far) with four possible outcomes (kill the missile and survive, kill the missile and don’t survive, don’t kill the missile and survive, and don’t kill the missile and don’t survive). The reality of future combat will, of course, be far more complicated. Machines will be better suited for handling this complexity, exploring numerous solutions, and illuminating options that warfighters may not have considered. This is not to suggest, though, that humans will play no role in these steps. Machines will need to make assumptions and pick starting points when generating alternatives and outcomes, and it is here that human creativity and imagination can help add value.
Machines are hands-down better suited for the third step — estimating the probabilities of different outcomes. Human judgments of probability tend to rely on heuristics, such as how available examples are in memory, rather than more accurate indicators like relevant base rates, or how often a given event has historically occurred. People are even worse when it comes to understanding probabilities for a chain of events. Even a relatively simple combination of two conditional probabilities is beyond the reach of most people. There may be openings for human input when unrepresentative training data encodes bias into the resulting algorithms, something humans are better equipped to recognize and correct. But even then, the departures should be marginal, rather than the complete abandonment of algorithmic estimates in favor of intuition. Probability, like long division, is an arena best left to machines.
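Two small calculations illustrate why these estimates are easy for a machine and hard for intuition. The sketch below chains two conditional probabilities and then applies Bayes' rule with a base rate. The scenario, the numbers, and the function names are all assumptions made up for illustration.

```python
# Illustrative sketch only: scenario and numbers are hypothetical.

def joint_probability(p_first, p_second_given_first):
    """Chain rule: P(A and B) = P(A) * P(B | A)."""
    return p_first * p_second_given_first


def posterior(base_rate, hit_rate, false_alarm_rate):
    """Bayes' rule: P(real target | sensor flags it)."""
    true_flags = base_rate * hit_rate
    false_flags = (1.0 - base_rate) * false_alarm_rate
    return true_flags / (true_flags + false_flags)


# Assumed: 90% chance of surviving the approach, and a 70% chance of a
# kill given that the approach is survived.
p_survive_and_kill = joint_probability(0.90, 0.70)

# Assumed: 5% of objects of this type are real launchers, the sensor flags
# 90% of real launchers, and it falsely flags 10% of everything else.
p_real_given_flag = posterior(base_rate=0.05, hit_rate=0.90, false_alarm_rate=0.10)

print(round(p_survive_and_kill, 2), round(p_real_given_flag, 2))
```

Under these assumptions the chained probability is about 0.63, and the chance that a flagged object is a real launcher is only about 0.32 despite the sensor's 90 percent hit rate. That second number is the kind of result that intuition, which tends to neglect the base rate, usually gets wrong.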
While machines take the lead with occasional human input in steps one through three, the opposite is true for the fourth step of making value trade-offs. This is because value trade-offs capture both ethical and military complexity, as many commanders already know. Even with perfect information (e.g., the mission will succeed but it will cost the pilot’s life) commanders can still find themselves torn over which decision to make. Indeed, whether and how one should make such trade-offs is the essence of ethical theories like deontology or consequentialism. And prioritization of which military objectives will most efficiently lead to success (however defined) is an always-contentious and critical part of military planning.
As long as commanders and operators remain responsible for trade-offs, they can maintain control and responsibility for the ethicality of the decision even as they become less involved in the other components of the decision process. Of note, this control and responsibility can be built into the utility function in advance, allowing systems to execute at machine speed when necessary.
A Way Forward
Incorporating machine learning and AI into military decision-making processes will be far from easy, but it is possible and a military necessity. China and Russia are using machine learning to speed their own decision-making, and unless the United States keeps pace it risks finding itself at a serious disadvantage on future battlefields.
The military can ensure the success of machine-aided choice by ensuring that the appropriate division of labor between human and machines is well understood by both decision-makers and technology developers.
The military should begin by expanding developmental education programs so that they rigorously and repeatedly cover decision science, something the Air Force has started to do in its Pinnacle sessions, its executive education program for two- and three-star generals. Military decision-makers should learn the steps outlined above, and also learn to recognize and control for inherent biases, which can shape a decision as long as there is room for human input. Decades of decision science research have shown that intuitive decision-making is replete with systematic biases like overconfidence, irrational attention to sunk costs, and changes in risk preference based merely on how a choice is framed. These biases are not restricted just to people. Algorithms can show them as well when training data reflects biases typical of people. Even when algorithms and people split responsibility for decisions, good decision-making requires awareness of and a willingness to combat the influence of bias.
The military should also require technology developers to address ethics and accountability. Developers should be able to show that algorithmically generated lists of alternatives, results, and probability estimates are not biased in such a way as to favor wanton destruction. Further, any system addressing targeting, or the pairing of military objectives with possible means of affecting those objectives, should be able to demonstrate a clear line of accountability to a decision-maker responsible for the use of force. One means of doing so is to design machine learning-enabled systems around the decision-making model outlined in this article, which maintains accountability of human decision-makers through their enumerated values. To achieve this, commanders should insist on retaining the ability to tailor value inputs. Unless input opportunities are intuitive, commanders and troops will revert to simpler, combat-tested tools with which they are more comfortable: the same old radios or weapons or, for decision purposes, slide decks. Developers can help make probability estimates more intuitive by providing them in visual form. Likewise, they can make value trade-offs more intuitive by presenting different hypothetical (but realistic) choices to assist decision-makers in refining their value judgments.
The unenviable task of commanders is to imagine a number of potential outcomes given their particular context and assign a numerical score or “utility” such that meaningful comparisons can be made between them. For example, a commander might place a value of 1,000 points on the destruction of an enemy aircraft carrier and -500 points on the loss of a fighter jet. If this is an accurate reflection of the commander’s values, she should be indifferent between an attack that destroys one enemy carrier with no fighter losses and one that destroys two carriers but costs her two fighters. Both are valued equally at 1,000 points: 1,000 for the single carrier in the first case, and 2,000 for two carriers minus 1,000 for two fighters in the second. If the commander strongly prefers one outcome over the other, then the points should be adjusted to better reflect her actual values or else an algorithm using that point system will make choices inconsistent with the commander’s values. This is just one example of how to elicit trade-offs, but the key point is that the trade-offs need to be given in precise terms.
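The carrier-and-fighter example can be checked with a short sketch. The point values of 1,000 and -500 come from the example above; the function name, the dictionary keys, and the adjusted value of -600 are assumptions introduced only for illustration.

```python
# Point values from the example above; names and the -600 figure are hypothetical.
POINTS = {"carrier_destroyed": 1000, "fighter_lost": -500}


def outcome_value(counts, points=POINTS):
    """Total points for an outcome, given how many of each event occurred."""
    return sum(points[event] * n for event, n in counts.items())


one_carrier_no_losses = outcome_value({"carrier_destroyed": 1, "fighter_lost": 0})
two_carriers_two_losses = outcome_value({"carrier_destroyed": 2, "fighter_lost": 2})

# Equal totals mean the point system says the commander is indifferent.
print(one_carrier_no_losses, two_carriers_two_losses)

# If she actually prefers the first outcome, one possible adjustment is to
# weight a fighter loss more heavily so the totals no longer tie.
adjusted = {"carrier_destroyed": 1000, "fighter_lost": -600}
print(outcome_value({"carrier_destroyed": 2, "fighter_lost": 2}, adjusted))
```

Both outcomes total 1,000 points under the original values. Under the assumed adjustment the second outcome drops to 800, so an algorithm using the adjusted points would now prefer the attack with no fighter losses, matching the commander's stated preference.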
Finally, the military should pay special attention to helping decision-makers become proficient in their roles as appraisers of value, particularly with respect to decisions focused on whose life to risk, when, and for what objective. In the command-and-control paradigm of the future, decision-makers will likely be required to document such trade-offs in explicit forms so machines can understand them (e.g., “I recognize there is a 12 percent chance that you won’t survive this mission, but I judge the value of the target to be worth the risk”).
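One way to see what such a documented trade-off implies is to compute the break-even risk. The sketch below uses a deliberately simplified model in which the mission's net value is the target's value minus the probability of loss times the cost of that loss. The model, the point values, and the function names are all assumptions for illustration, not a description of any fielded system.

```python
# Illustrative sketch only: a simplified model with hypothetical values.

def break_even_risk(target_value, loss_cost):
    """Largest loss probability p at which target_value - p * loss_cost >= 0."""
    if loss_cost <= 0:
        raise ValueError("loss_cost must be positive")
    return min(1.0, target_value / loss_cost)


def accepts_risk(p_loss, target_value, loss_cost):
    """True if the mission's expected net value is positive under this model."""
    return target_value - p_loss * loss_cost > 0


# Assumed values: the target is worth 1,000 points and the loss costs 5,000.
print(break_even_risk(1000, 5000))     # the risk level at which the trade-off flips
print(accepts_risk(0.12, 1000, 5000))  # the 12 percent case from the example
print(accepts_risk(0.25, 1000, 5000))
```

Under these assumed values the break-even risk is 20 percent, so a 12 percent risk is accepted and a 25 percent risk is not. Read in the other direction, a statement that a 12 percent risk is "worth it" implies, in this simple model, that the decision-maker values the target at more than 12 percent of the cost of the loss.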
If decision-makers at the tactical, operational, or strategic levels are unaware of these ethical costs or unwilling to pay them, then the construct of machine-aided choice will collapse. It will collapse either because machines cannot assist human choice without explicit trade-offs, or because decision-makers and their institutions will be ethically compromised by allowing machines to obscure the trade-offs implied by their value models. Neither is an acceptable outcome. Rather, as an institution, the military should embrace the requisite transparency that comes with the responsibility to make enumerated judgments about life and death. Paradoxically, documenting risk tolerance and value assignment may serve to increase subordinate autonomy during conflict. A major advantage of formally modeling a decision-maker’s value trade-offs is that it allows subordinates, and potentially even autonomous machines, to take action in the absence of the decision-maker. This machine-aided decision process enables decentralized execution at scale that reflects the leader’s values better than even the most carefully crafted rules of engagement or commander’s intent. As long as trade-offs can be tied back to a decision-maker, ethical responsibility lies with that decision-maker.
Keeping Values Preeminent
The Electronic Numerical Integrator and Computer, now an artifact of history, was the “top secret” that the New York Times revealed in 1946. Though important as a machine in its own right, the computer’s true significance lay in its symbolism. It represented the capacity for technology to sprint ahead of decision-makers, and occasionally pull them where they did not want to go.
The military should race ahead with investment in machine learning, but with a keen eye on the primacy of commander values. If the U.S. military wishes to keep pace with China and Russia on this issue, it cannot afford to delay in developing machines designed to execute the complicated but unobjectionable components of decision-making: identifying alternatives, outcomes, and probabilities. Likewise, if it wishes to maintain its moral standing in this algorithmic arms race, it should ensure that value trade-offs remain the responsibility of commanders. The U.S. military’s professional development education should also begin training decision-makers on how to most effectively maintain accountability for the straightforward but vexing components of value judgments in conflict.
We stand encouraged by the continued debate and hard discussions on how to best leverage the incredible advancement in AI, machine learning, computer vision, and like technologies to unleash the military’s most valuable weapon system, the men and women who serve in uniform. The military should take steps now to ensure that those people — and their values — remain the key players in warfare.
Brad DeWees is a major in the U.S. Air Force and a tactical air control party officer. He is currently the deputy chief of staff for 9th Air Force (Air Forces Central). An alumnus of the Air Force Chief of Staff’s Strategic Ph.D. program, he holds a Ph.D. in decision science from Harvard University.
Chris “FIAT” Umphres is a major in the U.S. Air Force and an F-35A pilot. An alumnus of the Air Force Chief of Staff’s Strategic Ph.D. program, he holds a Ph.D. in decision science from Harvard University and a master’s in management science and engineering from Stanford University.
Maddy Tung is a second lieutenant in the U.S. Air Force and an information operations officer. A Rhodes Scholar, she is completing dual degrees at the University of Oxford. She recently completed an M.Sc. in computer science and began the M.Sc. in social science of the internet.
The views expressed here are the authors’ alone and do not necessarily reflect those of the U.S. government or any part thereof.
Image: U.S. Air Force (Photo by Staff Sgt. Sean Carnes)