Helping Humans and Computers Fight Together: Military Lessons from Civilian AI
This article is the second in a series on digital defense. The first article looked at how the United States should bring tech experts and the innovative ideas they develop into the Department of Defense at an accelerated rate, and why those innovations should be shared with allies. This series will conclude with an essay on how the Army’s 1941 Louisiana Maneuvers can be a model for virtual training of the armed forces today to face future conflicts, not past wars.
The U.S. Department of Defense has, understandably, taken a growing interest in artificial intelligence (AI). Military planners have run a series of war games to determine how effectively AI could replace humans in combat. Based on these early experiments, computers seem likely to eventually overtake their human operators — in one recent test, an AI system defeated its human opponent in a simulated dogfight. Some military planners have even worried that keeping humans in the loop is already dangerously slowing down decision-making in high-stakes situations.
But using AI to replace humans in combat ignores the lessons Silicon Valley has learned implementing this technology in other fields. The experience of civilian AI has demonstrated how dangerous it is to think of computers as being simply “better” or “more accurate” than individuals. Increasing the accuracy, speed, and efficiency of complex systems through automation may make failures less common, but it can also make them far more serious when they occur. As a result, successfully leveraging AI is not about replacing people with computers in the performance of discrete tasks. Rather, it requires embracing the relationship between technology and the human beings who interact with it.
How should the military go about this? We believe our experience at Rebellion Defense can offer some important insights. We are a software company that helps intelligence and national security clients apply AI to their work. In our own development process, we discourage defense partners from focusing on places where machines can take over an existing process and encourage them instead to explore how the process itself could change if humans and machines work together. This leads us to three basic principles: a) build expertise, don’t eliminate it, b) find opportunities for hybrid intelligence, and c) create error budgets.
Lessons from Civilian AI
As Silicon Valley discovered, when AI is introduced into complex processes, it fundamentally alters them, changing the types of errors that occur and their ramifications.
Our first brush with the complications of using machines to eliminate human error came from automating the work that system administrators do setting up, upgrading, and configuring servers. Software engineers built automated systems that could read plain text files with configuration instructions and reconfigure servers as needed on the fly. Doing so greatly decreased the number of errors triggered by mistyped commands or by forgetting to upgrade one or two servers on a list. Companies that implemented this automated approach decreased mistakes and increased efficiency. Eliminating human error also allowed systems to grow, becoming faster and more complex. But this growth came at a price. In automated systems, the errors that do happen can cascade through multiple subsystems almost instantaneously, often taking down broad swaths of seemingly unrelated operations. In November 2020, such a bug disabled a large chunk of the internet for several hours. System failures caused by unexpected behavior from error-reducing technology trigger catastrophic outages a few times a year at the top technology firms in the world.
The unexpected consequences of AI have taken more tangible forms as well. Amazon has more than 110 major distribution centers across the country. They are heavily automated to improve both efficiency and safety. But to the company’s surprise, cutting-edge robots and AI were found to be increasing workplace injuries and accidents in the online seller’s warehouses. Some robotized warehouses were seeing injury rates five times the industry average.
Again, decreasing errors allowed systems to move faster and become more complex. Amazon’s robots were increasing human injury because they scaled the pace of warehouse activity past what the human body could sustain. There were fewer accidents as a result of people moving things around the warehouse space, but more injuries related to repetitive stress and exhaustion as workers stayed at their stations processing packages at increasing speed. In developing a robot workforce, Amazon tried to swap out humans for machines without approaching the system holistically or considering how humans would adapt to the automation. Truly safe systems take into account the interaction between humans and machines, rather than trying to replace one with the other.
Because of cases like this, Google famously referred to its machine learning and AI as a “high-interest credit card”: The benefits were very real, but preventing unexpected or harmful behavior in automated systems was more difficult and costly in the long term than Google had anticipated. When systems that previously did not talk to one another start to share data through AI models, the models retrain themselves on data produced by other models. Unintended consequences are guaranteed. As a result, Google has sought to manage those consequences by closely embedding the employees who operate its systems with the researchers who develop its AI.
Leveraging AI for Defense
Drawing on these insights, how should the Department of Defense go about applying AI to military situations? We advocate three basic principles.
Build Expertise, Don’t Eliminate It
Both the United States and Russia have stories of commanders who prevented the Cold War from accidentally slipping into a hot nuclear war. In these accounts, alerts from early detection systems were ignored or overruled by human operators whose experience told them the machine’s conclusion made no sense.
Advocates for AI ethics emphasize the importance of having a human in the loop, that is, human supervision of AI outcomes. But this supervision will not help if the human operator does not have enough experience or expertise to determine what the correct outcome is in the first place. Safety researchers refer to the “ironies of automation”: The process of automating tasks makes the human operators whom systems rely on for good judgment less knowledgeable and experienced. The first stab at integrating AI into an existing process often leaves the most difficult and nuanced analysis to the human operators while delegating more basic analysis to the machine. Conventional thinking assumes that without the burden of the simple tasks, the human will be more efficient. In reality, though, accuracy on those hard tasks comes from experience doing the simpler ones. The result is what is sometimes referred to in the industry as a “moral crumple zone”: The human operator ends up being the scapegoat for the machine’s error because of the operator’s supervisory role, yet the machine’s growing complexity makes it impossible for that operator to understand what the correct outcome should be.
Rather than optimizing systems to go faster or reduce errors, we pay attention to how people build expertise in systems and configure machines to increase people’s intake and retention of on-the-job insight. For example, intelligence, surveillance, and reconnaissance operations typically comprise a hierarchy of analysts, from low-level line analysts who focus on annotation and data entry to more senior-level analysts who construct and refine intelligence narratives based on that data. Those with a high-level view of intelligence, surveillance, and reconnaissance activity assume the best way to apply AI to improve this process is to have AI replace the low-level analysts, or at least filter the incoming data for relevance. After all, humans get tired, miss things, and make mistakes that AI could catch.
But when we interviewed warfighters working in intelligence, surveillance, and reconnaissance, they were quick to point out that every level of analysis is done by analysts who advanced from the level below. When low-level analysts move on to jobs that machines cannot do, their expert judgment has been honed by hours and hours of boring data entry work. If we replace the low-level analysts with AI, this expertise is lost. As a result, the whole system becomes weaker and the mistakes it makes have more dangerous and far-reaching consequences.
Find Opportunities for Hybrid Intelligence
One of the benefits of AI is that the types of mistakes humans and machines make are fundamentally different from one another. So-called hybrid intelligence aims to combine the complementary strengths of humans and AI to create better outcomes.
At the heart of hybrid intelligence lies Moravec’s paradox, which tells us that pattern matching is difficult and resource-intensive for computers but cheap and easy for people, whereas calculations are difficult and resource-intensive for people but cheap and easy for computers. For decades, computers have been able to take metadata from a photo and cross-reference it with geolocation boundary data to determine which country it was taken in. But only the most modern computers could begin to figure out what it is a photo of. Rather than harp on the flaws of humans and machines by considering them separately, we should appreciate how they can best work together as a team.
Doing this starts in the product development phase, when designers search for opportunities to create hybrid intelligence by constructing a process map that classifies tasks as either pattern matching or computational. Consider the task of maintaining a fleet of armored vehicles. AI can usefully predict and prevent equipment failures by running calculations on sensor feeds from thousands of moving parts. AI can even predict when certain parts should be reordered based on historical data. But if something breaks unexpectedly and a spare is not on hand, AI cannot determine if a similar part from a different manufacturer it has never seen before is an appropriate substitute. Sorting important details from superficial ones, forming a hypothesis and testing it, and adapting on the fly to changing conditions are all hallmarks of human intelligence. The most effective systems combine reliable predictions based on computer calculations and flexible human responses when those predictions fail.
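As a rough illustration, a process map of this kind can be sketched in a few lines of Python. The task names come from the vehicle maintenance example above; the `TaskKind`, `Task`, and `assign` names are hypothetical, and the classifications are a designer's judgment calls rather than anything a program could derive on its own.

```python
from dataclasses import dataclass
from enum import Enum


class TaskKind(Enum):
    COMPUTATIONAL = "computational"        # cheap for machines, costly for people
    PATTERN_MATCHING = "pattern matching"  # cheap for people, costly for machines


@dataclass(frozen=True)
class Task:
    name: str
    kind: TaskKind


def assign(task: Task) -> str:
    """Route computational work to the machine and everything else to a person."""
    return "machine" if task.kind is TaskKind.COMPUTATIONAL else "human"


# Hypothetical process map for the armored vehicle fleet example.
PROCESS_MAP = [
    Task("predict part failure from sensor feeds", TaskKind.COMPUTATIONAL),
    Task("schedule reorders from historical data", TaskKind.COMPUTATIONAL),
    Task("judge whether an unfamiliar part is a safe substitute",
         TaskKind.PATTERN_MATCHING),
]


def plan(process_map):
    """Return a mapping from task name to the party responsible for it."""
    return {task.name: assign(task) for task in process_map}


if __name__ == "__main__":
    for name, owner in plan(PROCESS_MAP).items():
        print(f"{owner:>7}: {name}")
```

The value of the exercise lies less in the code than in forcing each step of a process to be labeled before deciding who, or what, should own it.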
Create Error Budgets
AI does not eliminate the possibility of failure. Instead, it changes the types of errors and their ramifications. The biggest technology companies build in a margin of error when designing systems, which they colloquially refer to as an “error budget.” In the military context, an error budget would essentially be a measure of how many errors can occur within a specific period of time before the ability to successfully complete a mission is in danger.
In the military domain, setting honest and realistic expectations about the tolerance for error in a process is difficult. The level of acceptable risk is different when you are maneuvering on the battlefield rather than delivering packages in suburbia. Tolerance for error is often a function of scale, which can be hard to conceptualize. For example, a 90 percent accuracy rate may sound remarkably accurate, but it means that for every 100 cases there will be 10 mistakes. In a missile attack, 10 mistakes out of 100 launches would be disastrous. AI should not be applied in places where the tolerance for risk is lower than the current accuracy of the technology.
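The arithmetic above can be made explicit with a short Python sketch. It assumes an error budget is expressed as a maximum number of mistakes over a fixed number of cases; the `ErrorBudget` class and the function names are illustrative, not an established standard or any company's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    max_errors: int    # mistakes the mission can absorb in the window
    window_cases: int  # number of decisions made in that window


def expected_errors(accuracy: float, cases: int) -> float:
    """Expected number of mistakes for a system with the given accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * cases


def fits_budget(accuracy: float, budget: ErrorBudget) -> bool:
    """True if the expected mistakes stay within the error budget."""
    return expected_errors(accuracy, budget.window_cases) <= budget.max_errors


if __name__ == "__main__":
    strict = ErrorBudget(max_errors=1, window_cases=100)
    lenient = ErrorBudget(max_errors=10, window_cases=100)
    print("expected mistakes:", round(expected_errors(0.90, 100)))
    print("fits strict budget:", fits_budget(0.90, strict))
    print("fits lenient budget:", fits_budget(0.90, lenient))
```

Against a budget of 1 mistake per 100 launches, a 90 percent accurate system fails the check, while the same system passes against a budget of 10 per 100. The accuracy figure has not changed; only the tolerance has.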
Paradoxically, thinking in terms of error budgets can also help avoid the risk of eliminating too many errors. When technology is unlikely to fail, it encourages the systems around it to cut corners and relax controls designed to mitigate failure. This increases the impact of failure when it does happen. A computer’s clock, for example, is one of the most reliable, error-resistant programs a machine runs. Programmers do not build in backups for clocks, nor do they add controls to check for clock failure. Why would they? Clocks fail only once or twice a decade. But they do fail. Maybe a bug is introduced into the GPS system the clock uses as a reference, or maybe the computer counts to a number larger than the memory it has set aside for the clock can hold. When this happens, all those systems that were counting on their clocks to be infallible are suddenly vulnerable. For this reason, something as simple as clock failure can damage everything from power grids to medical equipment to bank accounts.
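One well-known version of the counting failure can be sketched in Python. The sketch assumes a clock stored as a signed 32-bit count of seconds since January 1, 1970, which is one common convention and not necessarily the one at fault in any particular incident; `wrap_int32` and `clock_reading` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def wrap_int32(value: int) -> int:
    """Reduce value to what a signed 32-bit counter would actually store."""
    return (value + 2**31) % 2**32 - 2**31


def clock_reading(seconds_elapsed: int) -> datetime:
    """Date the simulated 32-bit clock reports after this many real seconds."""
    return EPOCH + timedelta(seconds=wrap_int32(seconds_elapsed))


if __name__ == "__main__":
    print("last good reading: ", clock_reading(INT32_MAX))
    print("one second later:  ", clock_reading(INT32_MAX + 1))
```

One second after the largest representable value, which falls on January 19, 2038, the simulated clock jumps back to December 1901. Any system that trusts the reading without a sanity check inherits the error.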
The danger from systems that are too reliable is so great that Google famously started intentionally triggering errors in its own internal services. The company realized that many of its most difficult and costly outages were caused by design teams assuming those services were perfect and not considering what might happen if they failed. Rather than eliminating error for the sake of eliminating error, technical leaders should think about the potential impact of specific errors. This can help them think strategically about where error reduction will create the most benefit and where it could prove counterproductive.
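The idea of deliberately triggering failures can be sketched as follows. This is a minimal Python illustration, not Google's actual tooling: `make_flaky`, `DependencyUnavailable`, and `lookup_with_fallback` are hypothetical names, and the injected failure rate is an arbitrary parameter.

```python
import random


class DependencyUnavailable(Exception):
    """Raised when the wrapped service is made to fail on purpose."""


def make_flaky(service, failure_rate, rng):
    """Wrap a service so that a fraction of calls fail deliberately."""
    def wrapped(*args, **kwargs):
        # rng.random() returns a float in [0.0, 1.0)
        if rng.random() < failure_rate:
            raise DependencyUnavailable("injected failure")
        return service(*args, **kwargs)
    return wrapped


def lookup_with_fallback(lookup, key, default):
    """A caller that plans for the dependency failing instead of assuming it cannot."""
    try:
        return lookup(key)
    except DependencyUnavailable:
        return default


if __name__ == "__main__":
    directory = {"alpha": "10.0.0.1"}
    flaky_lookup = make_flaky(directory.__getitem__, 0.3, random.Random())
    for _ in range(5):
        print(lookup_with_fallback(flaky_lookup, "alpha", "cached address"))
```

Running callers against the wrapped service during testing shows which of them have a fallback and which silently assume the dependency never fails.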
Conclusion
As it continues to mature, AI can do a great deal to improve the efficiency of the U.S. armed forces. But it can also introduce new problems. While computers often seem smarter than we are, they know only what they are programmed to know. They cannot adapt on the fly, they cannot think critically about their surroundings, and they cannot devise strategy. This makes them more vulnerable to making mistakes in situations with high levels of ambiguity. As the Pentagon increases its dependence on AI, the most successful strategy will involve designing systems that play to the strengths of both man and machine, rather than effectively “demoting” humans by replacing them with seemingly “more reliable” AI.
Marianne Bellotti is a software engineer and author of the book Kill It With Fire: Manage Aging Computer Systems (and Future Proof Modern Ones). She runs engineering teams at Rebellion Defense and regularly lectures on system safety.