Miscalibration of Trust in Human-Machine Teaming

A recent Pew survey found that 82 percent of Americans are more wary of than excited about the use of artificial intelligence (AI), or equally wary and excited. This sentiment is not surprising: tales of rogue or dangerous AI abound in pop culture. Movies from 2001: A Space Odyssey to The Terminator warn of the dire consequences of trusting AI. Yet, at the same time, more people than ever are regularly using AI-enabled devices, from recommender systems in search engines to voice assistants in their smartphones and automobiles.

Despite this mistrust, AI is becoming increasingly ubiquitous, especially in defense. It plays a role in everything from predictive maintenance to autonomous weapons. Militaries around the globe are significantly investing in AI to gain a competitive advantage, and the United States and its allies are in a race with their adversaries for the technology. As a result, many defense leaders are concerned with ensuring these technologies are trustworthy. Given how widespread the use of AI is becoming, it is imperative that Western militaries build systems that operators can trust and rely on. 

Enhancing understanding of human trust dynamics is crucial to the effective use of AI in military operational scenarios, typically referred to in the defense domain as human-machine teaming. To achieve trust and full cooperation with AI “teammates,” militaries need to ensure that human factors are considered in system design and implementation. If they do not, military AI use could be subject to the same disastrous — and deadly — errors that the private sector has experienced. To avoid this, militaries should ensure that personnel training educates operators on both the human and AI sides of human-machine teaming, that human-machine teaming operational designs actively account for the human side of the team, and that AI is implemented in a phased approach.

Building Trust

To effectively build human-machine teams, one should first understand how humans build trust in technology generally and in AI specifically. AI here refers to models with the ability to learn from data, a subset of the field called machine learning. Thus far, almost all efforts to develop trustworthy AI have focused on addressing technology challenges, such as improving AI transparency and explainability. The human side of the human-machine interaction has received little attention. Neglecting the human factor, however, risks limiting the positive impact that purely technology-focused improvements could have.

Operators list many reasons why they do not trust AI to complete tasks for them, which is unsurprising given the generally distrustful cultural attitude toward the technology outlined in the Pew survey above. However, research shows that humans often do the opposite with new software technologies. People trust websites with their personal information and use smart devices that actively gather that information. They even engage in reckless behavior in automated vehicles that manufacturers explicitly advise against, putting their own lives at risk.

Research shows that humans struggle to calibrate appropriate levels of trust in the technology they use. Humans, therefore, will not always act as expected when using AI-enabled technology; often they put too much faith in their AI teammates, which can result in unexpected accidents or outcomes. Humans, for example, have a propensity toward automation bias: the tendency to favor information provided by automated systems over information from non-automated sources. The risk of this occurring with AI, a notorious black-box technology with frequently misunderstood capabilities, is even higher.

Humans also often engage in increasingly risky behavior with new technology they believe to be safe, a phenomenon known as behavioral adaptation. This is well documented in automobile safety research. A study conducted by University of Chicago economist Sam Peltzman found no decrease in the death rate from automobile accidents after the implementation of new safety measures. He theorized that this was because drivers, feeling safer as a result of the new regulations and safety technology, took more risks than they would have before the advent of measures made to keep them safe. For example, drivers with anti-lock brakes were found to drive faster and follow other vehicles more closely than those without. Even using adaptive cruise control, which maintains a set distance from the car ahead, leads to an increase in risk-taking behavior, such as looking at a phone while driving. While it was later determined that the relationship between increased safety countermeasures and risk-taking behavior was not as clear-cut as Peltzman initially concluded, the theory and the concept of behavioral adaptation itself have gained renewed attention in recent years to explain risk-taking in situations as diverse as American football and the COVID-19 pandemic. Any human-machine teaming should be designed with this research in mind.

Accounting for the Human Element in Design

Any effective human-AI team should be designed to account for human behavior that could negatively affect the team’s outcomes. There has been extensive research into accidents involving AI-enabled self-driving cars, which has led some to question whether human drivers can be trusted with self-driving technology. A majority of these crashes involving driver-assistance or self-driving technology have involved Tesla’s Autopilot system in particular, leading to a recent recall. While the incidents are not exclusively a product of excessive trust in the AI-controlled vehicles, videos of these crashes indicate that outsized trust plays a critical role. Some videos showed drivers asleep at the wheel, while others pulled off stunts like putting a dog in the driver’s seat.

Tesla says its Autopilot feature is meant to be used by drivers who keep their eyes on the road. However, studies show that once Autopilot is engaged, humans tend to pay significantly less attention. There have been documented examples of deadly crashes with no one in the driver’s seat or while the human driver was looking at a cell phone. Drivers made risky decisions they would not have made in a normal car because they believed the AI system was good enough to go unmonitored, despite the company’s guidance and the myriad examples to the contrary. A report published as part of the National Highway Traffic Safety Administration’s ongoing investigation into these accidents recommends that “important design considerations include the ways in which a driver may interact with the system or the foreseeable ranges of driver behavior, whether intended or unintended, while such a system is in operation.”

The military should take precautions when integrating AI to avoid a similar miscalibration of trust. One such precaution could be to monitor the performance not only of the AI but also of the operators working with it. In the automobile industry, video monitoring to ensure drivers are paying attention while the automated driving function is engaged is an increasingly popular approach. Video monitoring may not be an appropriate measure for all military applications, but the concept of monitoring human performance should be considered in design.

A recent Proceedings article framed this dual monitoring in the context of military aviation training. Continuous monitoring of the “health” of the AI system is akin to aircraft pre-flight and in-flight system monitoring. Likewise, aircrew are continuously evaluated on their day-to-day performance. Just as aircrew are required to undergo ongoing training on all aspects of an aircraft’s employment throughout the year, so too should AI operators be continuously trained and monitored. This would not only ensure that military AI systems are working as designed and that the humans paired with those systems are not inducing error, but would also build trust in the human-machine team.

Education on Both Sides of the Trust Dynamic

Personnel should also be educated about the capabilities and limitations of both the machine and human teammates in any human-machine teaming arrangement. Civilian and military experts alike widely agree that a foundational pillar of effective human-machine teaming will be the appropriate training of military personnel. This training should include education on the AI system’s capabilities and limitations and incorporate a feedback loop from the operator back into the AI software.

Military aviation is deeply rooted in a culture of safety through extensive training and proficiency through repetition, and this safety culture could provide a venue for the necessary AI education. Aviators learn not just to interpret the information displayed in the cockpit but also to trust that information. This is a real-life demonstration of research showing that humans perceive risks more accurately when they are educated on how likely those risks are to occur.

Education specifically relating to how humans themselves establish and maintain trust through behavioral adaptation can also help operators become more self-aware of their own, potentially damaging, behavior. Road safety research and other fields have repeatedly shown that this kind of awareness training helps mitigate negative outcomes: humans are able to self-correct when they realize they are engaging in undesirable behavior. In a human-machine teaming context, this would allow the operator to react to a fault or failure in a trusted system while retaining the benefit of increased situational awareness. Implementing AI early in training will therefore give future military operators confidence in AI systems, and through repetition the trust relationship will be solidified. Moreover, a better understanding not only of the machine’s capabilities but also of its constraints will decrease the likelihood of operators incorrectly inflating their trust in the system.

A Phased Approach

Additionally, a phased approach should be taken when incorporating AI, to better account for the human element of human-machine teaming. Often, new commercial software or technology is rushed to market to outpace the competition and ends up failing in operation. This often costs a company more than it would have to delay the rollout and fully vet the product.

In the race to build military AI applications for a competitive advantage, militaries risk pushing the technology too far, too fast. A civilian-sector example of this is the Boeing 737 Max software flaws, which resulted in two deadly crashes. In October 2018, Lion Air Flight 610 crashed, killing all 189 people on board, after the pilots struggled to control rapid and uncommanded descents. A few months later, Ethiopian Airlines Flight 302 crashed, killing everyone on board, after its pilots similarly struggled to control the aircraft. While the flight-control software that caused these crashes is not an example of true AI, these fatal mistakes are still a cautionary tale: misplaced trust in the software at multiple levels resulted in the deaths of hundreds.

The accident investigations for both flights found that erroneous input from an angle-of-attack sensor to the flight computer caused a cascading and catastrophic failure. These sensors measure the angle of the wing relative to the airflow and give an indication of lift, the ability of the aircraft to stay in the air. In this case, the erroneous input caused the Maneuvering Characteristics Augmentation System, an automated flight-control system, to put the plane into repeated dives because the faulty data indicated the aircraft was in danger of stalling. These two crashes resulted in the grounding of the entire 737 Max fleet worldwide for 20 months, costing Boeing over $20 billion.

This was all caused by a design decision and a resultant software change that was assumed to be safe. Boeing, in a desire to stay ahead of its competition, updated a widely used aircraft, the base-model 737. Moving the engines’ location on the wings of the 737 Max improved fuel efficiency but significantly changed the aircraft’s flight characteristics. These changes should have required Boeing to market the 737 Max as a completely new airframe, which would have meant significant training requirements for pilots to remain in compliance with Federal Aviation Administration rules, costing significant time and money. To avoid this, the flight-control software was programmed to make the aircraft fly like an older-model 737. While flight-control software is not new, this novel use allowed Boeing to market the 737 Max as an update to an existing aircraft rather than a new airframe. Some issues were noted during testing, but Boeing trusted the software due to the previous reliability of its flight-control systems and pushed the Federal Aviation Administration for certification. Hidden in the software, however, was flawed logic that caused the cascading failures seen on the Lion Air and Ethiopian Airlines flights. Had Boeing not put so much trust in the software, or had the regulator not put similar trust in Boeing’s certification of the software, these incidents could have been avoided.

The military should take this as a lesson. Any AI should be phased in gradually to ensure that too much trust is not placed in the software. In other words, when implementing AI, militaries need to consider cautionary tales such as the 737 Max. Rather than rushing an AI system into operation to achieve a perceived advantage, they should carefully implement it into training and other events before full certification to ensure operator familiarity and transparency into any potential issues with the software or system. This approach is currently being demonstrated by the U.S. Air Force’s 350th Spectrum Warfare Wing, which is tasked with integrating cognitive electromagnetic warfare into the existing aircraft electromagnetic warfare mission. The Air Force has described the ultimate goal of cognitive electromagnetic warfare as a distributed, collaborative system that can make real-time or near-real-time adjustments to counter advanced adversary threats. The 350th is taking a measured approach to implementation to ensure that warfighters have the capabilities they need now while also developing the algorithms and processes that will ensure the future success of AI in the electromagnetic warfare force. The goal is to first use machine learning to speed up the aircraft software reprogramming process, which can sometimes take years. Machine learning and automation will significantly shorten this timeline while also familiarizing engineers and operators with the processes necessary to implement AI in any future cognitive electromagnetic warfare system.

Conclusion

To effectively integrate AI into operations, more effort needs to be devoted not only to optimizing software performance but also to monitoring and training the human teammates. No matter how capable an AI system is, if human operators miscalibrate their trust in it, they will be unable to capitalize on AI’s technological advances and may make critical errors in design or operation. In fact, one of the strongest and most repeated recommendations to come out of the Federal Aviation Administration’s joint investigation of the 737 Max accidents was that human behavior experts needed to play a central role in research and development, testing, and certification. Likewise, research on automated vehicle accidents has shown that operators consistently failed to monitor the system effectively, which means that operators need to be monitored as well. Militaries should account for the growing body of evidence that human trust in technology and software is often miscalibrated. By incorporating human factors into AI system design, building relevant training, and using a carefully phased approach, the military can establish a culture of human-machine teaming that avoids the failures seen in the civilian sector.

John Christianson is an active-duty U.S. Air Force colonel and current military fellow at the Center for Strategic and International Studies. He is an F-15E weapons systems officer and served as a safety officer while on an exchange tour with the U.S. Navy. He will next serve as vice commander of the 350th Spectrum Warfare Wing.

Di Cooke is a visiting fellow with the International Security Program at the Center for Strategic and International Studies, exploring the intersection of AI and the defense domain. She has been involved in policy-relevant research and work at the intersection of technology and security across academia, government, and industry. Prior to her current role, she was seconded to the U.K. Ministry of Defence from the University of Cambridge to inform the UK Defence AI operationalization approach and ensure alignment with its AI Ethical Principles.

Courtney Stiles Herdt is an active-duty U.S. Navy commander and current military fellow at the Center for Strategic and International Studies. He is an MH-60R pilot and just finished a command tour at HSM-74 as part of the Eisenhower Carrier Strike Group. Previously, he served in numerous squadron and staff tours, as an aviation safety and operations officer, and in various political-military posts around Europe and the Western Hemisphere working on foreign military sales of equipment that utilized human-machine teaming.

The opinions expressed are those of the authors and do not represent the official position of the U.S. Air Force, the U.S. Navy, or the Department of Defense.

Image: U.S. Navy photo by John F. Williams