The United States Can Only Achieve AI Dominance with Its Allies

October 9, 2020

As the United States races with China to apply artificial intelligence for military purposes, many experts worry that it may be hampered by a shift in the nature of AI. The conventional wisdom has been that, until now, American technologists could depend on elite researchers and faster computers to outperform their Chinese rivals. However, these advantages are no longer the keys to harnessing AI most effectively. Data is. Chinese AI experts believe that China’s larger population and lax privacy controls give China a durable advantage in collecting the best data sets to teach AI algorithms how to optimize their performance. Kai-Fu Lee, China’s most prominent AI researcher, has dubbed China the “Saudi Arabia of data” and argues that China’s data advantage is expanding by the day. The Center for Data Innovation, an American think tank, agrees, calculating that the Chinese population generates terabytes more information than Americans do.

In reality, determining who holds the advantage in data is far more complicated than simply counting how many bytes of information are stored in each country. As a recent Center for Security and Emerging Technology report rightly points out, the quality of data and how well it has been curated and labeled usually matter more than simply how much data one has. Even so, the analysts take for granted that China’s size will ultimately give it the advantage in commercial data, one that may let its corporations overtake their American counterparts in AI.



However, these conclusions overlook the primary advantage American technology companies hold over their Chinese counterparts: their global user base. For companies like Google and Facebook, the competition to amass data does not pit the digital activities of 330 million Americans against the virtual footprint of over one billion Chinese citizens. Instead, their products hold near-monopolies in the United States, Europe, Latin America, Africa, and most of Asia. In contrast, Chinese equivalents like Baidu and WeChat have only a handful of non-Chinese users. This global reach gives American technology companies an advantage both in the total volume of data they collect and in the diversity of data harvested. Chinese data sets, for now, are still largely blind to conditions outside of China. AI algorithms trained on those data sets would struggle to travel outside China’s borders.

The success of American technology companies illustrates the most promising path for the U.S. military to pursue at the dawn of its own AI age. That does not mean that the Department of Defense should mindlessly copy Silicon Valley’s strategy. While data from the commercial sector, such as an individual’s social connections, current employer, or personal finances, will continue to be a gold mine for global intelligence agencies, data relevant to the future battlefield will primarily concern soldiers, vehicles, training exercises, and the like. No organization will have more relevant data for these use cases than the military itself. Fortunately, the Defense Department has positioned itself well to become the globally dominant platform for military data, just as American technology companies dominate the global marketplace in their realms. The United States counts most industrialized nations as military allies, and equipment manufactured by the United States or its NATO allies is driven and flown around the world. However, the Defense Department has yet to capitalize on this potential. NATO weapons and vehicles were originally designed to be interoperable in an industrial-age sense, shooting the same bullets or refueling from the same connectors. Unfortunately, NATO has not yet upgraded for the information age. The data generated by U.S. Army tanks cannot easily be accessed or aggregated with data generated by Marine Corps tanks, let alone British ones. Just as the Goldwater-Nichols Act once pushed America’s separate armed services to break out of their isolated battlefield domains, military data must now be made to operate jointly as well. Three initiatives could be critical to accomplishing this.

First, the Defense Department could create a 10-year roadmap for upgrading data interoperability that lays out specific operational objectives to demonstrate improvements. To ensure these objectives are met, they could be incorporated into the major annual exercises conducted with NATO and East Asian allies. For example, American and South Korean units could draw spare parts and other consumables from each other during their annual training exercises. Throughout the exercise, both sides could confirm their logistics databases can combine to present a unified picture of the allied logistical situation and provide projections of future needs as the simulated combat event evolves.

Establishing tangible objectives and aligning the timeframe with existing multinational exercises will be the key to success. Militaries invest a great deal of time and effort training their personnel to be ready for the fight. They must now learn how to “train” and prepare their data as well. This can mean many things. When training their personnel, militaries spend some of their time imparting specific skillsets that will be useful in combat. In other cases, soldiers learn how to work together to solve unforeseeable problems as they arise, or simply learn how the operational routines of other units or allied militaries differ from their own. Regardless, commanders recognize that their soldiers must routinely practice their skills under real-world conditions if they are expected to work as an effective team on the battlefield.

Data needs the same types of preparation to be ready for its role in the fight. Much as soldiers need to leave the garrison and work through practical exercises in the field, it is not enough to develop a technical specification documenting how two data sets are supposed to work together. Someone needs to actually make the data sets work together. They must be routinely explored, analyzed, and aggregated to solve real problems in order to ensure they will remain interoperable and effective. Similarly, the analysts and engineers responsible for curating data need opportunities to interact with each other in order to develop the operational routines necessary to ensure effective collaboration during a crisis. Without these forcing functions, too much military data will remain isolated and unusable at the scale needed to engineer AI algorithms.

Second, the military may need to collaborate with allies to achieve common understandings about when and how to share data. European governments in particular have begun to codify digital norms for the consumer space in frameworks like the General Data Protection Regulation and the establishment of new legal concepts like the Right to be Forgotten. The United States could play a role in shaping the equivalent norms in the national security and public policy space. Otherwise, fragmented data repositories from the United States and its allies may not be able to achieve the critical mass — that is, gather enough data — necessary to compete with China’s data warehouses.

Past disagreements between the United States and its allies over norms related to atomic weapons demonstrate how these considerations can ultimately impact military operations. In Europe, the United States managed to forge an agreement that allowed the stationing of tactical nuclear weapons on the territory of its NATO allies, even in the face of significant domestic opposition in key nations such as West Germany. In contrast, the United States was unable to achieve a similar consensus among its allies in Asia. Both Japan and New Zealand banned the introduction of nuclear weapons into their territory, causing headaches for U.S. Navy operations in the region. While in that case Navy ships could find alternate ports to operate from, a similar divergence in norms would have much greater consequences for the U.S. military’s ability to develop AI. Data withheld is data lost.

Most norms about the use of military data will likely be uncontroversial. Unlike Facebook or Google, whose business models depend on precisely targeting ads at their user bases, militaries in democracies have little reason to exchange personally identifiable information or other sensitive details about their citizens. Consensus on norms for controversial topics such as autonomous systems may prove more difficult to forge. An agreement that data provided by partners would not be used to train these systems without explicit consent could be a compromise acceptable to all parties.

Finally, the United States could seek deeper integration and cooperation with allies that have unique resources to advance specific applications of AI. Many, including the National Security Commission on Artificial Intelligence, have called for the United States to leverage its existing “Five Eyes” alliance and extend it to include cooperation in AI. A complementary approach might be to focus on partners who have unique technical assets to contribute. For example, East Asian allies such as Japan and South Korea have invested heavily in robotics and automation, which makes them attractive partners for developing more capable drones and other autonomous vehicles. They may also have fewer hesitations about deploying these technologies than other potential partners. Similarly, the Israeli government has carefully incubated a world-class cyber security sector, potentially positioning Israel as a valuable collaborator in training AI-enhanced cyber defenders to protect critical infrastructure and assets.

Ultimately, close collaborators in any AI alliance must pass two tests: They must be able to contribute usefully to the work, and they must be trustworthy enough to share in these cutting-edge technical advancements. While achieving the kind of close collaboration with allies that the United States has enjoyed in other realms may be difficult, it will be essential if the United States hopes to achieve the data dominance needed to succeed in future combat.



James Ryseff is a technical policy analyst at the nonprofit, nonpartisan RAND Corporation.

Image: U.S. Navy Chief Mass Communication Specialist Jon Dasbac