Wargaming with Athena: How to Make Militaries Smarter, Faster, and More Efficient with Artificial Intelligence


For Clausewitz, while the character of war changes, the nature is immutable. For U.S. Secretary of Defense Jim Mattis, an avid reader of military history and theory, the emergence of artificial intelligence (AI) challenges this time-tested principle. He is not alone. At a recent AI conference, former U.S. Deputy Secretary of Defense Robert Work stated, “I am starting to believe very, very deeply that it is also going to change the nature of war.” In a September 2017 televised speech, Russian President Vladimir Putin predicted that the first nation to develop true AI will rule the world.

AI stands to alter military power and the balance of power. But new technology alone does not guarantee future victories. How soldiers integrate any disruptive capability into a larger system of tactics and training is what alters military power. The potential of AI lies less in smarter missiles than in augmented battle networks and organizations combining human creativity with AI applications to produce new concepts of operation, tactics, and command relationships. The question is how to get there.

The promise of AI is real, but the integration challenges are daunting. The current wave of AI enthusiasm, like earlier boom periods, will wither without a test bed to collect data and experiment with different decision-making applications. To that end, we argue that wargames provide the optimal platform for exploring how to integrate AI with operational judgment. Our wargame, Athena, offers a way to build up a repository of data for future testing, enhance understanding of how AI can assist with training, red-teaming, and simulation, and highlight the limits of these capabilities as they interact with humans in uncertain environments.

What is Artificial Intelligence?

AI is the label given to a broad category of software applications that help machines learn. There is a virtual dimension, as illustrated by the way AI helps monitor financial transactions for fraudulent activity, as well as a physical dimension, such as how Amazon predicts customer demand and intends to use this information in the future to ship goods before you order them.

Advances in hardware and software – including the revival of the mid-20th century concept of neural networks, techniques like reinforcement learning, and big data – create new possibilities. AI applications can learn from the world through image recognition and natural language processing. They can also “learn for themselves….[through] trial-and-error, solely from rewards or punishments.”

How do these developments affect the future of combat? Friction and uncertainty, both compounded in the clash of wills that is war, cloud human judgment, as do group dynamics and cognitive bias. The old adage that the first report is always wrong captures how fear and heuristics often skew the way humans interpret data.

AI can balance human interpretation against the hard facts of data. For example, software applications could listen to a planning team and analyze key terms to determine possible bias. Did planners only discuss tactical tasks oriented on the enemy and not consider friendly or terrain-oriented tasks? Furthermore, AI applications could free up staff time to focus on operational art and design. As the planners develop courses of action, software agents could run simulations on whether these options are logistically feasible given theater supply levels, historical ammunition expenditure rates, and estimated losses.
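
To make the bias check described above concrete, consider a minimal sketch in Python. It assumes a planning session has already been transcribed to text, and it relies on short, hypothetical word lists for each task orientation. The lists, the function names, and the exact-word matching are illustrative assumptions, not doctrine and not a description of any fielded tool.

```python
import re
from collections import Counter
from typing import List

# Illustrative and deliberately incomplete word lists (an assumption for
# this sketch, not an authoritative doctrinal taxonomy).
TASK_ORIENTATION = {
    "enemy": {"destroy", "defeat", "neutralize", "fix", "disrupt", "block"},
    "terrain": {"seize", "secure", "occupy", "retain", "clear"},
    "friendly": {"screen", "guard", "cover", "protect", "displace"},
}


def tally_task_orientation(transcript: str) -> Counter:
    """Count how often each task orientation is mentioned in a transcript."""
    counts = Counter({orientation: 0 for orientation in TASK_ORIENTATION})
    # Lowercase the text and match whole words only; inflected forms such
    # as "blocking" are intentionally not handled in this sketch.
    for word in re.findall(r"[a-z]+", transcript.lower()):
        for orientation, terms in TASK_ORIENTATION.items():
            if word in terms:
                counts[orientation] += 1
    return counts


def neglected_orientations(transcript: str) -> List[str]:
    """Return the orientations the planners never mentioned, sorted by name."""
    counts = tally_task_orientation(transcript)
    return sorted(name for name, n in counts.items() if n == 0)


if __name__ == "__main__":
    session = "We will destroy the enemy armor, then fix and defeat the reserve."
    print(tally_task_orientation(session))
    print(neglected_orientations(session))
```

A real application would need speech-to-text, handling of inflected verbs, and a vetted task list. The point of the sketch is only that a simple tally can flag a planning conversation that never leaves enemy-oriented tasks.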

AI can also help planners know the enemy. Picture a Marine task force deployed in support of a partnered force in a combat zone. Pulling in historical data, social media, weather and patrol reports, AI applications could identify recurring patterns that help the advisers understand the environment. The software could even probabilistically predict future attacks and make recommendations for how to alter unit force posture.

In addition to battle, AI applications have the potential to revolutionize training and education. Adaptive learning software could shift the education paradigm away from a factory model to an experience tailored to individuals. As you read and answer questions, the system learns about you and adjusts accordingly. In military education, personnel could one day have their own tailored AI application that understands their analytical blind spots and risk profile, and even adjusts the background colors and language used in exams and wargames to account for the student’s strengths and weaknesses. If you are better at math, the application might tell you the probability of success, while if you respond to sports metaphors, the application could explain new data through common sports references.

The U.S. military can also find its Enders. The Defense Department can use unclassified gaming suites to let military personnel fight formations from the squad to the coalition joint task force in contemporary scenarios as a means of teaching doctrine, tactics, and enemy order of battle. Wargames are a central feature of the modern military profession. From Moltke’s famed Map Exercises to Tactical Decision Games and multimillion-dollar National Training Center rotations, the military profession has long used simulation to hone its warfighting ability. If war truly is a “dynamic process of human competition requiring both the knowledge of science and the creativity of art,” the U.S. military needs to identify those individuals and teams best able to apply operational judgment. Today, wargames hold the potential for developing an AI test bed.

Athena: Building a Test Bed for AI and Decision-Making

“Where’s our Ender’s Game battle lab…where we cannot just give our leadership reps, but we can actually find out who the really good leaders are?”

-General Robert Neller, USMC

Imagine logging onto a wargame named Athena to practice planning an air assault mission to seize blocking positions in support of an amphibious landing. The game forces you to complete the planning process while you talk to an Alexa-like application, which reminds you about forms of defense, the definitions of different tactical tasks, and relevant historical examples. As you play the game, an AI application captures the data and compares your use of cover and intersecting fields of fire, among other factors, to rate your performance while contributing to a larger database of how U.S. military professionals fight. At the end, Athena assesses the data and offers you constructive tips, comparing your efforts to those of top performers.

Currently, the military lacks the volume of data required to start building AI learning applications that would support command and staff decision making in the manner described above. Sometimes the data exists, such as in warfighting exercise databases, but it is not curated to enable machine learning and other AI algorithms. Therefore, using commercial games provides the structured environment necessary to capture the large volumes of data required to test AI applications for military decision-making.

To that end, Marine Corps University has been experimenting with Athena, a wargaming platform designed both for training and education and for testing future AI applications. Through a series of wargame tournaments developed with staff from the Marine Corps Tactics and Operations Group, the team, led by one of us (Dr. Benjamin Jensen), identified the best commercial off-the-shelf wargaming engines. The games were easy to play but still captured military art and science from the squad to the joint task force level. They offered players the ability to build scenarios that reflect new threats and capabilities. For example, as part of the wargaming tournament the team put together, Sea Dragon 3.0, Jensen and Colonel Timothy Barrick introduced scenarios reflecting the Marine Corps Operating Concept and gave players a chance to fight against a contemporary Russian order of battle.

Through such gaming environments, the U.S. military can find its Enders while capturing the data necessary to build future AI applications. It can test different decision toolkits and explore the human factors associated with integrating any new technology. Athena offers four main benefits to U.S. military planners as they seek to incorporate AI into modern capabilities.

First, it will allow for a more adaptive and tailored educational environment. By tracking the questions players ask, their interface with the game, and the results, Athena will understand how the U.S. military fights and where to improve.

Second, the game will provide a platform for testing new AI applications. For example, developers can introduce new AI-enabled logistics management to see if it improves player performance.
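
As a rough illustration of what such a test could look like, the Python sketch below compares hypothetical game scores for players who had an AI tool against scores for players who did not, using an exact one-sided permutation test. The scores, the function name, and the choice of test are our assumptions for illustration; the source does not specify how Athena evaluates new applications.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(with_tool, without_tool):
    """Exact one-sided permutation test on the difference in mean scores.

    Returns (observed_difference, p_value), where p_value is the share of
    all possible regroupings of the pooled scores whose mean difference is
    at least as large as the one actually observed.
    """
    observed = mean(with_tool) - mean(without_tool)
    pooled = list(with_tool) + list(without_tool)
    group_size = len(with_tool)
    indices = range(len(pooled))

    at_least_as_extreme = 0
    total = 0
    # Enumerate every way of assigning group_size scores to the "tool" group.
    for chosen in combinations(indices, group_size):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if mean(group_a) - mean(group_b) >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1
    return observed, at_least_as_extreme / total


if __name__ == "__main__":
    # Hypothetical tournament scores, invented purely for this example.
    scores_with_tool = [70, 75, 80, 85]
    scores_without_tool = [50, 55, 60, 65]
    diff, p = permutation_p_value(scores_with_tool, scores_without_tool)
    print(f"mean difference = {diff:.1f}, one-sided p = {p:.4f}")
```

Exact enumeration is only practical for small groups; with tournament-scale data a random sample of regroupings would be the usual substitute. Either way, a small p-value says only that the score gap is unlikely to be chance, not why the tool helped.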

The more games played, the more data is collected to optimize AI applications. The next step is capturing data and beginning to structure it to enable a variety of AI experiments. Jensen’s team is designing that architecture in collaboration with Army Futures Command, the new U.S. Army 75th Innovation Division, and U.S. Marine Corps Training and Education Command. This data will feed a range of AI applications that can be tested in the gaming ecosystem. The wargame, as a test bed, provides a forum to observe human-machine collaboration.
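
What "structuring" the data might mean in practice can be sketched briefly in Python. The record below is a hypothetical schema: every field name is our assumption, chosen to mirror the kinds of information discussed in this article (what a player did, what they asked, and how it turned out), and it does not describe the architecture the team is actually designing.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DecisionRecord:
    """One player decision captured during a game (hypothetical schema)."""
    game_id: str
    player_id: str
    turn: int
    echelon: str          # e.g. "squad" or "joint task force"
    action: str           # free-text or coded description of the decision
    questions_asked: List[str] = field(default_factory=list)
    outcome_score: float = 0.0


def to_jsonl(records):
    """Serialize records as JSON Lines, one decision per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def from_jsonl(text):
    """Rebuild DecisionRecord objects from JSON Lines text."""
    return [
        DecisionRecord(**json.loads(line))
        for line in text.splitlines()
        if line.strip()
    ]


if __name__ == "__main__":
    record = DecisionRecord(
        game_id="game-001",
        player_id="player-042",
        turn=3,
        echelon="company",
        action="seize blocking position north of landing site",
        questions_asked=["What are the forms of defense?"],
        outcome_score=0.72,
    )
    print(to_jsonl([record]))
```

A consistent, line-oriented format like this is one simple way to make game logs usable across many different experiments, since each record can be filtered, aggregated, or fed to a model without custom parsing.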

Third, Athena will provide automated red-teaming. As more players plan and execute missions while interacting with Alexa-like interfaces – think Tony Stark’s J.A.R.V.I.S. for war – we will build a corpus of data that illustrates our biases and risk tolerances. The system will allow for tests of whether highlighting these biases changes the outcome of wargames.

Last, once there is a sufficient volume of data, Athena can simulate modern military operations on its own through referential learning and propose novel tactics. These tactics, as well as applications that test human-developed courses of action, can be tested through the commercial platform as thousands of humans compete in warfighting tournaments. Over time, the gaming environment could be used to test key elements of victory defined in military doctrine, such as “the effects of maneuver, tempo, or surprise; the advantages conferred by geography or climate; the relative strengths of the offense and defense; or the relative merits of striking the enemy in the front, flanks, or rear.”

Challenges to Integration

Athena will also show how human factors will likely limit AI’s potential. Machines that learn will remain interwoven with human biases and confronted by uncertainty. Even as algorithms seek to simulate human interaction based on replicated behaviors, they will never be truly free of the worst tendencies of people.

An obvious recent example of these shortcomings was Google’s image recognition system debacle, in which photos of African-American individuals were misidentified and placed into an album titled “Gorillas.” Here, machine learning was applied in such a way that the output reflected underlying societal pathologies rather than the intended smart process. Beyond this example, numerous studies are beginning to highlight how AI applications can be racist and sexist.

Beyond the tragedy of implicit bias, these episodes highlight the dangers of military forces getting AI wrong. If image recognition capabilities behind targeting systems had a similar flaw, the U.S. military might, for instance, be directed to attack misidentified civilians during urban operations.

Gaming environments provide a forum to see how people balance data and intuition under stress and uncertainty. As Mary “Missy” Cummings recently noted, rather than thinking about how specific military functions will benefit from AI, it is worthwhile to think about how militaries not enhanced by systems that mimic human behavior have dealt with uncertainty and ambiguity.

The short answer is that militaries across history have cultivated depth and flexibility through training and education, often grounded in wargaming, that ensures key functions are routed based on expertise. Cultivating expertise requires significant investment in specialized abilities that are heavy on training costs and cannot be applied generally to all military tasks. Expert judgment is different from general functional knowledge or training that focuses on cultivating adherence to institutional procedure. Rather, it is a unique confluence of human instinct (Clausewitz’s coup d’oeil), extensive command of a skill set, and situational awareness. These are all attributes honed through competitive wargames that condition professionals to operate in uncertain, chaotic systems. Thus, AI alone is insufficient. Judgment requires integrating the best of machines, such as pattern recognition, with the best of humans.


Large, unclassified gaming environments like Athena offer a test bed to find the right balance of human and machine in future war. The question of whether a machine can ever genuinely mimic human behavior at the level of high expertise lies at the heart of AI research. Indeed, this question animates Alan Turing’s now-famous 1936 paper “On Computable Numbers,” in which the mathematician wondered if a machine could ever play chess well. He imagined that a machine could easily be trained to play chess poorly and, given sufficiently sophisticated programming, play chess well with a small risk of occasionally making catastrophic mistakes. Even after decades of subsequent scientific development, Turing’s insights still effectively describe the modern AI field: Algorithms can be trained to mimic and predict human behaviors based on an underlying deconstruction of variables and knowledge of real-world conditions, but they occasionally make serious mistakes and have a ceiling on their performance.

To effectively incorporate AI, military planners need to recognize that human judgment’s intrinsic ability to mitigate extreme uncertainty in conflict is also the hardest thing to recreate in non-human systems. For the time being, AI is likely to be most effective in learning systems where expertise is a determining variable in the outcome of a conflict scenario. AI systems can effectively reproduce basic human skills, operate given a particular understanding of complex rules, and help military personnel develop knowledge toolkits that empower their judgment. We can augment coup d’oeil but are unlikely to replace it.

Athena offers a test bed for exploring how to integrate AI into military decision-making processes. Using competitive wargames to observe how the military professional makes decisions builds the baseline data necessary to test a series of applications that augment, but will never replace, operational judgment.


Benjamin Jensen, Ph.D., holds a dual appointment at Marine Corps University and American University, School of International Service. He is the author of Forging the Sword: Doctrinal Change in the U.S. Army, 1975-2010, Cyber Strategy: The Evolving Character of Power and Coercion, and the “Next War” series at War on the Rocks.

Scott Cuomo is a Marine infantry officer and is currently participating in the Commandant of the Marine Corps Strategist Program at Georgetown University. He has served in infantry units while deployed in support of Operation Iraqi Freedom, Operation Enduring Freedom, and multiple other contingency operations.

Chris Whyte, Ph.D. is an Assistant Professor at the L. Douglas Wilder School of Government and Public Affairs at Virginia Commonwealth University. His research interests include a range of international security topics related to the use of information technology in war and peace, political communication and cybersecurity doctrine/policy. The views expressed belong singularly to the authors and do not reflect government policy or the will of Skynet.

Image: Ars Electronica/Robert Bauernhansl