Make Data Science Accessible for the Pentagon


Editor’s Note: This article was submitted in response to the call for ideas issued by the co-chairs of the National Security Commission on Artificial Intelligence, Eric Schmidt and Robert Work. It addresses the second question (part b.) on the types of AI expertise and skill sets the national security workforce needs.


What’s at stake in the development of artificial intelligence? If America’s adversaries are to be believed, then AI has the potential to reshape the global balance of power. Vladimir Putin has claimed that “the [country that] becomes the leader in [AI] will be the ruler of the world.” Xi Jinping declared that China will be the world leader in AI by 2030, and is directing national policies in support of this goal.

The Department of Defense needs to compete in, and ultimately win, the global race to develop AI. These technologies will enable new military techniques, tactics, and procedures, and allow the Pentagon to upgrade and optimize its conventional warfighting systems. Unfortunately, a shortage of American PhD data scientists is frustrating the Pentagon’s efforts to advance AI-enabled priorities. Most proposed remedies for increasing data scientist talent in the Pentagon involve educating more American students in STEM curricula and changing talent management in the military to better recruit and retain digital talent. However, the United States cannot rapidly mint more PhDs in data science.



Instead of advocating for U.S. education reform or for Washington to compete with Silicon Valley for talent, the government should make data science more accessible to non-PhD personnel at the Defense Department. In other words, the military can take advantage of the fact that data science has become more accessible. Personnel already working in companies and government agencies are able to leverage automated machine learning tools without the need to hire more data scientists. The private sector is already demonstrating that accessibility is a feasible solution. At least 30 percent of the Fortune 500 and most of the biggest banks in the United States use automated machine learning to democratize data science within their organizations.

Uphill Battle to Attract Digital Talent into the Pentagon

The United States has recently fallen behind China in terms of technical degrees awarded. According to the National Science Board, between 2000 and 2014, the number of science and engineering bachelor’s degrees conferred by China increased 360 percent to 1.7 million per year. By contrast, the United States conferred just more than half a million science and engineering degrees in 2014, an increase of 54 percent in the same time period. All told, China has nearly twice the per-capita STEM graduates and four times the population of the United States.

With China aligning industry, academia, and government to achieve AI dominance by 2030, the United States does not have time, nor does it have the population, to compete with China on sheer numbers of data scientists. We don’t need a predictive model to recognize that the change in data science inputs will yield a change in AI outputs.

Unlike China, where the line between industry and government is hard to distinguish, the dividing line between Silicon Valley and the military is increasingly stark. Many of our brightest STEM graduates are focused on optimizing online ad revenue and making your phone a more interesting conversation partner. Given the incentives, this flow of talent is unsurprising.

Silicon Valley, after all, offers higher salaries, stock options, and nap pods. For example, comparing average salaries in the United States, a U.S. Army intelligence analyst will earn $45,000 per year. The same individual might instead earn $123,000 as a private sector data scientist. Stock (and other) incentives put commercial data scientists on a path to receiving some of the $37.5B that was spent on AI systems in 2019 and the $97.9B expected in 2023.

Recruit Data Scientists, or Make Data Science Accessible to Recruits? 

Our recommended solution to the data scientist shortage is a three-tiered approach. First, the Pentagon should have a relatively small group of highly expert personnel to address the small set of complex problems that require PhD-level expertise. Second, the Pentagon should train everybody to recognize the vastly larger set of simpler problems that can be solved by existing military personnel. Third, a group of moderately trained technicians can solve these problems using automated machine learning platforms.

For complex problems, like autonomous control of robotic vehicles, hiring data scientists is a major and necessary recruiting challenge for the Pentagon. However, even for that challenge, the second approach of simpler, accessible data science and machine learning can be leveraged to improve military recruiting efforts for data scientists and across an array of fields.

For example, recruitment of special forces operators is a difficult process. Special forces candidate training schools are arduous by design. During the process, attrition rates can be as high as 80 percent. This means that most of a group of highly motivated and extraordinarily sharp servicemembers will wash out of the school and will inevitably be disappointed and very likely demotivated. The high attrition rates maintain standards, but they are costly, both financially and for morale.

Predictive models can identify which warfighters would be most likely to succeed as special forces candidates. An existing predictive model enables leadership to draw from an expanded pool of potential applicants who may not have thought to apply for training. By optimizing recruitment for those candidates most likely to succeed, the service can decrease the number of people who wash out, while maintaining the quality of special forces operators.

Data analytics can also be used to identify the factors that most closely predict success of candidates undergoing the special forces selection process. This gives candidates specific areas on which to focus their preparatory training, increasing the odds they might pass.
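As a rough illustration of the two ideas above (scoring candidates and identifying which factors predict success), the following is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two features (fitness and swim), the eight training records, and the choice of a plain logistic regression are invented here and do not represent real selection data or the models used by the military or by any automated machine learning platform.

```python
import math

# Hypothetical, invented records: (fitness, swim) scores scaled 0-1,
# paired with 1 if the candidate passed selection and 0 otherwise.
RECORDS = [
    ((0.90, 0.2), 1), ((0.80, 0.7), 1), ((0.70, 0.4), 1), ((0.85, 0.9), 1),
    ((0.30, 0.3), 0), ((0.20, 0.8), 0), ((0.40, 0.6), 0), ((0.10, 0.5), 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Estimated probability that a candidate passes selection."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train(records, steps=5000, rate=0.5):
    """Fit a logistic regression with plain batch gradient descent."""
    n_features = len(records[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, passed in records:
            error = predict(weights, bias, features) - passed
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - rate * g / len(records) for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / len(records)
    return weights, bias

weights, bias = train(RECORDS)

# Score two hypothetical applicants who differ only in fitness.
strong = predict(weights, bias, (0.9, 0.5))
weak = predict(weights, bias, (0.2, 0.5))
print("weights (fitness, swim):", weights)
print("strong candidate:", round(strong, 2), "weak candidate:", round(weak, 2))
```

In this toy data only the fitness score separates those who passed from those who did not, so the learned weight on fitness comes out positive and the higher-fitness applicant receives the higher score. With real data, the relative size and sign of such weights is one simple way to suggest where a candidate might focus preparatory training, although an actual analysis would need far more records and careful validation.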

Our company, DataRobot, recently supported the Defense Department in solving a similar, highly complex human resource challenge. The lead data scientist from DataRobot was a 26-year-old Army reservist with a degree in hotel and restaurant management who, at the time, had been trained in data science with the company for about a year. Simply put, the military does not need to transform the whole U.S. workforce to win the AI race.

Educate a Few a Lot, Educate Everybody a Little

Automated machine learning platforms democratize data science and allow people without proficiency in math to run predictive algorithms. Non-technical subject matter experts can now solve problems in an afternoon that previously would take a PhD in statistics months. Non-technical personnel are empowered to take on critical efforts such as improving recruitment. Automated machine learning boosts productivity while freeing up under-staffed Defense Department statisticians to spend their time solving more complex problems, like developing next generation AI to combat cyber threats.

Making data science accessible to non-academics is only part of the solution. For the Pentagon to fully adapt to the AI challenge posed by China, Russia, and other rivals, everyone in the Department needs to be thinking about AI applications.

This is not to say that everyone needs to be a data scientist, or even skilled at leveraging automated machine learning platforms. But everyone needs to be aware of the potential for AI to optimize systems in their area of expertise.

Cassie Kozyrkov, chief decision scientist at Google, makes this point brilliantly: You don’t need to understand the wiring diagram of a microwave oven to warm up your food. You need to know how and why to use the microwave, not how to build it.

Use cases abound. In addition to more effectively putting the right personnel into the right training programs, AI can identify warfighters and veterans at risk for suicide, identify patterns and predict cybersecurity attacks, and predict which tanks, trucks or helicopters need maintenance before they strand soldiers in the field. It can also be used to more accurately forecast demand for fuel, ammunition, or any other supplies.

Knowledgeable Oversight

The Pentagon can avoid “boiling the ocean” (i.e., spending decades transforming the U.S. workforce) by recognizing that most of its existing opportunities can be identified and solved by its existing workforce. This involves some education and cultural change, and the military should look to its congressional oversight for assistance.

Members of Congress get elected using sophisticated data science techniques, but the government that members of Congress oversee is a decade or more behind their campaigns. The Defense Department needs members of Congress to ask the right questions and keep government agencies on track in implementing AI. The life cycle of the simpler type of AI application starts with developing a use case — what operational problem can we solve using AI? What system can we make more efficient with a predictive model? The next step is developing the model — finding a team of PhD data scientists (or a technician armed with automated machine learning) to actually create the algorithm to apply to the use case. The final step is putting the model into production — giving the information generated by the algorithm to humans so they can make better decisions.

Congress needs to ask these questions of agency heads: how many use cases have you identified, how many models have your data scientists created, and how many models are in production? These reporting metrics should be written into bills authorizing AI projects or appropriating funding for AI.
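These three reporting metrics map onto the life cycle described above (use case, model, production), so they could be tallied from a simple inventory of projects. The sketch below shows one hypothetical way to do that in Python. The Stage names, the Project record, and the sample entries are illustrative assumptions, not an existing government reporting format.

```python
from dataclasses import dataclass
from enum import IntEnum

class Stage(IntEnum):
    USE_CASE = 1      # an operational problem has been identified
    MODEL = 2         # a model has been built for it
    PRODUCTION = 3    # model output reaches human decision-makers

@dataclass
class Project:
    name: str
    stage: Stage

def report(projects):
    """Count projects that have reached each stage (stages are cumulative)."""
    return {
        "use_cases_identified": sum(p.stage >= Stage.USE_CASE for p in projects),
        "models_created": sum(p.stage >= Stage.MODEL for p in projects),
        "models_in_production": sum(p.stage >= Stage.PRODUCTION for p in projects),
    }

# Hypothetical inventory, loosely based on use cases named in this article.
inventory = [
    Project("vehicle maintenance prediction", Stage.PRODUCTION),
    Project("fuel demand forecasting", Stage.MODEL),
    Project("training program placement", Stage.USE_CASE),
    Project("ammunition demand forecasting", Stage.USE_CASE),
]
metrics = report(inventory)
print(metrics)
```

Counting the stages cumulatively means that a model in production is also counted as a created model and as an identified use case, which lets the three numbers be read as a funnel from ideas to fielded tools.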

A Generational Shift that Doesn’t Require a System Reset for Success

The Department of Defense is facing a radical shift in the nature of war. It should transform its processes to compete with China, a country whose command economy allows the government to steer commercial innovation towards military requirements.

To be successful, the Pentagon should train current personnel to be more conversant in AI and enable them to leverage commercial machine learning platforms. Disruptive technologies can be harnessed to make data science accessible to all warfighters, without the need to overhaul the U.S. education system or to wrest technical talent away from Silicon Valley.

Defense stakeholders — Congress foremost among them — should develop metrics to gauge the progress of AI adoption for new challenges and integration into long-standing systems. The Defense Department, along with the rest of the federal government, must be unified in addressing this challenge.

The Pentagon already has the subject matter experts and dedicated personnel needed to successfully manage competition with adversaries like China. The Department just needs to give them the existing automated machine learning tools to accelerate and augment their efforts.



Dr. Eric Loeb is currently a customer-facing data scientist at DataRobot, where he assists federal agencies with the planning and execution of machine learning and artificial intelligence use cases. Previously, Dr. Loeb built the first White House, Congressional, state of Massachusetts, and City of Cambridge websites while pursuing his MIT PhD in cognitive & computational neuroscience. He designed and helped build a major party’s integrated voter file and online systems, after which he founded that party’s data science program. He was a technical lead in every Presidential campaign from 1992 to 2008, when he was the data science lead for the winning candidate. Dr. Loeb then earned the highest civilian honor for his work on software test automation as a political appointee in the Department of Defense.

Steven E. Moore is the Vice President of Global Government Affairs at DataRobot. Moore’s career has focused on solving public policy problems with innovative, data-driven solutions. He has advised national-level decision-makers including presidents, prime ministers, and senior military officials. Internationally, Moore has worked in more than a dozen countries to develop polling infrastructure and leverage the resulting data to inform critical policy decision-making. In Iraq, Moore’s work helped to inform the Transitional Administrative Law that was developed and implemented following the U.S. invasion of Iraq. Moore spent nearly eight years as Chief of Staff to a member of Congress and House leadership. His work in this office was recognized by Campaigns and Elections Magazine, which named Moore a CampaignTech Innovator of 2011 for his work in bringing data mining to constituent communications on Capitol Hill. 

Image: U.S. Air Force (Photo Airman 1st Class Seth Haddix)