Put Your Money Where Your Strategy Is: Using Machine Learning to Analyze the Pentagon Budget


A “masterpiece” is how then-Deputy Defense Secretary Patrick Shanahan infamously described the Fiscal Year 2020 budget request. It would, he said, align defense spending with the U.S. National Defense Strategy, both funding the future capabilities necessary to maintain an advantage over near-peer powers Russia and China and maintaining readiness for ongoing counter-terror campaigns.

The result was underwhelming. While research and development funding increased in 2020, it did not represent the funding shift toward future capabilities that observers expected. Despite its massive size, the budget was insufficient to address the department’s long-term challenges. Key emerging technologies identified by the department — such as hypersonic weapons, artificial intelligence, quantum technologies, and directed-energy weapons — still lacked a “clear and sustained commitment to investment.” It was clear that the Department of Defense did not make the difficult tradeoffs necessary to fund long-term modernization. The Congressional Budget Office further estimated that the cost of implementing the plans, which were in any case insufficient to meet the defense strategy’s requirements, would be about 2 percent higher than department estimates.



Has anything changed this year? The Department of Defense released its FY2021 budget request Feb. 10, outlining the department’s spending priorities for the upcoming fiscal year. As is mentioned every year at its release, the proposed budget is an “aspirational” document — the actual budget must be approved by Congress. Nevertheless, it is incredibly useful as a strategic document, in part because all programs are justified in descriptions of varying lengths in what are called “budget justification books.” After analyzing the 10,000-plus programs in the research, development, testing and evaluation budget justification books using a new machine learning model, it is clear that the newest budget’s tepid funding for emerging defense technologies fails to shift the department’s strategic direction toward long-range strategic competition with a peer or near-peer adversary.

Regardless of your beliefs about the optimal size of the defense budget or whether the 2018 National Defense Strategy’s focus on peer and near-peer conflict is justified, the Department of Defense’s two most recent budget requests have been insufficient to implement the administration’s stated modernization strategy fully.

To be clear, this is not a call to increase the Department of Defense’s budget over its already-gargantuan $705.4 billion FY2021 request. Nor is this the only problem with the federal budget proposal, which included cuts to social safety net programs — programs that are needed now more than ever to mitigate the effects from COVID-19. Instead, my goal is to demonstrate how the budget fails to fund its intended strategy despite its overall excess. Pentagon officials described the budget as funding an “irreversible implementation of the National Defense Strategy,” but that is only true in its funding for nuclear capabilities and, to some degree, for hypersonic weapons. Otherwise, it largely neglects emerging technologies.

A Budget for the Last War

The 2018 National Defense Strategy makes clear why emerging technologies are critical to the U.S. military’s long-term modernization and ability to compete with peer or near-peer adversaries. The document notes that “advanced computing, ‘big data’ analytics, artificial intelligence, autonomy, robotics, directed energy, hypersonics, and biotechnology” are necessary to “ensure we will be able to fight and win the wars of the future.” The Government Accountability Office included similar technologies — artificial intelligence, quantum information science, autonomous systems, hypersonic weapons, biotechnology, and more — in a 2018 report on long-range emerging threats identified by federal agencies.

In the Department of Defense’s budget press release, the department argued that despite overall flat funding levels, it “made numerous hard choices to ensure that resources are directed toward the Department’s highest priorities,” particularly in technologies now termed “advanced capabilities enablers.” These technologies include hypersonic weapons, microelectronics/5G, autonomous systems, and artificial intelligence. Elaine McCusker, the acting undersecretary of defense (comptroller) and chief financial officer, argued, “Any place where we have increases, so for hypersonics or AI for cyber, for nuclear, that’s where the money went … This budget is focused on the high-end fight.” (McCusker’s nomination for Department of Defense comptroller was withdrawn by the White House in early March because of her concerns over the 2019 suspension of defense funding for Ukraine.) Deputy Defense Secretary David L. Norquist noted that the budget request had the largest research and development request ever.

Despite this, the FY2021 budget is not a significant shift from the FY2020 budget in developing advanced capabilities for competition against a peer or near-peer. I analyzed data from the Army, Navy, Air Force, Missile Defense Agency, Office of the Secretary of Defense, and Defense Advanced Research Projects Agency budget justification books, and the department has still failed to realign its funding priorities toward the long-range emerging technologies that strategic documents suggest should be the highest priority. Aside from hypersonic weapons, which received already-expected funding request increases, most other types of emerging technologies remained mostly stagnant or actually declined from FY2020 request levels.

James Miller and Michael O’Hanlon argued in their analysis of the FY2020 budget, “Desires for a larger force have been tacked onto more crucial matters of military innovation” and that the department should instead prioritize quality over quantity. This criticism could be extended to the FY2021 budget, along with the indictment that military innovation itself wasn’t fully prioritized either.

Breaking It Down

In this brief review, I attempt to outline funding changes for emerging technologies between the FY2020 and FY2021 budgets based on a machine learning text-classification model, while noting cornerstone programs in each category.

Let’s start with the top-level numbers from the R1 document, which divides the budget into seven “budget activities.” Basic and applied defense research account for 2 percent and 5 percent of the overall FY2021 research and development budget, compared to 38 percent for operational systems development and 27 percent for advanced component development and prototypes. The latter two categories have grown from 2019, in both real terms and as a percentage of the budget, by 2 percent and 5 percent, respectively. These categories were both the largest overall budget activities and also received the largest percentage increases.

Federally funded basic research is critical because it helps develop the capacity for the next generation of applied research. Numerous studies have demonstrated the benefit of federally funded basic science research, with some estimates suggesting two-thirds “of the technologies with the most far-reaching impact over the last 50 years [stemmed] from federally funded R&D at national laboratories and research universities.” These technologies include the internet, robotics, and foundational subsystems for space-launch vehicles, among others. In fact, a 2019 study for the National Bureau of Economic Research’s working paper series found evidence that publicly funded investments in defense research had a “crowding in” effect, significantly increasing private-sector research and development from the recipient industry.

Concerns over the levels of basic research funding are not new. A 2015 report by the MIT Committee to Evaluate the Innovation Deficit argued that declining federal basic research could severely undermine long-term U.S. competitiveness, particularly for research areas that lack obvious “real-world” applications. This is particularly true given that the share of industry-funded basic research has collapsed, with the authors arguing that U.S. companies are left “dependent on federally-funded, university-based basic research to fuel innovation. This shift means that federal support of basic research is even more tightly coupled to national economic competitiveness.” A 2017 analysis of America’s artificial intelligence strategy recommended that the “government [ensure] adequate funding for scientific research, averting the risks of an ‘innovation deficit’ that could severely undermine long-term competitiveness.” Data from the Organization for Economic Cooperation and Development shows that Chinese government research and development spending has already surpassed that of the United States, while Chinese business research and development expenditures are rapidly approaching U.S. levels.

While we may debate the precise levels of basic and applied research and development funding, there is little debate about its ability to produce spillover benefits for the rest of the economy and the public at large. In that sense, the slight declines in basic and applied research funding — in both real terms and as a percentage of overall research and development funding — hurt the United States in its long-term competition with other major powers.

Clean, Code, Classify

The Defense Department’s budget justification books contain thousands of pages of descriptions spread across more than 20 separate PDFs. Each program description explains the progress made each year and justifies the funding request increase or decrease. There is a wealth of information about Department of Defense strategy in these documents, but it is difficult to assess departmental claims about funding for specific technologies or to analyze multiyear trends while the data is in PDF form.

To understand how funding changed for each type of emerging technology, I scraped and cleaned this information from the budget documents, then classified each research and development program into one of several categories: artificial intelligence, biotechnologies, directed-energy weapons, hypersonic weapons and vehicles, quantum technologies, autonomous and swarming systems, microelectronics/5G, or a catch-all category for non-emerging technology programs. After hand-coding a subset of programs, I designed a random forest machine learning model to sort the remaining programs into these categories. A random forest is an algorithm that uses hundreds of decision trees to identify which variables (words in a program description, in this case) are most important for classifying data into groups.
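To make the idea concrete, here is a minimal sketch of a random forest ranking words by how useful they are for telling categories apart. It is not the model used for this analysis: the descriptions and labels are hypothetical toy data, and the choice of scikit-learn’s CountVectorizer and RandomForestClassifier with 300 trees is an assumption made purely for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical hand-labeled examples; NOT taken from the budget books.
descriptions = [
    "hypersonic glide body thermal protection flight test",
    "boost glide hypersonic booster integration",
    "high energy laser beam control thermal blooming",
    "laser weapon power scaling prototype",
    "quantum sensing atomic clock research",
    "quantum computing qubit materials",
    "depot maintenance of legacy trucks",
    "legacy radio sustainment and spares",
]
labels = [
    "hypersonics", "hypersonics",
    "directed energy", "directed energy",
    "quantum", "quantum",
    "non-emerging", "non-emerging",
]

# Turn each description into a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(descriptions)

# Fit an ensemble of decision trees (300 is an arbitrary illustrative choice).
forest = RandomForestClassifier(n_estimators=300, random_state=0)
forest.fit(X, labels)

# Order the vocabulary by column index so it lines up with the importances.
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
ranked = sorted(zip(forest.feature_importances_, vocab), reverse=True)
top_words = [word for _, word in ranked[:5]]

# Classify a new, unseen (also hypothetical) description.
new_description = ["hypersonic glide vehicle wind tunnel test"]
predicted = forest.predict(vectorizer.transform(new_description))[0]
print(top_words, predicted)
```

With only eight toy examples the ranking is not meaningful in itself; the point is the workflow of vectorizing text, fitting the ensemble, and reading off which words the trees relied on.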

There are many kinds of machine learning models that can be used to classify data. To choose one that would most effectively classify the program data, I started by hand-coding 1,200 programs, which I used both to train three different kinds of models (random forest, k-nearest neighbors, and support vector machine) and to build a separate dataset for testing them. Each model would look at the term frequency-inverse document frequency (essentially, how often a word appears in a program’s description, weighted by how rare that word is across all descriptions) of all the words in a program’s description to decide how to classify each program. For example, for the Army’s Long Range Hypersonic Weapon program, the model might have seen the words “hypersonic,” “glide,” and “thermal” in the description and guessed that it was most likely a hypersonic program. The random forest model slightly outperformed the support vector machine model and significantly outperformed the k-nearest neighbors model, as well as a simpler method that just looked for specific keywords in a program description.
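The term frequency-inverse document frequency weighting described above can be sketched in a few lines of plain Python. This is an illustration only: the three descriptions are hypothetical, the tokenizer is a naive whitespace split, and the formula is the unsmoothed textbook version (libraries often use smoothed variants), so the numbers will not match any particular toolkit.

```python
import math
from collections import Counter

# Hypothetical toy descriptions; NOT taken from the budget books.
docs = {
    "prog_a": "hypersonic glide body thermal test program",
    "prog_b": "laser beam control program test",
    "prog_c": "quantum sensing research program",
}


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def tf_idf(docs):
    """Return {doc name: {word: weight}} using tf * log(N / df)."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many descriptions each word appears.
    df = Counter(word for words in tokens.values() for word in set(words))
    weights = {}
    for name, words in tokens.items():
        counts = Counter(words)
        weights[name] = {
            # Term frequency (share of the description) times rarity.
            word: (count / len(words)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return weights


weights = tf_idf(docs)
for word, weight in sorted(weights["prog_a"].items(), key=lambda kv: -kv[1]):
    print(f"{word:12s} {weight:.3f}")
```

In this toy corpus “program” appears in every description, so its weight is zero, while “hypersonic” appears in only one and is weighted most heavily. That is why a distinctive word can drive a classification even when generic budget vocabulary dominates the text.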

Having chosen a machine learning model to use, I set it to work classifying the remaining 10,000 programs. The final result is a large dataset of programs mentioned in the 2020 and 2021 research and development budgets, including their full descriptions, predicted category, and funding amount for the year of interest. This effort, however, should be viewed as only a rough estimate of how much money each emerging technology is getting. Even a fully hand-coded classification that didn’t rely on a machine learning model would be challenged by sometimes-vague program descriptions and programs that fund multiple types of emerging technologies. For example, the “Applied Research for the Advancement of S&T Priorities” program funds projects across multiple categories, including “electronic warfare, human systems, autonomy, and cyber … advanced materials, biomedical, weapons, quantum, and command, control, communications, computers and intelligence.” The model took a guess that the program was focused on quantum technologies, but that is clearly a difficult program to classify into a single category.
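Once every program carries a predicted category, comparing funding across years is a simple grouping exercise. The sketch below shows one way to do it. The rows, the field layout, and the function names are all hypothetical, and the dollar figures are made-up placeholders rather than budget data.

```python
from collections import defaultdict

# Hypothetical rows: (fiscal year, predicted category, request in $ millions).
# The amounts are placeholders, NOT figures from the budget books.
programs = [
    (2020, "hypersonics", 100.0),
    (2020, "hypersonics", 50.0),
    (2021, "hypersonics", 180.0),
    (2020, "quantum", 20.0),
    (2021, "quantum", 25.0),
    (2021, "quantum", 5.0),
]


def totals_by_category(rows):
    """Sum requested funding per category and fiscal year."""
    totals = defaultdict(lambda: defaultdict(float))
    for year, category, amount in rows:
        totals[category][year] += amount
    return totals


def year_over_year(totals, start, end):
    """Change in total funding per category between two fiscal years."""
    return {cat: by_year[end] - by_year[start] for cat, by_year in totals.items()}


totals = totals_by_category(programs)
changes = year_over_year(totals, 2020, 2021)
for category, change in changes.items():
    print(f"{category}: {change:+.1f} million")
```

Because each program is counted in exactly one category, a multi-technology program like the one described above lands entirely in whichever bucket the model guessed. That is one reason to treat the category totals as rough estimates.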

With the programs sorted and classified by the model, the variation in funding between types of emerging technologies became clear.

Hypersonic Boost-Glide Weapons Win Big

Both the official Department of Defense budget press release and the press briefing singled out hypersonic research and development investment. As one of the department’s “advanced capabilities enablers,” hypersonic weapons, defenses, and related research received $3.2 billion in the FY2021 budget, which is nearly as much as the other three priorities mentioned in the press release combined (microelectronics/5G, autonomy, and artificial intelligence).

In the 2021 budget documents, there were 96 programs (compared with 60 in the 2020 budget) that the model classified as related to hypersonics based on their program descriptions, combining for $3.36 billion — an increase from 2020’s $2.72 billion. This increase was almost solely due to increases in three specific programs, and funding for air-breathing hypersonic weapons and combined-cycle engine developments was stagnant.

The three programs driving up the hypersonic budget are the Army’s Long-Range Hypersonic Weapon, the Navy’s Conventional Prompt Strike, and the Air Force’s Air-Launched Rapid Response Weapon program. The Long-Range Hypersonic Weapon received a $620.42 million funding increase to field an “experimental prototype with residual combat capability.” The Air-Launched Rapid Response Weapon’s $180.66 million increase was made possible by the removal of funding for the Air Force’s Hypersonic Conventional Strike Weapon in FY2021 — which saved $290 million compared with FY2020. This was an interesting decision worthy of further analysis, as the two competing programs seemed to differ in their ambition and technical risk; the Air-Launched Rapid Response Weapon program was designed for “pushing the art-of-the-possible” while the conventional strike weapon was focused on integrating already mature technologies. Conventional Prompt Strike received the largest 2021 funding request at $1 billion, an increase of $415.26 million over the 2020 request. Similar to the Army program, the Navy’s Conventional Prompt Strike increase was fueled by procurement of the “Common Hypersonic Glide Body” that the two programs share (along with a Navy-designed 34.5-inch booster), as well as testing and integration on guided missile submarines.

To be sure, the increase in hypersonic funding in the 2021 budget request is important for long-range modernization. However, some of the increases were already planned, and the current funding increase largely neglects air-breathing hypersonic weapons. For example, the Navy’s Conventional Prompt Strike 2021 budget request was just $20,000 more than anticipated in the 2020 budget. Programs that explicitly mention scramjet research declined from $156.2 million to $139.9 million.

In contrast to hypersonics, research and development funding for many other emerging technologies was stagnant or declined in the 2021 budget. Non-hypersonic emerging technologies increased from $7.89 billion in 2020 to only $7.97 billion in 2021, mostly due to increases in artificial intelligence-related programs.
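The gap is easier to see as percentage changes. The short calculation below uses only the totals quoted above ($2.72 billion to $3.36 billion for hypersonics, $7.89 billion to $7.97 billion for everything else); the helper function name is just an illustrative choice.

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Model-estimated totals in $ billions, as quoted in the text.
hypersonic = percent_change(2.72, 3.36)
other = percent_change(7.89, 7.97)

print(f"Hypersonics: {hypersonic:.1f}%  Other emerging tech: {other:.1f}%")
```

By this estimate, hypersonic funding grew by roughly 23.5 percent, while all other emerging technologies combined grew by about 1 percent.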

Biotechnology, Quantum, Lasers Require Increased Funding

Source: Graphic by the author.

Directed-energy weapons funding fell slightly in the 2021 budget to $1.66 billion, from $1.74 billion in 2020. Notably, the Army is procuring three directed-energy prototypes to support the maneuver-short range air defense mission for $246 million. Several other programs are also noteworthy. The High Energy Power Scaling program ($105.41 million) will finalize designs and integrate systems into a prototype 300 kW-class high-energy laser, focusing on managing “thermal blooming” (a distortion caused by the laser heating the atmosphere through which it travels) for 300 and eventually 500 kW-class lasers. Second, the Air Force’s Directed Energy/Electronic Combat program ($89.03 million) tests air-based directed-energy weapons for use in contested environments.

Quantum technologies funding increased by $109 million, to $367 million, in 2021. In general, quantum-related programs are more exploratory, focused on basic and applied research rather than fielding prototypes. They are also typically funded by the Office of the Secretary of Defense or the Defense Advanced Research Projects Agency rather than by the individual services, or they are bundled into larger programs that distribute funding to many emerging technologies. For example, the top 2021 programs that the model classified as quantum research and development based on their descriptions include the Office of the Secretary of Defense’s Applied Research for the Advancement of S&T Priorities ($54.52 million) and the Defense Advanced Research Projects Agency’s Functional Materials and Devices ($28.25 million). The increase in Department of Defense funding for quantum technologies is laudable and clearly a positive step, but the revolutionary and potentially disruptive nature of quantum technologies demands more funding than the category currently receives. The United States should further increase its federal funding for quantum research and development, guarantee stable long-term funding, and incentivize young researchers to enter the field.

Biotechnologies increased from $969 million in 2020 to $1.05 billion in 2021 (my guess is that the model overestimated the funding for emerging biotech programs, by including research programs related to soldier health and medicine that involve established technologies). Analyses of defense biotechnology typically focus on the defense applications of human performance enhancement, synthetic biology, and gene-editing technology research. Previous analyses, including one from 2018 in War on the Rocks, have lamented the lack of a comprehensive strategy for biotechnology innovation, as well as funding uncertainties. The Center for Strategic and International Studies argued, “Biotechnology remains an area of investment with respect to countering weapons of mass destruction but otherwise does not seem to be a significant priority in the defense budget.” These concerns appear to have been well-founded. Funding has stagnated despite the enormous potential offered by biotechnologies like nanotubes, spider silk, engineered probiotics, and bio-based sensors, many of which could be critical enablers as components of other emerging technologies. For example, this estimate includes the interesting Persistent Aquatic Living Sensors program ($25.7 million) that attempts to use living organisms to detect submarines and unmanned underwater vehicles in littoral waters.

Programs classified as “autonomous” or swarming research and development declined from $3.5 billion in 2020 to $2.8 billion in 2021. This includes the Army Robotic Combat Vehicle program (roughly flat at $86.22 million, compared with $89.18 million in 2020). The Skyborg program, which is developing an autonomous “attritable” drone (a low-cost, unmanned system that doesn’t have to be recovered after launch), requested $40.9 million and also falls into the “autonomy” category, as do the Air Force’s Golden Horde ($72.09 million), the Office of the Secretary of Defense’s manned-unmanned teaming Avatar program ($71.4 million), and the Navy’s Low-Cost UAV Swarming Technology (LOCUST) program ($34.79 million).

The programs sorted by the model into the “artificial intelligence” category increased from $1.36 billion in 2020 to $1.98 billion in 2021. This increase is driven by an admirable proliferation of smaller programs: 161 programs under $50 million, compared with 119 in 2020. However, because the Department of Defense reported that artificial intelligence research and development received only $841 million in the 2021 budget request, it is clear that the random forest model is picking up some false positives for artificial intelligence funding.

Some critics argue that federal funding risks duplicating artificial intelligence efforts in the commercial sector. There are several problems with this argument, however. First, a 2017 report on U.S. artificial intelligence strategy argued, “There also tends to be shortfalls in the funding available to research and start-ups for which the potential for commercialization is limited or unlikely to be lucrative in the foreseeable future.” Second, there are a number of technological, process, personnel, and cultural challenges in the transition of artificial intelligence technologies from commercial development to defense applications. Finally, the Trump administration’s anti-immigration policies hamstring U.S. technological and industrial base development, particularly in artificial intelligence, “as immigrants are responsible for one-quarter of startups in the United States.”

The Neglected Long Term

While there are individual examples of important programs that advance the U.S. military’s long-term competitiveness, particularly for hypersonic weapons, the overall 2021 budget fails to shift its research and development funding toward emerging technologies and basic research.

Given that the overall budget was essentially flat, it should not come as a surprise that research and development funding for emerging technologies was mostly flat as well. But the United States already spends far more on defense than any other country, and even with a flat budget, the allocation of funding for emerging technologies does not reflect an increased focus on long-term planning for high-end competition compared with the 2020 budget. Specifically, the United States should increase its funding for emerging technologies other than hypersonics (directed energy, biotech, and quantum information sciences) as well as for basic scientific research, even if it requires tradeoffs in other areas.

The problem isn’t necessarily the year-to-year changes between the FY2020 and FY2021 budgets. Instead, the problem is that proposed FY2021 funding for emerging technologies continues the previous year’s underwhelming support for research and development relative to the Department of Defense’s strategic goals. This is the critical point for my assessment of the budget: despite multiple opportunities to align funding with strategy, emerging technologies and basic research have not received the scale of investment that the National Defense Strategy argues they deserve.



Chad Peltier is a senior defense analyst at Jane’s, where he specializes in emerging defense technologies, Chinese military modernization, and data science. This article does not reflect the views of his employer.

Image: U.S. Army (Photo by Monica K. Guthrie)