‘It’s Either a Panda or a Gibbon’: AI Winters and the Limits of Deep Learning


From a future Nobel laureate, then on the faculty of what is now Carnegie Mellon University: “Intuition, insight, and learning are no longer exclusive possessions of human beings: any large high-speed computer can be programed to exhibit them also.”

Herbert Simon wrote this in 1958. Could it have been last week?

Today, the defense community is considering artificial intelligence as a possible solution for an array of problems. The Pentagon is accelerating its artificial intelligence efforts (nearly 600 Pentagon projects include an AI component) following on the visible success of the Project Maven initiative. Others are concerned that adversaries investing heavily in these technologies will produce highly autonomous and adaptive weapons that might overmatch U.S. defenses. After all, data analytics, deep learning, and deep neural network technologies have achieved some remarkable successes in recent years.

However, both historical evidence and the known limits of these technologies argue for a more conservative estimate of their general potential. Those in the national security community championing artificial intelligence should be aware of the discipline’s history of boom and bust periods. This awareness should help the community avoid treating artificial intelligence as an all-encompassing solution that can replace human endeavors in every realm, and should instead encourage a more sophisticated understanding of when, how, and how quickly this technology can be used to solve national security problems.

Initial Success and Failure

The first boom period followed the 1956 Dartmouth Conference, where the term artificial intelligence was first applied to the discipline. During that period, machines independently proved theorems in mathematics, provided plausible English-language responses during teletype interchanges with humans, and even bested humans at the game of checkers. The first instances of neural network technologies were used to filter noise from telephone lines. These accomplishments contributed to the notion that machines were on the brink of achieving human levels of intelligence. Governments and industry believed and invested heavily in the technologies, and the media played a role as well, publicizing overstated predictions that machines would surpass human intelligence by the turn of the century.

By the mid-1970s, it became obvious that the extreme optimism was unfounded. The techniques underlying early successes could not be generalized. As an example, early efforts to translate Russian documents into English found only limited success despite considerable government funding. After 10 years of effort and approximately $20 million in funding, the Automatic Language Processing Advisory Committee reported to the National Academy of Sciences, “we do not have useful machine translation [and] there is no immediate or predictable prospect of useful machine translation.” The committee’s report ended government funding of machine translation efforts for more than 10 years, and by some estimates, twice as long.

Research programs in neural network development were nearly killed when MIT professors proved that even though the technology had been used to filter noise from telephone lines, it could only solve a simple class of “linearly separable” problems. Other efforts in artificial intelligence research were also being questioned. The British Parliament commissioned Cambridge professor James Lighthill to assess the general progress of artificial intelligence in the United Kingdom. The Lighthill Report concluded that British artificial intelligence had achieved very little, and what had been achieved was really due to using more traditional disciplines. In the United States, DARPA cut back its support for artificial intelligence research following years of program failures. The first “AI Winter” began.
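The “linearly separable” restriction mentioned above can be made concrete with a small sketch. It assumes a two-input, single-layer perceptron trained with the classic error-correction rule; the function name, the learning rate, and the epoch limit are illustrative choices, not details from the historical work. The network learns the AND function, whose outputs can be split by a single straight line, but never settles on the exclusive-or (XOR) function, whose outputs cannot.

```python
# Illustrative single-layer perceptron; names and settings are hypothetical.

def train_perceptron(samples, epochs=100, lr=1.0):
    """Train a two-input perceptron with the error-correction rule.

    Returns (weights, bias, converged). converged is True if one full
    pass over the data produced no misclassifications.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            predicted = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - predicted
            if delta != 0:
                # Nudge the decision line toward the misclassified point.
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_w, and_b, and_converged = train_perceptron(AND_DATA)
xor_w, xor_b, xor_converged = train_perceptron(XOR_DATA)

print("AND learned:", and_converged)
print("XOR learned:", xor_converged)
```

No amount of extra training rescues the XOR case, because no single line separates its two classes. Later multi-layer networks removed this particular restriction.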

History Repeats

In the late 1970s, a new artificial intelligence technology, known as expert systems, showed some remarkable progress at automating human expertise. These were symbolic reasoning systems that relied on extracting and representing knowledge from human experts to duplicate their judgements and conclusions in specific problem areas. The symbolic, rule-based nature (“if (X) then (Y)”) of expert systems also enabled them to explain chains of reasoning, which was useful not only for decision-makers but for system developers as well. Again, tremendous optimism ensued, and governments and industry invested heavily. New industries sprang up to facilitate expert system construction and use. The U.S. government responded in 1983 with the DARPA Strategic Computing Initiative:

As a result of a series of advances in artificial intelligence, computer science, and microelectronics, we stand at the threshold of a new generation of computing technology having unprecedented capabilities…. For example, instead of fielding simple guided missiles or remotely piloted vehicles, we might launch completely autonomous land, sea, and air vehicles capable of complex, far-ranging reconnaissance and attack missions. [emphasis added]

This statement was made 35 years ago.

By the late 1980s, the bloom was off the rose once again. Expert system technology proved difficult to maintain and even less promising to apply to new areas. The U.S. government curtailed new spending on its ambitious Strategic Computing Initiative. The Japanese government revised its futuristic Fifth Generation computing project to remove artificial intelligence-based goals. Industries that had grown out of the expert system enthusiasm to construct special-purpose and highly profitable computing machinery failed when the machines were largely abandoned, because many users saw an extremely poor return on investment. Other industries that provided expert system-building environments and tools failed or moved into other areas such as object-oriented technology development. The artificial intelligence discipline entered its second winter.

There is considerable debate about the reason for the disappearance of expert systems. Although a few will claim that they were just subsumed into standard decision-support technologies, many others feel there were more fundamental problems. Some feel that any expert system was much like an idiot savant, excelling in one niche but basically useless in a broader context. Others note the great difficulty and expense of creating the knowledge bases (the set of facts and rules that provided human expertise). Still others argue that not all forms of expertise can be quantified; there is an intuitive and creative basis that is not expressible in simple rules and facts. All these reasons relate to the difficulty of creating the information necessary to support the expert system: The information was often not expansive enough, too difficult to obtain, or not quantifiable.
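A toy sketch may help make the “if (X) then (Y)” style and the idea of a knowledge base concrete. It assumes a simple forward-chaining loop; the rules, the fact names, and the function name are invented for illustration and are not taken from any fielded expert system. Each conclusion is recorded alongside the rule that produced it, which is the kind of reasoning chain such systems could show to a user.

```python
# Toy forward-chaining rule engine. Rules and facts are invented examples.

RULES = [
    # (rule name, conditions that must all hold, conclusion to add)
    ("R1", {"engine_cranks", "engine_will_not_start"}, "suspect_fuel_or_spark"),
    ("R2", {"suspect_fuel_or_spark", "fuel_gauge_empty"}, "diagnosis_out_of_fuel"),
]


def forward_chain(initial_facts, rules):
    """Fire rules until no new facts appear; record why each fact was added."""
    facts = set(initial_facts)
    explanation = []  # entries are (rule name, sorted conditions, conclusion)
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            # A rule fires when all of its conditions are known facts.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                explanation.append((name, sorted(conditions), conclusion))
                changed = True
    return facts, explanation


facts, explanation = forward_chain(
    {"engine_cranks", "engine_will_not_start", "fuel_gauge_empty"}, RULES
)
for name, conditions, conclusion in explanation:
    print(f"{name}: if {' and '.join(conditions)} then {conclusion}")

# A situation the rules do not cover yields no conclusions at all.
other_facts, other_explanation = forward_chain({"battery_dead"}, RULES)
```

The second call illustrates the narrowness described above: outside the small set of hand-written rules, the system has nothing to say, and every additional capability requires someone to elicit and encode more rules.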

So, artificial intelligence has seen two great boom periods, both brought on by impressive (for the period) advances. The researchers believed their achievements would lead to the creation of a more general machine intelligence. Governments and industry signed on, providing major funding for new programs and rolling out new tools. The media joined in, providing both sensational predictions and dire warnings for the future. Eventually, the expectations proved unrealizable. Machines could neither display “intuition and insight” nor carry out autonomous, “complex reconnaissance and attack missions.” Advances were made by focusing more narrowly on specific problems, but these are far from the original quest for fully human-like artificial intelligence. The term “AI Winter” was coined, used, and used again.

During these winters, the number of researchers in the field was sharply reduced. The reduction is exemplified by the changing numbers of submitted papers and number of attendees at major artificial intelligence conferences. Work did continue, but at greatly reduced scale and pace, and often under the banner of “computer science” since describing the work as part of an artificial intelligence effort carried a stigma of failure.

The Third Boom

Today, we are in a third boom period for artificial intelligence, this time fueled by some spectacular results from deep learning capabilities and architectural improvements in neural network technologies (the sharp rise in attendees at the Neural Information Processing Systems conference is one measure). Venture capital and private equity investment in artificial intelligence-focused companies was between $1 billion and $2 billion in 2010, increasing to between $5 billion and $8 billion just six years later, and projections for annual global revenue for AI products are as high as $35 billion by 2025. Much of the recent success is enabled by two key developments. First, vastly more and higher-quality data are available (e.g., ImageNet and Project Maven) to train neural networks to solve specific problems. Second, advances in computer processing power have allowed the construction and use of very large networks. Reactions to these achievements are quite similar to those that occurred in past booms: Governments are investing heavily in the technologies; industries are rolling out hardware and software designed to make neural networks more powerful, accessible, and usable; the media is sensationalizing already sensational achievements like Watson winning on Jeopardy! or AlphaGo defeating a human international Go champion. And again, it is not difficult to find predictions that artificial intelligence will overtake and perhaps destroy human society.

However, history reminds us that when something seems too good to be true, that might be the case. In addition to the lessons of history, we also need to be aware of chinks in the deep-learning technological armor.

Google is a leader in the deep learning effort. One of its artificial intelligence researchers, François Chollet, recently made some succinct observations about the limits of deep learning technologies: “Current supervised perception and reinforcement learning algorithms require lots of data, are terrible at planning, and are only doing straightforward pattern recognition.”

Chollet’s comments are an important reminder that neural network systems are, at their core, greatly improved pattern recognition systems. Problem solving with a neural network requires the problem to be formulated as a numeric pattern-matching problem, which is often difficult. This isn’t unique to deep learning: In other established and powerful problem-solving technologies, problem representation can be the most important, difficult, and time-consuming requirement. Linear programming, for instance, is an excellent method for solving some optimization problems, but a difficulty of using it lies in the art of problem formulation: Not every problem is an optimization problem, and not every optimization problem can be correctly formulated for the method.

Chollet also cites the imperative for large amounts of data to train the neural networks. What about the problem domains in which large amounts of appropriately labeled training data are not available? Expert systems floundered when a base of usable problem knowledge could not be provided. Is there a connection between the expert system’s need for specifically encoded problem-solving information (the knowledge base) and the neural network’s need for a large training base (“big data”)? In general, the defense industries are poster children for “tiny data.” The tendency is to keep secret systems out of public view, which makes it difficult to obtain suitable data to train a neural network to recognize them.

When discussing the quantities of data needed to train a network, Chollet uses the imprecise “lots.” The exact amount of training data needed is rarely known ahead of time (except that more is better). Neural networks can be over- or under-trained, and training usually continues as long as performance improves. Further, engineers face many different design choices when building a neural network, and there are no established guidelines relating choices to specific problem types. The lack of an underpinning theory has dire implications for verification and validation of deep learning systems, since it is very difficult to explain the inner workings of neural networks. This is the “black box” problem. A neural network can assess an image and answer, “at 58 percent confidence, that image is a panda.” But it cannot explain how it arrived at that conclusion. The DARPA Explainable AI effort is attempting improvements, but that effort is at a very early stage. How would the typical commander react to a recommendation that the system could not explain in familiar terms? And how would that commander recognize when a flawed recommendation called for failure analysis?

Neural networks also can be fooled by “adversarial examples,” in which minor changes to an input pattern can yield very different results. In a well-known example, changing only 0.04 percent of the pixel values in an input image causes a neural network to switch from the correct classification, “Panda with 57.7 percent confidence,” to an incorrect one, “Gibbon with 99.3 percent confidence.” A 0.04 percent change would be 400 pixels out of a million. This change goes undetected by the human eye.
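The following sketch illustrates the general mechanism and checks the arithmetic above. It is not a reproduction of the panda experiment. It assumes a made-up linear scorer with 1,000 weights standing in for a network, and it nudges every input value slightly in the direction that lowers the score (the sign-of-the-gradient idea applied to a linear model) rather than altering a small subset of pixels. All names, weights, and values are hypothetical.

```python
# Toy adversarial example on a linear scorer. Weights and inputs are made up.

def score(weights, x):
    """Weighted sum of the inputs; a positive score means 'panda'."""
    return sum(w * xi for w, xi in zip(weights, x))


def classify(weights, x):
    return "panda" if score(weights, x) > 0 else "gibbon"


N = 1000
weights = [1.0 if i % 2 == 0 else -1.0 for i in range(N)]
# "Pixel" values in [0, 1], chosen so the original score is modestly positive.
x = [0.51 if i % 2 == 0 else 0.49 for i in range(N)]

# Shift each value by at most EPSILON against the sign of its weight.
EPSILON = 0.02
x_adv = [xi - EPSILON * (1.0 if w > 0 else -1.0) for w, xi in zip(weights, x)]

largest_change = max(abs(a - b) for a, b in zip(x, x_adv))

print("original:", classify(weights, x))
print("perturbed:", classify(weights, x_adv))
print("largest change to any value:", round(largest_change, 4))

# Arithmetic from the text: 0.04 percent is 4 parts in 10,000.
changed_pixels = 1_000_000 * 4 // 10_000
print("0.04 percent of a million pixels:", changed_pixels)
```

Each individual value moves by only 0.02 on a 0-to-1 scale, yet the many small shifts add up across the input and flip the label. A real deep network is far more complex than this scorer, but the sketch shows how a change too small to notice in any one place can still move the overall result across a decision boundary.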

So, neural network technologies have limitations. On the other hand, deep learning has demonstrated some incredible results. Researchers should apply these technologies where appropriate, but enthusiasm should be tempered by healthy skepticism about unproven performance claims. As an example, there have been recent media reports that artificial intelligence research at Facebook resulted in two computers, named Bob and Alice, independently inventing their own, more efficient, language to communicate with each other.

Bob: “I can can I I everything else.”

Alice: “Balls have zero to me to me to me to me to me to me to me to me to.”

HuffPost wrote: “When English wasn’t efficient enough, the robots took matters into their own hands.” Claiming this as a new, more efficient language seems to be an example of the media grasping for the sensational. The researchers involved report that they do not know what the communication actually means, and they do not understand what type of “thinking” goes on inside a neural network to produce this exchange. Given that we don’t understand the meaning of the “conversation” or how it emerged from internal reasoning, the exchange between Bob and Alice seems most likely to be a programming error. In fact, Facebook ultimately changed the software to prevent excursions into language use like the above. The incident should never have been seen as newsworthy, much less reported as an incredible machine performance.

It appears that history is beginning to repeat itself. Artificial intelligence technology is progressing impressively. Investment is rising apace, if not faster. The media is energized. The Department of Defense is planning and applying resources, and there are thoughts of expanding the use of artificial intelligence into business reform, intelligence, acquisition, training, and weapon systems, to name a few examples. Others talk about an immediate need to respond in the “AI Arms Race.” Will history now repeat full cycle with a third AI Winter? Although there is some debate, a third winter can be avoided if policymakers and researchers adopt a more reasoned approach that recognizes both the capabilities and limitations of these technologies and shores them up when necessary. Neither deep learning nor neural networks are universally applicable, and each has room for improvement. Technologies like modeling and simulation can be layered on top of deep learning capabilities to reduce the likelihood of system error, sidestep the impact of system opaqueness, and help to explain recommendations. The new Air Force “Data to Decision” effort is using this approach.

The Bottom Line

Today’s successful artificial intelligence capabilities are making genuine and impressive strides forward and are likely to continue this performance in specific application areas. However, history shows that overestimated potential has led to the frustration of unmet expectations and investment with little outcome. Past AI Winters reduced interest, funding, and research. Neural network technologies are incredibly valuable in pattern-matching tasks but have little application outside of that problem area and have significant prerequisites for use in that area.

At the same time, there is a widespread sense that the United States must do more with available technologies. The AI arms race pressure is increasing, and the United States may not have even entered the race yet. It still leads, but China is catching up rapidly, and some stress that dominance in AI is “likely to be coupled with a reordering of global power.” It’s clear that the defense budget is rising and will devote more to AI. We have become accustomed to stories like “With a sprinkle of AI dust, Google boosts options for ads in mobile apps.” But there’s a huge difference between choosing advertisements and making military decisions. “Sprinkling AI dust” on national security problems is not a viable option. Arati Prabhakar, former director of DARPA, cautions, “We have to be clear about where we’re going to use the technology and where it’s not ready for prime time… it’s just important to be clear-eyed about what the advances, in for example, machine learning can and can’t do.” The time to pay attention to this warning is now, as America starts responding to this century’s “Sputnik moment.” The limitations of these technologies are common knowledge within the community of artificial intelligence researchers. They need to become more widely understood by those who seek to apply “AI” to solve problems of national security.


Robert Richbourg is a retired army officer and now a Research Staff Member at the Institute for Defense Analyses. He holds a Ph.D. in computer science with a major area of artificial intelligence. He served his last 10 years of active duty as an academy professor of computer science and director of the Office of Artificial Intelligence Analysis and Evaluation at the United States Military Academy, West Point.
