Breaking the (Algorithmic) Black Box in Security Affairs

January 7, 2015

Algorithms have become a buzzword in policy circles – but in many cases, using the term “algorithm” alone is akin to the common journalistic errors of calling every armored vehicle a tank or every assault rifle an AK-47. It renders the details of the technology – and their ramifications for public policy – a black box immune to rational policy analysis. We need something more, especially when talking about the ill-specified and complex computational problems that arise from particular defense applications.

The Banality of Algorithms

Public policy circles are suddenly talking about algorithms. “Algorithms” are telling you how to vote, revoking driver’s licenses, preventing military suicides, and predicting crimes. Algorithms may also be responsible for your totally awkward OKCupid date. Sounds scary, doesn’t it? We have to stop those algorithms before politics and liberty are dead and they rule the world!

But there’s a problem – algorithms are not homogeneous and can vary widely in design, application, and evaluation. One market-leading textbook in computer science defines the discipline as the study of algorithms, including their mathematical properties, hardware realizations, implementation in programming languages, and applications to important problems. In other words, an algorithm is the most basic and banal building block of the computational sciences. Fear mongering about abstract “algorithms” merely obfuscates the challenge of understanding what technical details and mathematical proofs mean for policy decisions and regulatory regimes.

An informal definition of an algorithm is “any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output.” An algorithm is a “sequence of computational steps” that manipulates the input and produces the desired output – a mathematically precise recipe for solving a given computational problem. So where to begin in thinking about algorithms?

It is easier to begin with the issue of how something security audiences know a lot about (domain details) is incorporated into something many don’t know much about (algorithms and computational thinking). It is also appropriate and timely given that defense problems are often domain-knowledge intensive. By looking at the concepts of heuristics and features, security audiences can better understand some trade-offs and dynamics involved in computational problem-solving that utilizes domain knowledge.

Heuristics and Features

It is safe to say that the defense world contains particularly ill-specified and computationally intensive algorithmic problems, from calculating the location of elusive German U-Boats (which is the origin of today’s Center for Naval Analyses) to searching large quantities of drone data. A key prerequisite to solving many kinds of ill-specified and computationally costly problems lies in exploiting and representing knowledge about the domain. Solving computational problems in defense and security in particular may depend on exploiting domain knowledge – for example, a recent artificial intelligence military decision support tool utilizes Clausewitzian theory prepared by subject matter experts from the Army War College.

While the issue of how domain knowledge (security or otherwise) is incorporated into computation is complex and multifaceted, the concepts of heuristics and features can illustrate both the promise and peril involved in turning ground truth into efficient computation. The psychology and social science pioneer Herbert Simon thought that heuristics in computers modeled how humans used rules of thumb and shortcuts to find solutions to daunting problems. Additionally, some psychologists believe that we take in sensory stimuli and break them down into descriptive features (a dog has legs, a head, etc.).

For some algorithmic problems, heuristics – shortcuts that can help inform otherwise “blind” search algorithms – are used to produce a solution that may not be the best possible outcome but at least gets the job done on time. Many heuristics rely on assumptions directly drawn from the domain of application. For example, in computer chess the countermove heuristic assumes that many moves admit a valid countermove regardless of the actual position.
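To make the contrast concrete, here is a minimal sketch in Python of a “blind” search and a heuristic-guided search on the same problem. Everything in it is an assumption made for illustration: the 8x8 open map, the names GRID, START, GOAL, and search, and the choice of straight-line grid distance as the rule of thumb. It is not drawn from any fielded system. The blind version explores cells in order of how far they are from the start; the heuristic version always explores whichever known cell looks closest to the goal.

```python
import heapq
from itertools import count

# Hypothetical 8x8 map: "." is open ground, "#" would mark an obstacle.
GRID = ["........"] * 8
START, GOAL = (0, 0), (7, 7)


def neighbors(grid, node):
    """Yield the open cells directly below, right of, above, and left of node."""
    r, c = node
    for dr, dc in ((1, 0), (0, 1), (-1, 0), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield (nr, nc)


def manhattan(a, b):
    """Rule of thumb: grid distance to the goal, ignoring any obstacles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def search(grid, start, goal, use_heuristic):
    """Return (path, cells_expanded).

    use_heuristic=False: cells are explored in order of distance from the
    start (breadth-first, i.e. "blind" search).
    use_heuristic=True: the cell that looks closest to the goal is explored
    first (greedy best-first search).
    """
    tie = count()  # tie-breaker so the heap never has to compare cells
    frontier = [(0, next(tie), start)]
    parent = {start: None}
    depth = {start: 0}
    expanded = 0
    while frontier:
        _, _, node = heapq.heappop(frontier)
        expanded += 1
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], expanded
        for nxt in neighbors(grid, node):
            if nxt not in parent:
                parent[nxt] = node
                depth[nxt] = depth[node] + 1
                priority = manhattan(nxt, goal) if use_heuristic else depth[nxt]
                heapq.heappush(frontier, (priority, next(tie), nxt))
    return None, expanded


blind_path, blind_expanded = search(GRID, START, GOAL, use_heuristic=False)
greedy_path, greedy_expanded = search(GRID, START, GOAL, use_heuristic=True)
print("blind search expanded", blind_expanded, "cells")
print("heuristic search expanded", greedy_expanded, "cells")
```

On this open map the heuristic version reaches the goal after examining far fewer cells than the blind version. The catch is that a greedy search of this kind carries no guarantee of finding the shortest route once obstacles are added, which is exactly the “good enough, on time” trade-off described above.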

It follows that the application of some heuristic algorithms in security affairs could rely on modified rules and tricks of the trade known to, say, the working intel analyst or infantryman. Discover some key rule of the application domain, abstract it into a component of an algorithm, and the programmer has a means of generating good enough solutions for a previously intractable problem. The rich domain expertise found in security problems among practitioners, experts, and participants offers a wealth of possible heuristics to draw from.

However, this merely raises the question of how valid such guiding assumptions really are. Defectors can massage the truth, supposedly time-tested principles of war may just be superstition, and intelligence fed by partner security agencies may be misleading if not outright wrong – thus injecting human biases and blind spots into computational solution search. Additionally, valid or not, heuristics are ultimately still abstractions. Something important may be lost in translation when a programmer takes a useful idea culled from a Ramadi patrol and implements it on a computer.

A similar process issue holds true for features used to build many of the predictive algorithms that people think of when they warn of “algorithms” taking over. The data-driven algorithms increasingly driving today’s hottest applications depend on features – attributes of the data meaningful to the problem – to represent the key aspects of the problem domain. For example, a tweet might be an individual data observation, but a specific phrase might be a useful feature that can be used as an input to a predictive algorithm. The process of feature engineering is the art of finding ways to better represent the data. For example, an analyst might take a categorical attribute such as ITEM_COLOR (possible values are Red, Blue, or Unknown) and transform it into a new feature called HAS_COLOR that has a value of 1 when a data point has a known color and 0 when the color is unknown.
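The ITEM_COLOR example can be sketched in a few lines of Python. The records below and the item_id field are made up for illustration; only the ITEM_COLOR values and the HAS_COLOR rule come from the description above.

```python
# Hypothetical raw observations; only ITEM_COLOR matters for this example.
records = [
    {"item_id": 1, "ITEM_COLOR": "Red"},
    {"item_id": 2, "ITEM_COLOR": "Blue"},
    {"item_id": 3, "ITEM_COLOR": "Unknown"},
]


def has_color(item_color):
    """Return 1 when the color is known and 0 when it is 'Unknown'."""
    return 0 if item_color == "Unknown" else 1


# Feature engineering step: add the derived HAS_COLOR feature to each record.
features = [{**r, "HAS_COLOR": has_color(r["ITEM_COLOR"])} for r in records]

for row in features:
    print(row["item_id"], row["ITEM_COLOR"], row["HAS_COLOR"])
```

The transformation throws away the difference between Red and Blue on purpose. Whether that is a sensible simplification or a loss of something important depends entirely on the problem, which is why the choice takes domain knowledge and not just programming skill.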

Choosing the right features is something of an art form, and this has very real implications for defense applications. For example, human intelligence gathering may yield a critical insight about how to represent (and thus create algorithms for) the security domain that would otherwise be unknown. An informant may reveal that X or Y feature of a terrorist organization is more or less important for using past data to predict future attacks. But as with heuristics, the complexity of using domain knowledge also should not be overlooked. Analysts who neglect the hard process of understanding the context of the data have often failed to accurately analyze security problems.

Beyond the Black Box

Both heuristics and features involve discerning what aspects of the application domain are important and which are not so that algorithms may use only the most relevant domain knowledge necessary to solve the problem. So we just need to find the right knowledge to put into the machine and everything is a-ok, right? Unfortunately, everything is only obvious once you know the answer. Things that are obvious in retrospect were just one of many possibilities when the problem was still unsolved.

Common sense ain’t common, especially when it comes to any subject of nontrivial complexity. Creating computational solutions for defense problems – whether that involves using the right heuristics to search for solutions or finding the right features to predict something of interest – is often a process of tinkering and experimentation to engineer the best way to represent the domain and the knowledge particular to it.

Additionally, technology enables and constrains certain outcomes, and the process of using technology to solve problems is one of making tradeoffs, understanding core assumptions and risks, and developing appropriate expectations. Alternatives must be weighed, and the reasons a given solution succeeded or failed should not be enigmatic to consumers and stakeholders. This principle goes beyond just the issue of domain knowledge and algorithms – it holds true for many areas of computational problem-solving that interface with policy.

Heuristics and features are also only two of many computational considerations involved in making algorithms. For example, embedded systems necessitate more efficient uses of memory and thus different algorithmic choices. Such considerations suggest that writing off the whole shebang as a bunch of nerds putting stuff into their algorithm according to their incomprehensible nerdy ways is self-defeating.

Understanding the theory of algorithms and how particular kinds of algorithms are designed and built gives security decision-makers and analysts power and agency to shape, guide, and evaluate the process. The alternative is imputing saintly power and expectations or demonic capabilities and mysticism to a black box called “algorithms.” Unless the details of algorithms are highlighted in policy discussions, superstition and hysteria instead of sound policy are what the procedure called “securityAnalysis()” will unfortunately compute.

 

Adam Elkus is a PhD student in Computational Social Science at George Mason University and a columnist at War on the Rocks. He has published articles on defense, international security, and technology at CTOVision, The Atlantic, the West Point Combating Terrorism Center’s Sentinel, and Foreign Policy.