At high-profile TED-style events with millions of online viewers, charismatic tech CEOs take the stage to unveil their latest gadgets. In slick, high-energy presentations, they criticize the slow and outdated cycles of traditional military innovation and procurement. They paint vivid scenarios of future warfare where adversaries are agile, tech-savvy, and armed with autonomous systems, implying that only the most cutting-edge technology can prevail. Unsurprisingly, the solution to these looming threats aligns perfectly with the products they happen to be selling. Policymakers, often unfamiliar with the complexities of AI and emerging tech, along with senior military officials and influential defense industry figures, are drawn in by the urgency and clarity of the message. Back home, they confidently explain to others why preparing for killer drone swarms and AI-powered adversaries justifies immediate investment in these high-tech solutions.
However, there appears to be a noticeable gap between the capabilities often portrayed by the defense tech industry, through systems such as Palantir’s Gotham, Avalor’s Nexus, or Helsing’s Altra, and what can realistically be achieved in complex military operational environments. In our own academic research on mission planning for military platforms, for example, we frequently encounter complex mathematical challenges related to optimizing human-machine interaction in an operationally realistic, practically usable, and responsible way. Addressing these challenges often requires novel methods for making systems smarter and more adaptive.
One such method is Bayesian optimization, a type of mathematical problem-solving used to improve complex systems through trial and error. Imagine tuning a system based on user feedback: you try something, see how it performs, and then use that result to make a better decision the next time. That’s essentially what Bayesian optimization does: It helps a system learn, step by step, to make better choices, even when information is limited or uncertain. This approach often relies on something called a Gaussian process, a statistical method that makes educated guesses about things like user preferences or system behavior, based on the data that’s already available.
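For readers who want to see the mechanics, the short Python sketch below illustrates a single step of this loop, using scikit-learn's Gaussian process as the surrogate model. The "mission score" being tuned and every number in it are hypothetical stand-ins rather than anything drawn from a real system; what matters is the structure: fit a model to past trials, then use its predictions and its uncertainty to choose the next trial.

```python
# A minimal sketch of one Bayesian-optimization step, assuming a hypothetical
# scalar "mission score" to be maximized by tuning a single parameter in [0, 1].
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def mission_score(x):
    # Stand-in for real feedback (e.g., an operator rating a plan); illustrative only.
    return -(x - 0.6) ** 2 + 0.05 * rng.normal()

# A few past trials: parameter settings already tried and the scores they produced.
X = np.array([[0.1], [0.4], [0.9]])
y = np.array([mission_score(x[0]) for x in X])

# Fit the surrogate: the Gaussian process makes an "educated guess" about the
# score everywhere, together with how uncertain that guess is.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X, y)

# Expected improvement: favor settings that look promising or are still uncertain.
candidates = np.linspace(0, 1, 200).reshape(-1, 1)
mu, sigma = gp.predict(candidates, return_std=True)
best = y.max()
z = (mu - best) / np.maximum(sigma, 1e-9)
ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

next_x = candidates[np.argmax(ei)]  # the next setting to try, closing the loop
print(f"Next trial: parameter = {next_x[0]:.2f}")
```

The loop repeats: the new trial's result is added to the data, the surrogate is refit, and the next suggestion improves accordingly.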
However, these models can be difficult to interpret and are often built on certain assumptions. One key assumption is that the relationships or patterns in the data are smooth, meaning that small changes in input variables will lead to small, predictable changes in outcomes. In other words, the system assumes that the world behaves in a gradual and continuous manner. This assumption of smoothness can be problematic in complex and dynamic real-world environments like military operations, where changes may be abrupt, relationships nonlinear, and data noisy or incomplete. To make these techniques work in practice, you need more than just smart algorithms. You need models that can handle uncertainty, adapt quickly, and work with incomplete or noisy data, while also remaining transparent, explainable, and interpretable to human users. Developing such systems requires deep technical innovation, rigorous testing, and a lot of iteration.
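The toy sketch below, again with entirely made-up data, shows why that matters: a Gaussian process with a standard smooth (RBF) kernel is fitted to a signal that switches abruptly, and the smooth prior can only render that switch as a gradual ramp.

```python
# A toy illustration of the smoothness assumption, using made-up data: a signal
# that switches abruptly at x = 0.5 is modeled with a Gaussian process whose
# RBF kernel assumes gradual, continuous change.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
y = np.where(X.ravel() < 0.5, 0.0, 1.0)  # abrupt step: 0 below 0.5, 1 at and above

# optimizer=None keeps the length scale fixed, so the smoothness prior is explicit.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-6, optimizer=None)
gp.fit(X, y)

# Around the jump, the smooth model predicts a gradual ramp through values the
# real signal never takes (and can overshoot 0 or 1 slightly nearby).
query = np.array([[0.42], [0.45], [0.48]])
mu, _ = gp.predict(query, return_std=True)
for q, m in zip(query.ravel(), mu):
    print(f"x = {q:.2f} -> predicted {m:.2f}  (true value: 0 below 0.5, 1 above)")
```

In an operational setting, that kind of abrupt change might be a threat level that flips suddenly rather than drifting, which is exactly where a naively applied smooth model misleads.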
Here lies the dilemma: Companies that have the resources to build these advanced systems often face two choices. Either they choose not to adapt these techniques to settings that make them transparent and explainable, or they do use and adapt them, but keep their methods proprietary and opaque, often out of fear of losing competitive advantage. While this is understandable from a commercial perspective, it poses a serious challenge for defense organizations, which need a clear understanding of what is happening under the hood in order to ensure that AI is used safely, ethically, and in alignment with strategic goals.
While industry plays a vital role and should be part of this solution, its incentives, focused on speed, scalability, and market dominance, don’t always align with the public interest. Governments often need to move more cautiously and transparently, prioritizing responsible AI adoption over rapid deployment. For this reason, collaboration between public and private sectors is crucial, but it should be shaped by democratic values, accountability, and the specific needs of public safety and ethics, not just by commercial logic. A potential way forward could involve hybrid arrangements in which proprietary systems remain closed to the general market but are made open and auditable to trusted government partners under strict contractual safeguards. This is not a new idea, but what is new is the growing difficulty of understanding and verifying the behavior of modern AI systems due to their black- or gray-box nature. Even in relatively constrained domains like object detection (used, for instance, in target recognition), explainability remains highly sensitive to the model architecture, training data, and context of deployment, and often degrades under data shift or domain adaptation.
Similarly, in the domain of drone swarm decision-making under adversarial conditions, where drones must explore environments and dynamically allocate tasks, researchers at the Dutch Ministry of Defense’s Data Science Centre of Excellence face complex, game-theoretical challenges for which no optimal solutions currently exist. We are actively addressing these challenges in collaboration with a team of researchers, the military, and industry. Thus, developing, validating, maintaining, and deploying robust and operationally viable AI systems is significantly more challenging than current marketing narratives seem to suggest. Achieving this requires sustained technical expertise, rigorous testing, and continuous oversight.
As militaries increasingly turn to industry for AI capabilities, governments should guard not only against the risk of strategic lock-in but, more urgently, against inflated expectations and insufficient scrutiny. These factors can lead to both tactical and strategic failure. AI is not a turnkey solution to military dominance. It is a complex, rapidly evolving capability that demands strategic patience, institutional expertise, rigorous oversight, and a deep understanding of its inner workings. While it may be tempting to channel NATO’s expanding defense budgets into off-the-shelf AI solutions, overreliance on such approaches could become a strategic liability.
This is not to suggest that militaries should avoid collaborating with industry. On the contrary, effective military AI development requires sustained engagement across ecosystems that include industry and civilian academia, particularly in light of current scaling challenges and AI’s inherently dual-use nature. However, participation in such ecosystems should not be mistaken for simply procuring off-the-shelf products.
True collaboration demands that the military cultivate robust internal expertise (AI literacy) across all levels of the organization, boosting the human readiness level. This is especially important when evaluating advanced AI platforms, as government officials should be equipped to critically assess their capabilities, limitations, and the conditions under which they can be used responsibly. At the same time, many industry and academic partners — though not all — lack deep operational military insight. Their defense literacy should similarly be strengthened. In this sense, advancing military AI is not just about building better systems; it’s about cultivating better ecosystems.
While industry and academia may drive faster advances in AI technology, the military possesses indispensable domain expertise about the realities and requirements of operational environments. This expertise is essential not only for evaluating how AI systems perform under the complex and often ambiguous conditions of military operations, but also for determining whether their use aligns with principles of responsible and lawful deployment.
Consider, for example, the use of autonomous aerial drone swarms for military intelligence operations. Technologies that enable autonomous collision avoidance and autonomous navigation in resource-scarce environments are undoubtedly important, but they represent only one part of the puzzle. What is the real-world performance of object detection models used for target identification under conditions of data drift? What level and type of explainability is required for these models, and remains understandable to the end user (and/or commander), given the specific operational and legal context? What constitutes an effective search strategy for a heterogeneous swarm, especially under constraints like limited flight endurance and the potential presence of adversarial countermeasures?
These are not just technical questions but operational ones that require deep military expertise. That expertise is essential not only for conducting applied research but also for formulating the right research questions in the first place. This process often reveals a disconnect between the actual state of the art in AI and the aspirational narratives marketed by industry. Companies tend to sell ideas, prototypes, and concepts — not fully integrated, validated, and mission-ready solutions. This is not inherently problematic, but off-the-shelf ideas are no substitute for informed, mission-driven development. Militaries should not mistake these offerings for complete, turnkey capabilities. As of now, such comprehensive solutions do not exist.
To move toward them, militaries should do more than procure systems: They should build a foundation of expertise that enables meaningful collaboration with external stakeholders. Without that internal capacity, even the best partnerships with industry and academia will fall short.
So, What Now?
First, militaries should invest in building a strong internal knowledge base in data science and AI, one that is firmly rooted in operational realities. This effort should not occur in isolation. Instead, it should take place within vibrant ecosystems that include both industry and civilian academia. We are not advocating for full autonomy from external partners, but we do caution against overreliance and naive procurement.
Second, given the current academic state of the art in AI, defense organizations should resist the urge to accelerate deployment out of fear of falling behind other states. Fear clouds judgment. Militaries have both the opportunity and the responsibility to proceed with deliberation. Rushing into AI adoption without clear understanding, oversight, and accountability mechanisms risks undermining both mission effectiveness and public trust.
Developing military AI that is effective, robust, and responsible requires the active involvement of military experts, and it takes time. Such development demands a foundational level of AI literacy within the military, and defense literacy within industry and academia. Without both, responsibility and effectiveness risk being framed as opposites, when in fact responsible use of AI is a prerequisite for effective use. In the long run, irresponsible deployment of AI will prove not only unethical but also operationally unsound.
Ultimately, both effective and responsible use of AI depend on a clear and informed understanding of how systems are designed, developed, and deployed. The good news: Investing in AI literacy within the military and defense literacy among external stakeholders serves both aims. True dominance in this new domain will not be won by haste or spectacle, but by those who carry the burden of understanding, and the courage to wield it wisely.
Roy Lindelauf, PhD, is a full professor in data science in military operations at the War Studies Department of the Netherlands Defence Academy. He also holds an endowed chair in data science for safety and security at the Department of Cognitive Science and Artificial Intelligence at Tilburg University. He leads the Data Science Centre of Excellence of the Dutch Department of Defense.
Herwin Meerveld is an officer in the Royal Netherlands Air and Space Force. He is coordinator of the Data Science Centre of Excellence of the Dutch Department of Defense. He is an external PhD candidate at Tilburg University.
Image: Midjourney