Cogs of War

The Pentagon’s Software Revolution and Its Testing Dilemma
Douglas C. Schmidt and Nickolas H. Guertin
September 3, 2025

On a desert airstrip, swarms of autonomous drones rise into the sky — a glimpse of the Pentagon’s vision for future war. The Navy has even shown it can update combat software mid-deployment, underscoring the shift to AI-driven, software-defined arsenals and rapid acquisition. These advances promise a revolution in U.S. defense, but as innovation accelerates, commanders are left to ask the only question that matters: what will actually work in battle, and what won’t?

For decades, the Office of the Director of Operational Test and Evaluation has been the Pentagon’s safety net, independently reporting results about unproven or ineffective weapons systems. Yet many of its most seasoned civilian testers and support contractors are being shown the door. Racing ahead with innovation while cutting back on oversight risks leaving the joint force with shiny new tools that fail when it matters most. To prevent a brittle foundation, the Defense Department should reinvest in a modernized, tech-augmented test enterprise that pairs emerging tools with hard-won human expertise — ensuring next-generation capabilities are battle-ready before they reach the fight.

We write from long experience at the nexus of software innovation, defense acquisition, and operational testing. We served as the last two directors of operational test and evaluation at the Pentagon, as well as in other key leadership roles within the Department of Defense and the defense industrial base. Together, we have seen both the promise of software-defined systems and the risks of fielding them without rigorous testing and evaluation.

A Revolution in How the Pentagon Develops Technology

The U.S. military is undergoing sweeping technological shifts that are reshaping how it develops, buys, and uses its warfighting tools. These changes span multiple domains but are happening all at once. Here are three examples of this revolution.

AI-Augmented Development

Artificial intelligence is already reshaping how the military designs and builds systems. The Army’s experimental “A.I. Flow” program can modernize legacy code and auto-generate acquisition documents in minutes — work that once took weeks. AI-augmented workflows are also exposing hidden dependencies, generating test cases, tracing requirements, and flagging compliance gaps with unprecedented speed.

Software-Defined Warfare

Warfare is shifting from static, hardware-centric platforms to agile cyber-physical systems that can be updated at the speed of software. Victory will depend less on a handful of exquisite, costly assets and more on adaptable fleets — vessels, weapons, and swarms of low-cost autonomous systems — that can outpace and overwhelm adversaries. Artificial intelligence and software-defined tactics will deliver both the raw firepower and the agility needed to thrive in a battlespace of constant innovation. Picture drone swarms blinding enemy air defenses or packs of autonomous mini-submarines hunting adversary subs — proof that, in the digital era, quantity gains its edge through code.

Accelerated Acquisition and Agile Procurement

The Defense Department is not only chasing tech breakthroughs but also reinventing how it buys and fields new systems. The old acquisition pipeline — often stretching years from idea to deployment — can’t keep pace with today’s rapid innovation cycle. To close the gap, leaders are turning to the software acquisition pathway, fast-track contracting, and AI-augmented tools that compress timelines from years to months. Yet speed alone isn’t enough. The U.S. military will need to trust that these tools will work reliably in the unforgiving chaos of combat.

Fielding Faster Requires Faster Test Results

As the military transforms, testing should transform as well. Last summer, a Defense Science Board report warned that testing and evaluation now face their toughest hurdle in rapidly advancing technologies, especially those that deliver breakthrough capabilities.

Ensuring defense systems work together in harsh, unpredictable conditions is hard. New platforms have to prove themselves not just individually, but as part of a networked force — making large-scale integrated tests complex, costly, and slow. Instead of relying on massive trials followed by months of fixes, smarter ways are needed to anticipate interactions earlier. War is chaotic enough without learning too late that a drone can’t talk to a ship or that AI guidance fails under enemy jamming.

Trimming the Testers

For decades, the Operational Test and Evaluation office was the Pentagon’s independent watchdog, ensuring weapons were tested under realistic combat conditions and judged objectively. Small by Pentagon standards but powerful in mandate, it provided critical oversight of major defense programs to uphold standards. It was a shock when the secretary of defense recently ordered a sweeping restructuring that cut its resources by 80 percent, rolling back reforms and raising urgent questions about the future of independent testing.

The director’s independent evaluations and annual reports to program offices, the secretary, and Congress ensure that only weapons proven suitable, effective, and survivable are fielded. Emerging tools — automation, artificial intelligence, modeling, simulation, and digital twins — can further this mission, bridging lab tests and battlefield reality while stretching scarce test personnel and range time. However, adopting automation and artificial intelligence in testing requires both re-engineering systems and recognizing tool limitations, which the defense industry is still mastering. Model-based engineering of complex systems is still maturing but already delivers benefits. Virtual testing of “software-defined vehicles” is common in the automotive sector, where companies like Applied Intuition thrive with this paradigm.

The defense context is different. Automobiles, while complex, pose a bounded integration challenge compared to the massive, multi-domain systems-of-systems needed for joint warfighting. Defense platforms must integrate across contractors, technologies, and mission profiles while adapting to adversaries in unpredictable conditions. Some projects use automated testing successfully, but few demonstrate true operational testing in dynamic battlespace conditions using model-based methods.

Software-defined platforms require new testing approaches to manage complexity, shifting requirements, and massive integration. Algorithms may assist but cannot replace seasoned human judgment. Scaling prototypes into battle-ready systems requires major investment, collaboration, and time. Cutting human testers before reliable automated tools exist risks leaving the joint force with neither people nor technology to evaluate next-generation AI-enabled systems.

Does Speed Kill?

Rushing into high-tech war without understanding system limits risks failed missions and lost lives. America’s edge lies in doctrine and technology — networking, precision, and coordination to offset larger foes. However, that advantage disappears the moment systems falter or adversaries counter them.

Picture a mission collapsing because drone X can’t connect through network Y under condition Z — a gap missed in testing. In that instant, an entire “kill web” of sensors and shooters could unravel, turning a strategy to exploit an opponent’s weakness into defeat. Without rigorous, end-to-end validation across services in realistic conditions, small mismatches can snowball into catastrophic failures. The Pentagon may be fielding new tools quickly, but proving their interoperability at scale remains unfinished — and urgent.
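The scale of that validation problem is easy to illustrate. The sketch below is a hypothetical Python example, with platform, network, and condition names we invented for illustration; it simply enumerates the interaction space a test planner would need to cover, showing how quickly the combinations outgrow what live trials alone can handle.

```python
from itertools import product

# Hypothetical inventory: a few platforms, data links, and environments.
platforms = ["drone_swarm", "destroyer", "strike_fighter", "ground_station"]
networks = ["link16", "satcom", "mesh_radio"]
conditions = ["clear", "gps_denied", "heavy_jamming", "cyber_degraded"]

# Every ordered pairing of two platforms over a network under a condition
# is a distinct interaction that could hide the "drone X / network Y /
# condition Z" failure described above.
combos = [
    (a, b, net, cond)
    for a, b in product(platforms, repeat=2) if a != b
    for net, cond in product(networks, conditions)
]

print(f"{len(combos)} pairwise interactions to validate")
# 4*3 ordered pairs * 3 networks * 4 conditions = 144 cases.
```

Even this toy inventory of four platforms, three networks, and four conditions yields 144 distinct cases; a real joint force multiplies that by orders of magnitude, which is why simulation and automated scenario generation have to carry much of the load.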

Another risk is false confidence, or believing a weapon is a game-changer when it hides an Achilles’ heel only testing can reveal. A flashy demo or glowing contractor report can make a system look ready, yet independent operational testing has repeatedly exposed critical flaws others missed. Without a dedicated test office to run those trials, such flaws may surface only in combat.

There’s also a paradox: the speed of trust. Cutting testing can actually slow down the adoption of new technology. Commanders hesitate to rely on systems whose limits and quirks remain unknown. Well-tested programs provide clear reports on what worked, what failed, and what to watch, which builds trust and sets boundaries for use in battle. Without that vetting, doubts linger on the front lines.

Trust takes on sharper stakes with artificial intelligence. Unlike traditional systems, AI can behave unpredictably, shift as data changes, or deliver confident but wrong outputs. Without rigorous, transparent testing, users may distrust AI even when it works or over-trust it when they shouldn’t. Building trust requires proving performance under realistic conditions, explaining how models reach conclusions, exposing failure modes, and setting clear boundaries. Only then will commanders know when to lean on AI, and when human judgment ought to prevail.

Safeguarding the Revolution

All of this raises a critical question: how can the Pentagon drive its high-tech revolution while ensuring new capabilities are proven, safe, and integrated? With the right investments and discipline, it’s possible to achieve both — rapid innovation paired with rigorous validation.

While leading the Defense Department’s test and evaluation enterprise, we advanced strategies to embrace new technology, shift toward integrated and continuous testing, incrementally improve software-driven systems, and invest in automation. The downsized test office should now build on those foundations — and create new ones — to transform oversight. Success will require sustained investment in near-term payoffs, with research agencies and labs focused on moving promising tools from prototype to deployment-ready status.

Advancing test technology will not be easy or cheap. If automation is to serve as both the coping strategy for a smaller workforce and part of the solution, it should be pursued responsibly: the testing tools themselves should be validated extensively, and no black-box algorithm should certify a weapon without human judgment.

The recent downsizing drained much of the Pentagon’s testing expertise. Preserving what remains will require deliberate collaboration across agencies, services, academia, and industry. Cross-agency task forces and expert forums could focus on priority systems, share best practices, and capture lessons learned so each new AI-augmented program doesn’t have to reinvent testing from scratch.

Artificial intelligence itself could help through intelligent test orchestration. It might coordinate automated test agents across distributed labs and facilities, enabling large-scale, multi-system scenarios that would be impossible to run manually. With such approaches, the slimmed-down testing community could reimagine how to achieve comprehensive testing, with help from industry and the academy.
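As a minimal sketch of the idea, assuming entirely hypothetical lab and scenario names of our own invention, the Python below fans test scenarios out to distributed facilities concurrently and gathers the results: the kind of scaffolding an AI planner could sit atop to decide which scenarios to run next.

```python
import asyncio
import random

# Hypothetical distributed test facilities; real orchestration would
# dispatch to hardware-in-the-loop labs, ranges, and simulators.
LABS = ["east_range_sim", "west_hwil_lab", "cyber_testbed"]

async def run_scenario(lab: str, scenario: str) -> dict:
    """Stand-in for dispatching one scenario to one facility."""
    await asyncio.sleep(random.uniform(0.1, 0.5))  # simulated run time
    return {"lab": lab, "scenario": scenario,
            "passed": random.random() > 0.2}  # simulated outcome

async def orchestrate(scenarios: list[str]) -> list[dict]:
    # Fan scenarios out across labs concurrently; an AI planner could
    # reorder this queue based on which failures are most informative.
    tasks = [run_scenario(LABS[i % len(LABS)], s)
             for i, s in enumerate(scenarios)]
    return await asyncio.gather(*tasks)

results = asyncio.run(orchestrate(
    ["swarm_vs_jamming", "ship_drone_handoff", "gps_denied_strike"]))
for r in results:
    print(r)
```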

Modern software thrives on continuous integration, testing each change as it’s made. The military could adopt a similar approach for hardware and artificial intelligence: field incremental updates, let operators test them against AI-generated scenarios, and carefully deploy prototypes alongside proven systems. Frontline units would become living testbeds — learning by doing, but always in a controlled and reversible way.
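In software terms, that discipline resembles a gated pipeline: each increment must clear every stage before it is fielded, and any failure triggers rollback rather than deployment. The Python sketch below is illustrative only, with stage names and checks we made up; it does not depict any real program's process.

```python
# Hypothetical CI-style gate for a capability increment: each stage must
# pass before the update is fielded; any failure means rollback, keeping
# deployments controlled and reversible as described above.

def run_stage(name: str, checks: list[bool]) -> bool:
    ok = all(checks)
    print(f"{'PASS' if ok else 'FAIL'}: {name}")
    return ok

def field_increment(build_id: str) -> str:
    stages = [
        ("unit_and_integration", [True, True]),
        ("ai_generated_scenarios", [True, True, True]),  # adversarial cases
        ("operator_assessment", [True]),  # humans stay in the loop
    ]
    for name, checks in stages:
        if not run_stage(name, checks):
            return f"rollback {build_id}"  # reversible by design
    return f"field {build_id} alongside proven systems"

print(field_increment("increment-042"))
```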

The Pentagon’s digital revolution and its testing crisis are two sides of the same coin. As autonomous systems, AI-driven battle management, and networked weapons advance, the question becomes: Who tests the testers? Who ensures innovations work as intended and that flaws are exposed before lives are at stake?

Ultimately, men and women in uniform should be able to fight knowing they can depend on their gear. To ensure that, the Pentagon needs to rebuild a testing safety net, or risk a military revolution built on a brittle foundation. The future of American warfare may hinge not only on how fast industry and government can innovate, but on how well results are tested and proven.

 

Douglas C. Schmidt is the dean of William & Mary’s School of Computing, Data Sciences & Physics, after spending over 20 years as a computer science professor at Vanderbilt University. He recently served as the Pentagon’s director of operational test and evaluation. He was a program manager at the Defense Advanced Research Projects Agency and the chief technology officer at Carnegie Mellon University’s Software Engineering Institute. He has published many papers and books on software-related topics.

Nickolas H. Guertin has more than 30 years of leadership experience in defense acquisition and testing. He served as assistant secretary of the Navy for research, development, and acquisition, overseeing a $130 billion portfolio and 130,000 personnel. Before that he served as the Pentagon’s director of operational test and evaluation. A retired Navy civilian and submariner, he is recognized nationally for advancing acquisition strategy, open-systems architecture, and operational testing, and continues shaping defense innovation through research and advisory roles.

Image: Morgan Brown via DVIDS.
