Data Incoming: How to Close the Cyber Data Gap


How many cyber attacks happen in the United States each year? How many Americans are affected? Right now, we have little idea of the answers to basic questions such as these. That is about to change. Following a flurry of new regulations over the last year, U.S. companies will be subject to expanded cyber-incident reporting requirements that oblige them to submit information to the federal government about cyber attacks, data breaches, and other incidents that compromise information systems. As a result, the government is about to gain a valuable resource offering unprecedented insight into the state of cyber security in the United States. To take advantage of this data, agencies receiving these incident reports should scale up their data analysis and expand cooperation with the private sector and international partners.

The Cyber Security Data Gap

The U.S. government makes little large-scale cyber security data, or analysis of such data, publicly available. The Federal Bureau of Investigation and the Office of Management and Budget issue annual reports on certain cyber incidents, and the Treasury Department has published analysis of cyber incidents targeting banks and other financial institutions. The Department of Health and Human Services, the Securities and Exchange Commission, the Department of Energy, and some state governments also release varying amounts of the cyber-incident or breach data reported to them. While this data is useful, it represents only a portion of what the government collects now and will begin to collect under new requirements. The government can do more to release and analyze the data it receives.

Industry and academia have also been unable to fill the gap. Academic researchers generally study cyber incidents reported in the press, but much of cyber conflict remains covert or is never publicly reported. Companies providing cyber security services have a wealth of incident data, and insurance companies gather details from cyber-related claims. However, this data is proprietary, which has limited who can access and use it.

The paucity of cyber data is an anomaly compared with the data available on safety risks, hazards, and incidents in other industries and professions. There are 13 statistical agencies and more than a hundred additional statistical programs spread across the federal government, covering labor, justice, transportation, economics, and health, among other areas. The federal government also collects data about aircraft incidents, highway safety, hospital injuries and deaths, and workplace injuries. This information is meant to improve public and consumer safety. Collecting cyber data would serve the same purpose, with the added urgency that such data could reveal national security threats. In other fields, it is standard practice for the U.S. government to collect, analyze, and release safety-related information: Cyber security is the exception.

Cyber data is important for many reasons. A fuller data set would allow the U.S. government to prioritize threats, allocate resources to policy efforts, and measure the success of those efforts. It would also help organizations to understand the scale and scope of risks they face and invest in cyber resilience measures. Insurance companies could use cyber data to develop better risk models to set rates. Finally, the data could illuminate which cyber security practices are associated with greater resilience or faster recovery from attacks, providing organizations with best practices.

Policymakers and experts generally agree that cyber security metrics are seriously deficient. National Cyber Director Chris Inglis has stated that without data collection “we are going to be uneven, and perhaps less-than-optimal in our response to any of these threats.” Two scholars of cyber conflict wrote in 2018 that without good cyber-incident reporting, the United States is “operating in [a] known environment needlessly wearing a blindfold.” Things have improved since 2018, but there is still room for progress.

To address this, the congressionally mandated Cyberspace Solarium Commission proposed in 2020 the creation of a Bureau of Cyber Statistics. Lawmakers have introduced legislation to advance this proposal, most recently an amendment to this year’s National Defense Authorization Act. So far, however, these legislative efforts have not succeeded.

In the meantime, a flurry of laws and regulations on cyber-incident reporting has emerged. These were probably spurred, at least in part, by a series of major cyber incidents targeting U.S. infrastructure in 2021 and by fears of a Russian cyber attack against the United States following Russia’s invasion of Ukraine. While these incident-reporting requirements would not replace a longer-term effort to establish a cyber statistics bureau, they would be a good start toward advancing the mission that such a bureau would ultimately undertake.

Current Cyber Data Landscape

The most significant source of incident-reporting data that the federal government can use to fill the cyber data gap is the information that companies will submit under the Cyber Incident Reporting for Critical Infrastructure Act, which was enacted in March. The details of which companies and incidents the law covers are still being worked out, but it will probably be the most comprehensive cyber-incident reporting scheme in the country.

However, it could take two or more years for the law to go into effect. Until then, the federal government should use the incident reports submitted under other federal and state requirements to generate useful statistics. At the federal level, more than 20 laws and regulations require incident reporting, and these requirements have expanded over the last year. In addition to the Cyber Incident Reporting for Critical Infrastructure Act, the Federal Communications Commission, the Securities and Exchange Commission, and the National Credit Union Administration are exploring expanded incident-reporting requirements, and the Transportation Security Administration and banking regulators have already expanded theirs. Additionally, four states recently enacted cyber-incident reporting requirements, and two have passed ransomware laws requiring state and local governments to report ransomware incidents.

Federal agencies should synthesize this government incident-reporting data with industry data to build a more holistic and detailed picture of the cyber-threat landscape. Several companies, such as Verizon and Advisen, maintain databases of tens of thousands of cyber incidents. In addition, the government should request anonymized data from insurance and cyber security companies.

Think tanks and researchers have also created datasets of publicly reported cyber incidents that the government could use to complement incident-reporting data. The Council on Foreign Relations, the Center for Strategic and International Studies, and the University of Maryland maintain such databases, with records dating back to 2005, 2006, and 2014, respectively. The CyberPeace Institute has likewise created more narrowly focused databases of cyber incidents related to Russia’s invasion of Ukraine and to the healthcare sector.

U.S. cyber-incident data would be even more valuable if combined with similar data from other countries. Some countries, such as New Zealand, Australia, Japan, Estonia, and Singapore, already collect and publish cyber-incident data. The United Kingdom, Israel, and Canada also survey businesses about cyber security issues and publish the results. Canada and the European Union are considering broader cyber-incident reporting requirements, and India recently introduced such a requirement. The U.S. government should coordinate with partners and allies and establish data-sharing agreements so that consistent data can be released and shared across countries.

Finally, the U.S. government could conduct a national survey of cyber security practitioners in private-sector organizations. Modeled on the surveys conducted by Canada, Israel, and the United Kingdom, it could ask practitioners about the cyber threats they face or anticipate and the cyber security practices they have adopted.

Promoting Data Standards

To promote robust, consistent data, the U.S. government should establish voluntary standards for industry, international partners, and allies on measuring and collecting information about cyber threats and cyber security. For instance, countries could agree on a common set of data fields for cyber-incident reporting forms and a shared definition of a “cyber incident.” There is precedent for international measurement standards: International organizations have developed the System of National Accounts, which sets out standards of data collection and measurement to inform countries’ calculation of GDP. Cyber should have a similar set of standards.

The U.S. government should also provide capacity-building support and technical expertise to help other countries, as well as U.S. state and local governments, collect and use cyber-incident data. The federal government should offer data science training and expertise to help them analyze trends in cyber data and conduct studies with it. It should also provide advisory support to countries developing cyber-incident reporting laws and regulations.

How to Analyze Cyber Data

Federal agencies could use the totality of cyber data to answer important policy and research questions about cyber security. The most basic use of such cyber data would be to paint a holistic picture of the cyber-threat landscape. In other words, how many cyber attacks happen and what is their effect on U.S. infrastructure, people, and the economy? 

These questions may sound rudimentary, but we have little idea of the answers. Take ransomware as an example. In the last year, private-sector organizations reached contradictory conclusions about ransomware: Some found that incidents were increasing, others that they were holding steady, and still others that they were decreasing. Private-sector reports also conflict about which sectors are most targeted by cyber attacks.

The U.S. government could act as an arbiter, compiling and synthesizing all available data sources to provide authoritative assessments of cyber threats. Pooled with other countries’ data, U.S. cyber data could also provide an international view of cyber threats and capabilities, allowing researchers to better compare and rank countries’ abilities. While neither the government nor industry can ever be sure of the total number of cyber incidents, the government can dramatically improve visibility.

Two recent reports on the number of ransomware attacks can serve as models. The European Union Agency for Cybersecurity recently released a publication that combined ransomware incidents reported to governments with those reported by the press and the private sector to analyze overall ransomware trends. The Institute for Security and Technology published a similar report synthesizing data from five private companies. The U.S. government could model its own analysis on these reports, combining incident-reporting data with other sources to develop an overall understanding of threats.

Researchers could also use cyber-incident data to estimate the cost of cyber attacks to the U.S. economy, as well as the money saved by investing in cyber security. Various organizations, including the Center for Strategic and International Studies, the International Business Machines Corporation, and the White House Council of Economic Advisers, have attempted to quantify the losses stemming from cyber incidents. These estimates vary widely, partly because they rely on different methodologies and partly because comprehensive data is lacking. Researchers have also sought to determine the economic returns on firms’ cyber security investments. A better understanding of the economics of cyber security could help firms invest appropriately in cyber-resilience measures.

Cyber-incident data could also enable research on the impacts of cyber attacks on specific types of organizations and sectors to help direct resources to the most pressing risks. For instance, cyber-incident data could help to uncover the types of organizations and infrastructure most at risk of compromise. In the past, researchers have used the Department of Health and Human Services’ data breach database to determine how hospital size and teaching status affect the likelihood of a breach occurring, and how the type of data and organization targeted in healthcare breaches have evolved over time. 

There are likely many other ways in which cyber-incident data could be used in academic research — especially if combined with private sector and open-source data. It will be hard to anticipate and plan for all of these uses in advance, which is why the government should make the data as widely available as possible in accordance with confidentiality and privacy restrictions. Since not all data will be shareable with the public, the government should also hire researchers in a variety of fields to produce in-house analysis.


The federal government is about to receive an influx of new cyber-incident data. To capitalize on this resource, agencies should invest in analysis, sharing, and public release of the data and synthesize it with all other available data about cyber security. If the government succeeds, it will develop the most comprehensive picture the United States has ever had regarding cyber security and cyber threats. A clearer view of this landscape, in turn, is a key step in improving cyber policy and reducing cyber attacks against U.S. infrastructure.

Jennifer Shore is a graduate student at Princeton University’s School of Public and International Affairs. She was previously a fellow at the White House National Economic Council during the Obama administration.
