The Declassification Engine

What History Reveals About America's Top Secrets

Audiobook Download
On sale Feb 14, 2023 | 15 Hours and 56 Minutes | 978-0-593-62864-5

About

SHORTLISTED FOR THE CUNDILL HISTORY PRIZE • Every day, thousands of new secrets are created by the United States government. What is all this secrecy really for? And whom does it benefit?

“A brilliant, deeply unsettling look at the history and inner workings of ‘the dark state.’ ... At a time when federal agencies are increasingly classifying or destroying documents with historical significance, this book could not be more important.” —Eric Schlosser, New York Times best-selling author of Command and Control


Before World War II, transparent government was a proud tradition in the United States. In all but the most serious of circumstances, classification, covert operations, and spying were considered deeply un-American. But after the war, the power to decide what could be kept secret proved too tempting to give up. Since then, we have radically departed from that open tradition, allowing intelligence agencies, black sites, and classified laboratories to grow unchecked. Officials insist that only secrecy can keep us safe, but its true costs have gone unacknowledged for too long.

Using the latest techniques in data science, historian Matthew Connelly analyzes a vast trove of state secrets to unearth not only what the government really did not want us to know but also why they didn’t want us to know it. Culling this research and carefully examining a series of pivotal moments in recent history, from Pearl Harbor to drone warfare, Connelly sheds light on the drivers of state secrecy—especially incompetence and criminality—and how rampant overclassification makes it impossible to protect truly vital information.

What results is an astonishing study of power: of the greed it enables, of the negligence it protects, and of what we lose as citizens when our leaders cannot be held to account. A crucial examination of the self-defeating nature of secrecy and the dire state of our nation’s archives, The Declassification Engine is a powerful reminder of the importance of preserving the past so that we may secure our future.

Excerpt

PREFACE: Should This Book Be Legal?
 
There I was, sitting at a massive conference table inside a multibillion-dollar foundation, staring at the wood-paneled walls. I was facing a battery of high-powered attorneys, including the former general counsel to the National Security Agency, and another who had been chief of the Major Crimes Unit at the U.S. Attorney’s Office in the Southern District of New York. The foundation was paying each of them about a thousand dollars an hour to determine whether I could be prosecuted under the Espionage Act.
 
I am a history professor, and my only offense had been to apply for a research grant. I proposed to team up with data scientists at Columbia University to investigate the exponential growth in government secrecy. Earlier that year, in 2013, officials reported that they had classified information more than ninety-five million times over the preceding twelve months, or three times every second. Every time one of these officials decided that some transcript, or e-mail, or PowerPoint presentation was “confidential,” “secret,” or “top secret,” it became subject to elaborate protocols to ensure safe handling. No one without a security clearance would see these records until, decades from now, other government officials decided disclosure no longer endangered national security. The cost of keeping all these secrets was growing year by year, covering everything from retinal scanners to barbed-wire fencing to personnel training programs, and already totaled well over eleven billion dollars. But so, too, were the number and size of data breaches and leaks. At the same time, archivists were overwhelmed by the challenge of managing just the first generation of classified electronic records, dating to the 1970s. Charged with identifying and preserving the subset of public records with enduring historical significance but with no increase in staff or any new technology, they were recommending the deletion of hundreds of thousands of State Department cables, memoranda, and reports, sight unseen. The costs in terms of democratic accountability were incalculable and included the loss of public confidence in political institutions, the proliferation of conspiracy theories, and the increasing difficulty historians would have in reconstructing what our leaders do under the cloak of secrecy.
 
We wanted to assemble a database of declassified documents and use algorithms to reveal patterns and anomalies in the way bureaucrats decide what information must be kept secret and what information can be released. To what extent were these decisions balanced and rule-based, as official spokesmen have long claimed? Were they consistent with federal laws and executive orders requiring the preservation of public records, and prompt disclosure when possible? Were the exceptions so numerous as to prove the existence of unwritten rules that really served the interests of a “deep state”? Or was the whole system so dysfunctional as to be random and inexplicable, as other critics insist?
 
We were trying to determine whether we could reverse engineer these processes, and develop technology that could help identify truly sensitive information. If we assembled millions of documents in databases, and harnessed the power of high-performance computing clusters, it might be possible to train algorithms to look for sensitive records requiring the closest scrutiny and accelerate the release of everything else. The promise was to make the crucial but dysfunctional declassification process more equitable and far more efficient. We had begun to call it a “declassification engine,” and if someone did not start building and testing prototypes, the exponential increase in government secrets—more and more of them consisting of data rather than paper documents—might make it impossible for public officials to meet their own legal responsibilities to maximize transparency. Even if we failed to get the government to adopt this kind of technology, testing these tools and techniques would reveal gaps and distortions in the public record, whether from official secrecy or archival destruction.
 
The lawyers in front of me started to discuss the worst-case scenarios, and the officers of the foundation grew visibly uncomfortable. What if my team was able to reveal the identity of covert operatives? What if we uncovered information that would help someone build a nuclear weapon? If the foundation gave us the money, their lawyers warned that the foundation staff might be prosecuted for aiding and abetting a criminal conspiracy. Why, the most senior program officer asked, should they help us build “a tool that is purpose-built to break the law”? The only one who did not seem nervous was the former ACLU lawyer whom Columbia had hired to represent us. He had argued cases before the Supreme Court. He had defended people who published schematics of nuclear weapons—and won. He had shown how any successful prosecution required proving that someone had possession of actual classified information. How could the government go after scholars doing research on declassified documents?
 
The ex–government lawyers pointed out that we were not just academics making educated guesses about state secrets—not when we were using high-performance computers and sophisticated algorithms. True, no journalist, no historian, can absorb hundreds of thousands of documents, analyze all of the words in them, instantly recall every one, and rank each according to one or multiple criteria. But scientists and engineers can turn millions of documents into billions of data points and use machine learning—or teaching a computer to teach itself—to detect patterns and make predictions. We agree with these predictions every time we watch a movie Netflix recommends, or buy a book that Amazon suggests. If we threw enough data at the problem of parsing redacted documents—the ones in which government officials have covered up the parts they do not want us to see—couldn’t these techniques “recommend” the words most likely to be hiding behind the black boxes, which presumably were hidden for good reason?
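The prediction problem the preface describes can be made concrete with a minimal, hypothetical sketch; it is not drawn from the book or from History Lab's actual system. Assuming the Python library scikit-learn, a simple text classifier trained on documents whose review outcomes are already known could rank unreviewed records by likely sensitivity, so that human reviewers scrutinize the riskiest documents first and the rest move more quickly toward release.

# Illustrative sketch only; not code from the book or from History Lab.
# Idea: learn from documents whose review outcome is already known
# (released in full vs. withheld or heavily redacted), then score new
# documents so reviewers can triage the likeliest-sensitive ones first.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: document text paired with its known review outcome.
docs = [
    "Embassy reports routine trade negotiations with an allied government.",
    "Cable describes sources and methods used to intercept foreign communications.",
    "Memo summarizes published press coverage of the treaty signing.",
    "Report names covert personnel operating under diplomatic cover.",
]
labels = [0, 1, 0, 1]  # 0 = released in full, 1 = withheld or heavily redacted

# TF-IDF features plus logistic regression: a deliberately simple baseline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)

# Score an unreviewed document: a higher probability means send it to a human reviewer first.
new_doc = ["Summary of intercepted communications identifying a foreign agent."]
sensitivity = model.predict_proba(new_doc)[0, 1]
print(f"Predicted sensitivity: {sensitivity:.2f}")

A working system would need far richer features, calibration against real review decisions, and human judgment on every prediction; the sketch only shows the general shape of the approach.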

Awards

  • LONGLIST | 2023
    Cundill History Prize

Author

Author photo © Andrew Steinman
MATTHEW CONNELLY is a professor of international and global history at Columbia University, codirector of its social science institute, and the principal investigator at History Lab, a project to apply data science to the problem of preserving the public record and accelerating its release. He received his BA from Columbia and his PhD from Yale. His previous publications include A Diplomatic Revolution: Algeria’s Fight for Independence and the Origins of the Post–Cold War Era and Fatal Misconception: The Struggle to Control World Population.