The Washington Post

Facebook takes down data and thousands of posts, obscuring reach of Russian disinformation

October 12, 2017 at 11:42 a.m. EDT

Social media analyst Jonathan Albright got a call from Facebook the day after he published research last week showing that the reach of the Russian disinformation campaign was almost certainly larger than the company had disclosed. While the company had said 10 million people had seen Russian-bought ads, Albright had data suggesting that the audience was at least double that — and maybe much more — if ordinary free Facebook posts were measured as well.

Albright welcomed the chat with three company officials. But he was not pleased to discover that they had done more than talk about their concerns regarding his research. They also had scrubbed from the Internet nearly everything — thousands of Facebook posts and the related data — that had made the work possible.

Never again would he or any other researcher be able to run the kind of analysis he had done just days earlier.

“This is public interest data,” Albright said Wednesday, expressing frustration that such a rich trove of information had disappeared — or at least moved somewhere the public can’t see it. “This data allowed us to at least reconstruct some of the pieces of the puzzle. Not everything, but it allowed us to make sense of some of this thing.”

Facebook does not dispute it removed the posts, but it offers a different explanation of what happened. The company says it has merely corrected a “bug” that allowed Albright, who is research director of the Tow Center for Digital Journalism at Columbia University, to access information he never should have been able to find in the first place. That bug, Facebook says, has now been squashed on a social media analytics tool called CrowdTangle, which Facebook bought last year.

CrowdTangle enables advertisers to view metrics about the performance of their Facebook and Instagram campaigns, such as how many times a post was liked, commented on or shared. Until this week, advertisers were able to see metrics for content that had already been taken down on Facebook and Instagram.

“We identified and fixed a bug in CrowdTangle that allowed users to see cached information from inactive Facebook Pages,” said company spokesman Andy Stone. “Across all our platforms we have privacy commitments to make inactive content that is no longer available, inaccessible.”

Whatever the reason, researchers expressed frustration that crucial data and thousands of posts are now gone.

Last week, two other researchers who had been working with the Facebook data, Joan Donovan and Becca Lewis of the nonprofit Data & Society Research Institute, also noticed that it had suddenly disappeared.

“When platforms do not release data for researchers to analyze, they set themselves up for drawing their own conclusions based on their own interests,” said Donovan, who has used Facebook data for the last eight years to study how influence campaigns on social media impact participation in political movements. “The bits and pieces of data we found in CrowdTangle are alarming because of who the Facebook pages target — everyday people with sets of mutual concerns about the future of our society.”

Albright's research began when he tried to determine how far the Russian disinformation campaign reached during the 2016 presidential election, an elusive question that many others had grappled with. He knew that Facebook had acknowledged some basic numbers regarding the Russian effort, specifically that the company had shut down 470 Russian-controlled pages and accounts that had bought more than 3,000 ads, and that those ads had reached an estimated 10 million people, based on the company’s own modeling. Facebook has declined to say how many people saw the free posts created by the Russian accounts and pages.

That left open the question of what else those 470 accounts and pages had been doing. Helpfully, six of them — Blacktivists, United Muslims of America, Being Patriotic, Heart of Texas, Secured Borders and LGBT United — had become publicly known through various news reports. So Albright decided to deploy analytics tools to answer his question.

Because Internet data is riddled with imprecision — thanks to bots, trolls and vague definitions used by those who create the metrics — Albright used CrowdTangle, hoping that its connection with Facebook would make it harder for the company to later disavow his findings. He also copied CrowdTangle’s definitions of its metrics directly into the spreadsheet he was building and soon would make public.

The results of his data download startled him. For those six pages alone, Albright found 19.1 million “interactions,” a term describing how often a Facebook user does something concrete with a post, such as sharing it, commenting on it, hitting the “like” button or posting an emoji. Given that somebody who acts on a post surely has also glanced at it, this measurement is a conservative lower bound on how many times people had seen these posts.

Albright also found that, according to CrowdTangle, this same content had been “shared” 340 million times. That meant the disinformation could have reached the feeds of users that many times, but it didn’t reveal how many users actually read or even saw it. If “interactions” was the floor of the possible reach of the Russian content, then “shared” was the theoretical ceiling.
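To make that floor-and-ceiling reasoning concrete, here is a minimal sketch in Python under stated assumptions: it reads a hypothetical per-post CSV export and totals the two metrics the way Albright's analysis treated them. The file name and the "page", "interactions" and "shares" column names are illustrative stand-ins, not CrowdTangle's actual export format.

# Minimal sketch of the floor/ceiling reasoning described above.
# Assumes a hypothetical CSV export with one row per post and columns
# "page", "interactions" and "shares" (illustrative names only).
import csv
from collections import defaultdict

def summarize_reach(csv_path):
    # Sum interactions (floor of reach) and shares (theoretical ceiling) per page.
    interactions_by_page = defaultdict(int)
    shares_by_page = defaultdict(int)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            interactions_by_page[row["page"]] += int(row["interactions"])
            shares_by_page[row["page"]] += int(row["shares"])
    floor = sum(interactions_by_page.values())
    ceiling = sum(shares_by_page.values())
    print(f"Floor of possible reach (total interactions): {floor:,}")
    print(f"Theoretical ceiling (total shares): {ceiling:,}")
    return floor, ceiling

if __name__ == "__main__":
    summarize_reach("crowdtangle_export.csv")  # hypothetical export file

Run against exports for the six pages, the interactions total would correspond to the 19.1 million floor and the shares total to the 340 million theoretical ceiling described above.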

But given that Albright was working with just six pages out of 470, it was clear to him that the Russian campaign reached far beyond the 10 million people Facebook had acknowledged saw the ads alone. He even discovered a single Russian-backed Instagram account associated with LGBT United that had reached nearly 10 million on its own.

Finally, for each of the six pages, Albright downloaded 500 posts — the most available through CrowdTangle — and published them online in a handy, visual format last Thursday. Suddenly, anybody with an Internet connection could see the numbers Albright had compiled and the 3,000 free Facebook posts that he had downloaded.

That work soon became the basis for a Washington Post article on Albright’s discoveries and, later, a story in the New York Times. Albright also talked about his research on CBS News and Fox News.

But even as the discussion over Albright's research began heating up, he discovered that the 3,000 Facebook posts were gone, as was the data on CrowdTangle.

“There was nothing,” he said. “It was wiped.”

The analytics company also tweaked its own description of one of its metrics. The “shared” metric was now recast as “total followers.”

The last change was not objectionable to Albright, who had agreed with the Facebook representatives who called him — including CrowdTangle chief executive Brandon Silverman — that the “shared” metric from CrowdTangle was poorly named given that it didn’t fit with how Facebook itself uses the term “shared.”

But the deletion of the posts and the related data struck Albright as a major loss for the world’s understanding of the Russian campaign. He still has the data and the posts for the six pages he examined, but as the names of other pages become public, there will be no way for independent researchers or journalists to conduct a similar examination of any of the other 470 pages and accounts — or of any others linked to Russia that may emerge over the coming weeks or months.

Every bit of data that gets deleted also makes it harder to study how content flowed back and forth across platforms, including Twitter, Google, Instagram, Pinterest and more, several researchers said.

“We can see by looking across the Internet that these posts have been shared on every other platform,” Albright said.

The discomfort is shared even by a critic of Albright's work, George Washington University professor David Karpf, who published a piece in The Post's Monkey Cage blog Thursday arguing that claims about the reach of the free Facebook posts were overblown because, among other reasons, all that clicking and sharing could have been the work of Russian trolls. The CrowdTangle data, Karpf argued, is a weak proxy for the most important questions about how many American voters saw the content and how it affected their political choices.

Yet even so, Karpf was unhappy to learn of Facebook's removal of the posts, given the public debate underway.

“Any time you lose data,” he said, “I don't like it, especially when you lose data and you're right in the middle of public scrutiny.”