The past few months have seen repeated discussion of what United States President Donald Trump has called in his speeches the phenomenon of ‘fake news’, and amid that climate the fact-checking world has been in a relative crisis.
Sites such as PolitiFact and Snopes have traditionally focused on specific claims, which is admirable but slow work: by the time they have finished verifying or debunking a claim, there is a good chance it has already travelled across the globe and back again.
Social media companies have also had mixed results in limiting the spread of propaganda and misinformation. For example, Facebook plans to have 20,000 human moderators by the end of the year, and is spending many millions developing its own fake-news-detecting algorithms.
About two months ago, Egypt’s President Abdel Fattah Al-Sisi said that the country is facing a huge number of rumours. The president said that Egypt had faced 21,000 rumours in just three months. Al-Sisi’s statements were followed by a massive media campaign against fake news.
A recent collaborative project by researchers from the Massachusetts Institute of Technology’s (MIT) Computer Science and Artificial Intelligence Lab (CSAIL) and the Qatar Computing Research Institute (QCRI) suggests that the best approach is to focus not on the factuality of individual claims, but on the news sources themselves. Using this approach, they have demonstrated a new system that uses machine learning to determine whether a source is accurate or politically biased.
Ramy Baly, a postdoctoral associate and lead author of a new paper on the project, said, “if a website has published fake news before, there’s a good chance they’ll do it again.” He added, “by automatically scraping data about these sites, the hope is that our system can help figure out which ones are likely to do it in the first place.” Baly also said that the system needs only about 150 articles to reliably detect whether a news source can be trusted, meaning that an approach like theirs could be used to help stamp out fake-news outlets before the stories spread too widely.
According to a statement from MIT’s CSAIL, the system is a collaboration between computer scientists at MIT CSAIL and QCRI, which is part of the Hamad Bin Khalifa University in Qatar. Researchers first took data from Media Bias/Fact Check (MBFC), a website with human fact-checkers who analyse the accuracy and biases of over 2,000 news sites, from MSNBC and Fox News to low-traffic content farms.
Following that, they fed the data to a machine learning algorithm called a Support Vector Machine classifier, and trained it to classify news sites the same way as MBFC. When given a new news outlet, the system was 65% accurate at detecting whether it has a high, medium, or low level of ‘factuality,’ and roughly 70% accurate at detecting whether it is left-leaning, right-leaning, or moderate.
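To give a rough sense of what such a classifier does, the following is a minimal sketch in plain Python, not the researchers’ implementation. It trains one linear Support Vector Machine per factuality label (a one-vs-rest scheme, which is an assumption here) by subgradient descent on the regularised hinge loss. The two-number feature vectors, the labels attached to them, and all function names are invented purely for illustration; a real system would use many features per source and an established library.

```python
# Toy, invented data: each source is a 2-number feature vector.
# These numbers are NOT from the paper; they only make the sketch runnable.
TOY_X = [
    (0.0, 0.0), (0.5, 0.2), (0.2, 0.6),   # labelled "high" factuality
    (4.0, 0.0), (3.5, 0.4), (4.2, 0.5),   # labelled "medium"
    (0.0, 4.0), (0.4, 3.6), (0.5, 4.3),   # labelled "low"
]
TOY_Y = ["high"] * 3 + ["medium"] * 3 + ["low"] * 3


def train_binary_svm(X, y, epochs=500, lr=0.01, lam=0.01):
    """Linear SVM for labels in {-1, +1}, trained by subgradient
    descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularisation step: shrink the weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge loss is active only when the margin is below 1.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def train_one_vs_rest(X, labels):
    """Train one binary SVM per class: that class versus all others."""
    models = {}
    for cls in sorted(set(labels)):
        y = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_binary_svm(X, y)
    return models


def predict(models, x):
    """Return the class whose SVM gives the highest score for x."""
    def score(cls):
        w, b = models[cls]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=score)


models = train_one_vs_rest(TOY_X, TOY_Y)
print(predict(models, (4.1, 0.25)))
```

The toy classes here are cleanly separable, which real news sources are not; that gap is one reason the reported accuracy sits at 65% to 70% rather than near 100%.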
The team determined that the most reliable way to detect both fake news and biased reporting was to look at the common linguistic features across the source’s stories, including sentiment, complexity, and structure.
For example, fake-news outlets were found to be more likely to use language that is hyperbolic, subjective, and emotional. In terms of bias, left-leaning outlets were more likely to have language that related to concepts of harm or care, and fairness or reciprocity, compared to other qualities such as loyalty, authority, and sanctity. These qualities represent the five ‘moral foundations,’ a popular theory in social psychology.
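As an illustration of how such linguistic cues can be turned into numbers, the sketch below counts how often words tied to each moral foundation appear in a text. The tiny word lists are hypothetical and made up for this example; they are not the lexicon the researchers used, and the function name is likewise an assumption.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon, invented for illustration only.
TOY_LEXICON = {
    "harm_care": {"harm", "hurt", "protect", "care", "suffer"},
    "fairness_reciprocity": {"fair", "unfair", "equal", "justice", "rights"},
    "loyalty": {"loyal", "betray", "patriot", "traitor"},
    "authority": {"obey", "law", "order", "tradition"},
    "sanctity": {"pure", "sacred", "disgust", "sin"},
}


def foundation_rates(text):
    """Return, per foundation, the share of tokens found in its word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for foundation, words in TOY_LEXICON.items():
            if tok in words:
                counts[foundation] += 1
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {f: counts[f] / total for f in TOY_LEXICON}


rates = foundation_rates("Protect equal rights and care")
print(rates["harm_care"], rates["fairness_reciprocity"])
```

Averaged over all of an outlet’s articles, rates like these could serve as a few of the features fed to a classifier.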
Preslav Nakov, a co-author on the project, said that the system also found correlations with an outlet’s Wikipedia page, which it assessed for general length (longer is more credible) as well as for key words such as ‘extreme’ or ‘conspiracy theory.’ It even found correlations with the text structure of a source’s URLs: those that had several special characters and complicated subdirectories, for example, were associated with less reliable sources.
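The URL signals described above are simple to compute. The sketch below, using Python’s standard `urllib.parse` module, shows one plausible way to do it; the feature names and the choice of which characters count as ‘special’ are assumptions made for this example, not details taken from the paper.

```python
from urllib.parse import urlparse


def url_features(url):
    """Extract simple structural features from a URL (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.netloc
    # Non-empty pieces of the path, e.g. "/a/b/" -> ["a", "b"].
    segments = [seg for seg in parsed.path.split("/") if seg]
    # Assumption: anything other than letters, digits, "." and "/" is "special".
    special = sum(
        1 for ch in host + parsed.path
        if not ch.isalnum() and ch not in "./"
    )
    return {
        "special_chars": special,
        "path_depth": len(segments),
        "host_length": len(host),
    }


print(url_features("http://example.com/"))
print(url_features("http://best-news-4-you.example.com/a/b/c_d/index.html"))
```

On these two made-up addresses, the second scores higher on both special characters and path depth, the kind of pattern the researchers associated with less reliable sources.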
“Since it is much easier to obtain ground truth on sources [than on articles], this method is able to provide direct and accurate predictions regarding the type of content distributed by these sources,” said Sibel Adali, a professor of computer science at Rensselaer Polytechnic Institute who was not involved in the project.
Nakov is quick to caution that the system is still a work in progress, and that, even with improvements in accuracy, it would work best in conjunction with traditional fact-checkers. “If outlets report differently on a particular topic, a site like PolitiFact could instantly look at our ‘fake news’ scores for those outlets to determine how much validity to give to different perspectives,” said Nakov, a senior scientist at QCRI.
Baly and Nakov co-wrote the new paper with MIT senior research scientist James Glass, alongside master’s students Dimitar Alexandrov and Georgi Karadzhov of Sofia University. The team will present the work later this month at the 2018 Empirical Methods in Natural Language Processing (EMNLP) conference in Brussels, Belgium.
The researchers also created a new open-source dataset of over 1,000 news sources, annotated with factuality and bias scores, which they describe as the world’s largest database of its kind. As next steps, the team will explore whether the English-trained system can be adapted to other languages, and whether it can go beyond the traditional left-right bias to capture region-specific biases (like the Muslim World’s division between religious and secular).
“This direction of research can shed light on what untrustworthy websites look like, and the kind of content they tend to share, which would be very useful for both web designers and the wider public,” explained Andreas Vlachos, a senior lecturer at Cambridge University, who was not involved in the project.
Nakov added that QCRI also plans to roll out an app that helps users step out of their political bubbles by responding to specific news items with a collection of articles that span the political spectrum.
“It’s interesting to think about new ways to present the news to people,” Nakov said. “Tools like this could help people give a bit more thought to issues, and explore other perspectives that they might not have otherwise considered.”