
Letter: Fake news detection can be made simpler than that

Published 3 July 2019

From Peter Bleackley, Horsham, West Sussex, UK

I read with interest Donna Lu's article on Rowan Zellers and his colleagues using a machine learning system that can generate fake news in order to detect it (15 June, p 15). Its reported accuracy is impressive, but I note that their Grover model has been tested only on a sample of fake news articles that it had itself generated.

I have recently done some work using public data sets and have achieved similar levels of accuracy with a much simpler model (available at bit.ly/NS-fakes). I trained it on a sample of 13,000 fake news reports published on the Kaggle data science website, and on the Reuters-21578 ApteMod corpus, which is a sample of 10,788 articles from the press agency's trusted newswire. To avoid bias and to future-proof the system, I didn't train the model on the articles' content.

Instead, it looks at sentence structures and function words (pronouns, prepositions, conjunctions and auxiliaries). A simple logistic regression model classified the documents, with 70 per cent of them used for training and 30 per cent for testing. It was able to distinguish between real and fake news with 93 per cent accuracy.
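
For readers who want to try the idea, here is a minimal, standard-library-only Python sketch of that kind of pipeline. It is not the code at bit.ly/NS-fakes. It assumes a short hypothetical list of function words and represents each document only by how often those words occur, so content words never reach the model. It then standardizes the features, fits logistic regression by plain batch gradient descent (rather than with a library such as scikit-learn), and scores the model on a shuffled 70/30 split. The sentence-structure features mentioned in the letter are left out. The helper `toy_corpus` and all its word lists are invented for illustration and generate synthetic text, not the Kaggle or Reuters data.

```python
import math
import random
import re

# Assumed, abbreviated function-word list (pronouns, prepositions, conjunctions,
# auxiliaries). The letter does not publish the list it actually used.
FUNCTION_WORDS = (
    "i you he she it we they me us them "
    "of in on at by for with from to about "
    "and but or so because if while "
    "is are was were be been has have had will would can could may must should"
).split()
INDEX = {w: j for j, w in enumerate(FUNCTION_WORDS)}


def featurize(text):
    """Relative frequency of each function word; every other token is ignored."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = [0.0] * len(FUNCTION_WORDS)
    for tok in tokens:
        if tok in INDEX:
            counts[INDEX[tok]] += 1.0
    return [c / max(len(tokens), 1) for c in counts]


def standardize(train, test):
    """Z-score each feature with mean/std taken from the training rows only."""
    d, n = len(train[0]), len(train)
    mean = [sum(r[j] for r in train) / n for j in range(d)]
    std = [math.sqrt(sum((r[j] - mean[j]) ** 2 for r in train) / n) or 1.0 for j in range(d)]

    def scale(rows):
        return [[(r[j] - mean[j]) / std[j] for j in range(d)] for r in rows]

    return scale(train), scale(test)


def sigmoid(z):
    # Numerically stable logistic function.
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def train_logreg(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * c for a, c in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [a - lr * g / n for a, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def evaluate(texts, labels, train_frac=0.7, seed=0):
    """Shuffle, split 70/30, fit on the training part, return held-out accuracy."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(idx))
    tr, te = idx[:cut], idx[cut:]
    X_tr, X_te = standardize([featurize(texts[i]) for i in tr], [featurize(texts[i]) for i in te])
    w, b = train_logreg(X_tr, [labels[i] for i in tr])
    preds = [1 if sum(a * c for a, c in zip(w, x)) + b >= 0 else 0 for x in X_te]
    return sum(p == labels[i] for p, i in zip(preds, te)) / len(te)


def toy_corpus(n_per_class=100, seed=1):
    """Hypothetical synthetic stand-in data, NOT the Kaggle or Reuters corpora. Label 1 = 'fake'."""
    rng = random.Random(seed)
    content = ["market", "government", "report", "shares", "minister", "election"]
    pools = {0: ["of", "in", "on", "by", "for", "was", "has", "had"],   # hypothetical 'real' style
             1: ["you", "they", "we", "i", "and", "but", "so", "must"]}  # hypothetical 'fake' style
    texts, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            # Half the tokens are shared content words; the rest are function
            # words drawn mostly (75%) from the document's own class pool.
            words = [rng.choice(content) if rng.random() < 0.5
                     else rng.choice(pools[label if rng.random() < 0.75 else 1 - label])
                     for _ in range(50)]
            texts.append(" ".join(words))
            labels.append(label)
    return texts, labels


if __name__ == "__main__":
    texts, labels = toy_corpus()
    print(f"held-out accuracy on toy data: {evaluate(texts, labels):.2f}")
```

Because the toy classes are built to differ in function-word use, the accuracy this prints should be close to 1 and says nothing about real news. Replacing `toy_corpus()` with genuinely labelled articles is the step that would actually test the approach.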

Issue no. 3237 published 6 July 2019