Abstract:
The spread of fake news on the internet is a major challenge. Traditional fact-checking
methods, which rely on human experts to verify the credibility of information, cannot
scale to the volume of online content.
Automated fake news detection is a more scalable approach: it uses artificial intelligence
to learn, from news data, the patterns that indicate a piece of content is likely to be fake.
However, despite their advantages in accuracy and response time, supervised learning
solutions for automated fake news detection have struggled to identify fake content
across cultures and languages, owing to the limited availability of fact-checked
datasets and to biases in the data.
To address these biases, this study proposes a comprehensive approach to fake news
detection that combines supervised and unsupervised learning strategies. The first
strategy performs supervised automatic fact-checking with a Transformer model alongside
logistic regression, support vector machine (SVM), convolutional neural network (CNN),
naive Bayes, and XGBoost classifiers. These models are trained on labeled datasets to
identify fake news with high accuracy. The second strategy uses unsupervised learning to
handle unlabeled datasets. This lets us exploit a much wider range of information than
the labeled data alone provides and pass the resulting insights on to our supervised
classifiers.
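The abstract does not say which unsupervised algorithm produces labels for the unlabeled data, so the following is only a minimal sketch of the general idea of combining the two strategies, under stated assumptions. It assumes k-means clustering (k = 2) on L2-normalized bag-of-words vectors to assign pseudo-labels, then fits a multinomial naive Bayes classifier (one of the listed model families) on those pseudo-labels. All function names (`pseudo_label_then_train`, `kmeans`, `train_nb`, `predict_nb`, etc.) are hypothetical and use only the Python standard library; this is not the study's implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; a real pipeline would do more).
    return text.lower().split()


def build_vocab(docs):
    words = sorted({w for d in docs for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}


def vectorize(doc, vocab):
    # L2-normalized bag-of-words vector.
    vec = [0.0] * len(vocab)
    for w in tokenize(doc):
        if w in vocab:
            vec[vocab[w]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k=2, iters=20):
    # Deterministic farthest-first initialization, then Lloyd iterations.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def train_nb(docs, labels, alpha=1.0):
    # Multinomial naive Bayes with Laplace smoothing, trained on (pseudo-)labels.
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    vocab = set()
    for d, y in zip(docs, labels):
        toks = tokenize(d)
        word_counts[y].update(toks)
        vocab.update(toks)
    return class_counts, word_counts, vocab, alpha


def predict_nb(model, doc):
    class_counts, word_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c in sorted(class_counts):
        score = math.log(class_counts[c] / n_docs)
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        for w in tokenize(doc):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + alpha) / denom)
        if score > best_score:
            best, best_score = c, score
    return best


def pseudo_label_then_train(unlabeled_docs, k=2):
    # Stage 1 (unsupervised): cluster unlabeled texts into k pseudo-classes.
    vocab = build_vocab(unlabeled_docs)
    vectors = [vectorize(d, vocab) for d in unlabeled_docs]
    pseudo = kmeans(vectors, k=k)
    # Stage 2 (supervised): fit a classifier on the pseudo-labels.
    return pseudo, train_nb(unlabeled_docs, pseudo)
```

Cluster IDs are arbitrary integers, so deciding which cluster corresponds to "fake" still requires either a small amount of labeled data or manual inspection. The deterministic initialization is chosen only to make the sketch reproducible; randomized k-means++ seeding is the more common choice in practice.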
We evaluate our approach on two datasets: the LIAR dataset and the ISOT Fake News
Dataset. We achieve an accuracy of up to 91% on the ISOT Fake News Dataset through
unsupervised relabeling. We also achieve an F1-score of 100% in a supervised learning
experiment that uses the original (ground-truth) labels of the ISOT dataset.
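The abstract does not state exactly how the relabeling accuracy is computed. One common way to score cluster-derived labels against ground truth is to map each cluster to a true label with the best one-to-one assignment and then count matches; the F1-score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The minimal sketch below assumes binary labels named "fake" and "real" with "fake" as the positive class; the function names are hypothetical and this is not necessarily the evaluation protocol used in the study.

```python
from itertools import permutations


def relabel_accuracy(cluster_ids, true_labels):
    """Accuracy of cluster IDs under the best one-to-one cluster-to-label mapping.

    Assumes there are no more clusters than distinct true labels.
    """
    clusters = sorted(set(cluster_ids))
    labels = sorted(set(true_labels))
    best = 0
    for perm in permutations(labels, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(mapping[c] == y for c, y in zip(cluster_ids, true_labels))
        best = max(best, correct)
    return best / len(true_labels)


def f1_score(predicted, actual, positive="fake"):
    """F1 = 2PR / (P + R), computed for the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under this definition an F1-score of 100% means the classifier produced no false positives and no false negatives on the evaluated split.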
Our study contributes to the development of effective strategies for combating fake
news and addresses the challenges posed by today's rapidly growing digital landscape.
Keywords: Fake news, Misinformation, Fact-checking, Supervised learning, Unsupervised learning, Transformers.