Real-time Fake News Detection and Claim Verification using Meta Curriculum Learning — Part II
By Santanu Pal, Shivam Sharma, Addepelli Sai Srinivas, Sangram Jethy, Nabarun Barua, Vinutha B N
Social media as a channel for news consumption is a double-edged sword: it enables the wide spread of “fake news”, i.e., misinformation and disinformation. The extensive spread of fake news on social media distorts individuals’ views on societally sensitive topics such as politics, health, gender, and religion, and consequently has a negative impact on individuals and society. Fake news spreads easily in online social networks, propagated by social media actors and network communities to achieve specific (mostly malevolent) objectives. Fake news campaigns are increasingly powered by advanced AI techniques, which in turn increases the effort required to detect fake content.
Our Contributions to Fake Claim Verification Research
Most fact-checking organizations rely on human validation of information, but the ever-increasing amount of new information on the Internet makes manual verification challenging, time-consuming, and costly. Our research into automatically verifying fake claims explores how the world-knowledge source is encoded and exploited by the models, as well as ways to enhance the quality of this knowledge by leveraging external resources.
Verifying the authenticity of highly propagated news is in demand across many domains. Our fake news detection and verification technique learns domain-specific and cross-domain information from news records without knowing the actual domain, i.e., it is a domain-agnostic Fake News Detection model.
In fake news claim verification tasks, the evidence plays a crucial role in determining whether the provided claim is fake or not. Therefore, the retrieved evidence for any given claim should be relevant and contain necessary information about the queried claim.
We propose an approach that improves over current automatic fake news detection approaches by automatically gathering evidence for each claim. Our approach retrieves the top relevant web articles and then selects appropriate text to be treated as evidence sets. We observe that even when relevant articles are found, not all of the information in an article is useful for downstream verification or detection tasks. To extract evidence, we first perform a web search for a given claim and gather all the top URLs that appear in the search results. Second, we shortlist only those URLs with high confidence scores, using a similarity check between the claim and the content of each URL; a predefined confidence threshold ensures that only relevant URLs are considered. Next, another similarity check is conducted on each shortlisted URL’s content to identify, from each paragraph, information valuable to the input claim. The text with the highest similarity is treated as evidence for the input claim. More details are available in our previous post here.
Figure 1: Evidence Collection
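The evidence-selection step above can be sketched as follows. This is a minimal, self-contained illustration: a real pipeline would use a web-search API and a neural sentence encoder, whereas here a simple bag-of-words cosine similarity stands in for the similarity check, and the claim, paragraphs, threshold, and `top_k` values are all hypothetical.

```python
# Sketch of paragraph-level evidence selection (illustrative only).
from collections import Counter
import math

def cosine_sim(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_evidence(claim: str, paragraphs: list[str],
                    threshold: float = 0.2, top_k: int = 2) -> list[str]:
    """Keep the top-k paragraphs whose similarity to the claim
    exceeds a predefined confidence threshold."""
    scored = [(cosine_sim(claim, p), p) for p in paragraphs]
    scored = [sp for sp in scored if sp[0] >= threshold]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:top_k]]

claim = "drinking bleach cures covid"
paragraphs = [
    "health officials warn that drinking bleach does not cure covid",
    "the stock market closed higher on friday",
    "covid vaccines underwent extensive clinical trials",
]
evidence = select_evidence(claim, paragraphs)
```

In the real system, the same two-stage filtering (URL-level, then paragraph-level) is applied to web search results before the selected text is passed downstream.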
MACRO: A Domain Agnostic Fake News Verification Model
As part of performing meta-learning, it is crucial to curate a support set corresponding to a query claim/input, even during the inference stage. This requires a prior understanding of the data distribution of the target domain/task. Due to the ambiguity involved in characterizing a real-time claim for verification, we first assess its domain. This is performed using a pre-trained zero-shot text classification model based on Bart-Large. Essentially, we begin by considering health, politics, violence, and government as a preliminary set of domains, for each of which a support set of 50 samples is curated, with 25 samples each for the fake and not-fake categories. Next, for each query claim, we perform zero-shot domain classification over these domain categories. Based on the ordering of prediction confidence, the user is prompted to select one of the top three options. Based on the selection, the corresponding support set of 50 samples is invoked for fine-tuning based adaptation, inducing the corresponding inductive biases. If, on the other hand, none of the predicted domains is relevant according to the user’s choice, an alternative approach is adopted for support-set curation via curriculum learning and semantic-similarity-based sampling.
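The routing logic above can be sketched as follows. The zero-shot classifier (a Bart-Large based model in our system) is replaced here by a stub returning hypothetical confidence scores; the domain names come from the text, but the scores, function names, and return values are illustrative only.

```python
# Sketch of the support-set routing logic (classifier stubbed).
DOMAINS = ["health", "politics", "violence", "government"]

def zero_shot_scores(claim: str) -> dict:
    """Stand-in for the pre-trained zero-shot text classifier.
    Returns hypothetical confidences for a health-related claim."""
    return {"health": 0.62, "politics": 0.21,
            "government": 0.11, "violence": 0.06}

def route_support_set(claim: str, user_choice=None):
    """Offer the top-3 predicted domains; if the user accepts one,
    load its pre-curated 50-sample support set, otherwise fall back
    to curriculum/similarity-based sampling."""
    scores = zero_shot_scores(claim)
    top3 = sorted(scores, key=scores.get, reverse=True)[:3]
    if user_choice in top3:
        return ("domain_support_set", user_choice)
    return ("curriculum_sampling", None)

accepted = route_support_set("masks prevent infection", "health")
rejected = route_support_set("masks prevent infection", None)
```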
In this work, we leverage the MAML framework (Finn et al., 2017) to address the broad objective of fake claim verification by inducing domain-agnostic meta-learning (see Figure 2). We consider the domain of the claim text synonymous with the different tasks required for meta-learning. For task curation, we prepare combinations of support and query sets with pre-specified configurations, corresponding to the different domains present in the IFND dataset. We then use the curated tasks to train and evaluate our fake claim verification model. The base model (described below) of our meta-learning framework encodes input claims and supporting evidence independently, and fuses their pooled representations to train a fake vs. not-fake classifier.
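The inner/outer loop structure of MAML can be illustrated on a toy one-parameter model. This is only a sketch of the optimization pattern (a first-order variant, on scalar regression tasks); the actual MACRO model is the two-encoder classifier described below, and all values here are illustrative.

```python
# First-order MAML sketch (after Finn et al., 2017) on a toy
# 1-parameter linear model y = theta * x.
def grad(theta, x, y):
    """Gradient of the squared-error loss (theta*x - y)^2."""
    return 2 * (theta * x - y) * x

def maml_step(theta, tasks, inner_lr=0.01, outer_lr=0.01):
    """One meta-update: adapt on each task's support sample, then
    update theta from the query-sample gradients at the adapted
    parameters (first-order approximation)."""
    meta_grad = 0.0
    for support, query in tasks:
        # Inner loop: one gradient step on the support sample.
        theta_adapted = theta - inner_lr * grad(theta, *support)
        # Outer loop: gradient of the query loss at adapted params.
        meta_grad += grad(theta_adapted, *query)
    return theta - outer_lr * meta_grad / len(tasks)

# Two toy "domains", both consistent with y = 2x.
tasks = [((1.0, 2.0), (2.0, 4.0)),
         ((1.0, 2.0), (3.0, 6.0))]
theta = 0.0
for _ in range(200):
    theta = maml_step(theta, tasks)
# theta converges towards 2.0, the value shared across tasks.
```

In MACRO the same pattern is applied per domain-task, with the support and query sets curated per claim domain as described above.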
Fake Claim Verification Modeling (The Base Model)
We further design a neural formulation for the claim verification task. To this end, we use two BERT-based encoders for encoding input claims and supporting evidence, respectively. Effectively, for a given claim (ci) and corresponding evidence (ei), we fine-tune the two BERT-based encoders, which process the linguistic features embedded within ci and ei and yield their pooled output representations. We then concatenate these two outputs into a fused representation, which is projected into a lower dimension via a dense layer with ReLU-based non-linear activation to obtain a condensed representation.
Figure 2: Domain-agnostic Fake News Verification Model
The overall binary classification objective of fake claim verification, into either the fake or not-fake category, is then performed by mapping the condensed representation obtained in the previous step to an output layer of size 1. We apply a sigmoid non-linearity on top of this to obtain a probabilistic output for the binary objective, and train the model via back-propagation using a binary cross-entropy loss.
To address the potential domain gap between a new claim from the wild and the existing a priori support sets, we also examine the coherence facilitated by the following two approaches:
Curriculum learning. Curriculum learning aims to learn first from similar, generic curricula, in order to avoid bad local optima at early stages, and gradually descend towards domain-specific curricula, learning the individualities specific to a particular domain. We leverage this approach to derive a finite support set for a new claim with unknown domain information. The support samples are ranked using a claim-level divergence score d(ci) = Hdom(ci) − Hgen(ci), where d(ci) is the claim sentence-level divergence score, and Hdom(ci) and Hgen(ci) are token-averaged cross-entropy scores generated by BERT-based transformer LMs fine-tuned on domain-specific claims and on generic corpora, respectively.
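The curriculum ranking can be sketched as follows, assuming the divergence score is the difference between the token-averaged cross-entropies under the domain-specific and generic LMs. Both language models are stubbed here with fixed per-token surprisal tables; the tokens and values are illustrative only.

```python
# Sketch of divergence-based curriculum ranking (LMs stubbed).
DOMAIN_LM = {"vaccine": 1.0, "cures": 1.5}  # low surprisal in-domain
GENERIC_LM = {}                             # backs off to default

def avg_xent(tokens, lm, default=5.0):
    """Token-averaged cross-entropy under a (stubbed) LM."""
    return sum(lm.get(t, default) for t in tokens) / len(tokens)

def divergence(claim: str) -> float:
    """d(c) = Hdom(c) - Hgen(c); more negative means the claim is
    better explained by the domain LM than by the generic one."""
    toks = claim.lower().split()
    return avg_xent(toks, DOMAIN_LM) - avg_xent(toks, GENERIC_LM)

claims = ["vaccine cures covid", "stock market rallies"]
ranked = sorted(claims, key=divergence)  # most domain-like first
```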
Semantic similarity. Another approach that we examine for curating a support set for a claim belonging to an unknown domain is semantic similarity. Semantic similarity is a well-known approach used to derive the likelihood of similarity between two sentence blocks. We employ it to collect the top-k claims semantically similar to the query claim, using a RoBERTa-based transformer model to compute the sentence similarity metric.
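The top-k selection itself reduces to a cosine ranking over sentence embeddings. In this sketch the RoBERTa sentence encoder is stubbed by fixed toy vectors; the claim names, vectors, and k are hypothetical.

```python
# Sketch of similarity-based support-set curation (encoder stubbed).
import numpy as np

EMB = {  # hypothetical sentence embeddings
    "query":   np.array([1.0, 0.0, 0.0]),
    "claim a": np.array([0.9, 0.1, 0.0]),
    "claim b": np.array([0.0, 1.0, 0.0]),
    "claim c": np.array([0.7, 0.0, 0.3]),
}

def cos(u, v) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def top_k_support(query, pool, k=2):
    """Return the k pool claims most similar to the query claim."""
    return sorted(pool, key=lambda c: cos(EMB[query], EMB[c]),
                  reverse=True)[:k]

support = top_k_support("query", ["claim a", "claim b", "claim c"])
```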
Why two encoders? Table 2 shows the efficacy of the BERT-based single-encoder detection and two-encoder verification models over others when employed within the meta-learning framework. We experimented with different pre-trained models in our meta-learning setup, such as IndicBERT, BERT, and RoBERTa; among them, BERT provides consistent and better results compared to the others.
Table 2: Comparison of the various encoder and modeling configurations examined towards fake claim detection and verification; results are reported based on SAI (a manually prepared realistic dataset) datasets.
Table 3 shows our model performance on various benchmarking and realistic (SAI) settings.
Table 3: Comparison of performances of Meta learning-based approach on benchmarking datasets (Health, Fever) and SAI (a manually prepared realistic dataset) datasets.
Meta learning vs Fine Tuning: Table 4 shows that meta-learning performs better than fine-tuning in realistic settings and produces more consistent results.
Table 4: Benchmarking result comparison between Meta-learning and Fine-tuning approaches, on Health, Fever, and SAI (a manually prepared realistic dataset) datasets.
Generalizability: Table 5 shows decent in-domain (ID) and impressive out-of-domain (OOD) generalizability of MACRO, which reinforces its efficacy in handling samples from unknown domains within a few-shot adaptation setting. This also reduces the overall carbon footprint, since less computation is required to achieve the same result.
Table 5: Generalizability assessment of various approaches, based upon the performance on in-domain and out-of-domain scenarios
● We show how world knowledge is encoded by the underlying neural networks and can be exploited by models for performing specific tasks, as well as ways to enhance the quality of this knowledge by leveraging external resources.
● We show how our fake news detection technique learns domain-specific and cross-domain information from news records without knowing the actual domain, i.e., a domain-agnostic Fake News Detection model using a meta-curriculum learning framework, which is subsequently used to identify fake news records.
● Our novel base model architecture for the meta-curriculum learning framework is designed as two encoders: (i) one encoder to encode claims, and (ii) another to encode the corresponding set of evidence. To the best of our knowledge, this is the first attempt to use two different transformer encoders in the fake claim verification task.
● We empirically find that our meta-learning based approach provides more consistent results than fine-tuning based approaches.
Finn, C., Abbeel, P., & Levine, S. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1126–1135. https://proceedings.mlr.press/v70/finn17a.html
Rawat, M., & Kanojia, D. (2021). Automated Evidence Collection for Fake News Detection. CoRR, abs/2112.06507. https://arxiv.org/abs/2112.06507