TNNLS 2023
CIPL: Counterfactual Interactive Policy Learning to Eliminate Popularity Bias for Online Recommendation
Yongsen Zheng, Jinghui Qin, Pengxu Wei, Ziliang Chen, Liang Lin

Abstract


Popularity bias, a long-standing problem in recommender systems (RSs), has been extensively studied for offline recommendation in most existing research, but very few studies have paid attention to eliminating such bias in online interactive recommendation scenarios. There, bias amplification becomes increasingly serious over time because of the feedback loop between the user and the interactive system. However, existing methods have only investigated the causal relations among different factors statically, without considering the temporal dependencies inherent in an online interactive recommendation system, which makes them difficult to adapt to online settings. To address these problems, we propose a novel counterfactual interactive policy learning (CIPL) method to eliminate popularity bias for online recommendation. It first scrutinizes the causal relations in interactive recommender models and formulates a novel temporal causal graph (TCG) to guide the training and counterfactual inference of the causal interactive recommendation system. Concretely, the TCG is used to estimate the causal effect of item popularity on the prediction score at each time step at which the user interacts with the system during model training. It is also used to remove the negative effect of popularity bias in the test stage. To train the causal interactive recommendation system, we formulate CIPL within the actor-critic framework, with an online interactive environment simulator. We conduct extensive experiments on three public benchmarks, and the experimental results demonstrate that the proposed method achieves new state-of-the-art performance.
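As a rough illustration of the test-stage idea only, the sketch below removes a popularity-driven component from a prediction score by subtraction. This is a generic counterfactual-style adjustment and an assumption on our part, not the exact CIPL formulation, which the abstract does not spell out; the names `debias_scores`, `rank`, `popularity_only`, and `alpha` are hypothetical.

```python
# Minimal sketch (assumption): subtract the score that item popularity alone
# would produce from the factual prediction score, then re-rank the items.
# The abstract does not give CIPL's formula; this only illustrates the idea.

def debias_scores(factual, popularity_only, alpha=1.0):
    """Return factual scores with a popularity-only score removed.

    factual         -- scores predicted with all inputs present
    popularity_only -- hypothetical scores attributable to item popularity alone
    alpha           -- hypothetical strength of the correction (0 disables it)
    """
    return [f - alpha * p for f, p in zip(factual, popularity_only)]


def rank(scores):
    """Return item indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy data: item 0 is very popular, items 1 and 2 are niche.
factual = [0.9, 0.6, 0.5]
popularity_only = [0.7, 0.1, 0.05]

debiased = debias_scores(factual, popularity_only)
print(rank(factual))   # ranking before the correction
print(rank(debiased))  # ranking after the correction
```

In this toy example the popular item ranks first on the factual scores and last once its popularity-only score is subtracted. Setting `alpha` to 0 leaves the original ranking unchanged.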

Framework

Experiment

Conclusion


Most existing methods aim to alleviate popularity bias for offline recommendation, and few efforts have been made to explore such bias in online interactive recommendation. However, bias amplification becomes increasingly serious over time as the user interacts with the recommended items in interactive recommendation. Moreover, existing methods treat the relationships among nodes in the causal graph as static connections and omit temporal dependencies. To address these problems, we propose a novel framework, CIPL, to mitigate popularity bias for online recommendation. It first formulates a novel TCG, which is then used to estimate the causal effect of item popularity on the prediction score during training of the causal interactive recommender, and to apply counterfactual inference to remove the negative influence of popularity bias at inference time. Extensive experiments on three publicly available benchmarks validate the effectiveness of the proposed method in eliminating popularity bias for online recommendation and its superiority in recommendation tasks.