Abstract
Popularity bias, a long-standing problem in recommender systems (RSs), has been studied extensively for offline recommendation, but very few studies have addressed eliminating such bias in online interactive recommendation scenarios. In these scenarios, bias amplification becomes increasingly serious over time because of the feedback loop between the user and the interactive system. Existing methods, however, investigate the causal relations among different factors only statically, without considering the temporal dependencies inherent in online interactive recommendation, which makes them difficult to adapt to online settings. To address these problems, we propose a novel counterfactual interactive policy learning (CIPL) method to eliminate popularity bias for online recommendation. CIPL first scrutinizes the causal relations in interactive recommender models and formulates a novel temporal causal graph (TCG) to guide both the training and the counterfactual inference of the causal interactive recommendation system. Concretely, during model training the TCG is used to estimate the causal effect of item popularity on the prediction score at each time step at which the user interacts with the system. In the test stage, it is used to remove the negative effect of popularity bias. To train the causal interactive recommendation system, we formulate CIPL within the actor–critic framework, using an online interactive environment simulator. We conduct extensive experiments on three public benchmarks, and the experimental results demonstrate that the proposed method achieves state-of-the-art performance.
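The abstract does not give the exact inference rule, so the following is only a minimal sketch of the general idea of removing a popularity effect from prediction scores at test time. It assumes, purely for illustration, that the model exposes a full prediction score and a separate popularity-only score per item, and that the adjustment is a simple subtraction. The names `debias_scores`, `recommend`, and `strength` are hypothetical and are not taken from CIPL.

```python
from typing import List, Sequence


def debias_scores(
    full_scores: Sequence[float],
    popularity_scores: Sequence[float],
    strength: float = 1.0,
) -> List[float]:
    """Subtract a scaled popularity-only score from each full score.

    Assumption: the popularity effect is additive and can be removed by
    subtraction. `strength` is a hypothetical knob; 0.0 leaves the scores
    unchanged.
    """
    if len(full_scores) != len(popularity_scores):
        raise ValueError("score lists must have the same length")
    return [f - strength * p for f, p in zip(full_scores, popularity_scores)]


def recommend(
    full_scores: Sequence[float],
    popularity_scores: Sequence[float],
    strength: float = 1.0,
) -> int:
    """Return the index of the item with the highest adjusted score."""
    adjusted = debias_scores(full_scores, popularity_scores, strength)
    return max(range(len(adjusted)), key=adjusted.__getitem__)


# Toy scores for three items at one interaction step (made-up numbers).
full = [0.9, 0.8, 0.3]        # scores that include the popularity effect
popularity = [0.5, 0.1, 0.0]  # scores attributable to popularity alone

biased_choice = recommend(full, popularity, strength=0.0)
debiased_choice = recommend(full, popularity, strength=1.0)
```

With these toy numbers, the unadjusted ranking picks item 0, the most popular item, while the adjusted ranking picks item 1. In an interactive setting such an adjustment would be applied at every time step, since the popularity of each item changes as the feedback loop unfolds.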
Framework
Experiment
Conclusion